State-of-the-art model forecasting performance over time
This interactive visualization charts the evolution of AI forecasting accuracy on ForecastBench. Each point shows a model's difficulty-adjusted Brier score (lower is better) across all questions it forecast, plotted against the model's release date.
- Orange points mark models that were state of the art (SOTA) when released; that is, they had the best benchmark score among all models released up to that date.
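For readers unfamiliar with the metric, the following is a minimal sketch of the plain (unadjusted) Brier score for binary questions: the mean squared difference between the forecast probability and the 0/1 outcome. The function name `brier_score` is illustrative, and the difficulty adjustment that ForecastBench applies on top of this is not reproduced here.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes.

    forecasts: probabilities in [0, 1] that each question resolves "yes".
    outcomes:  the matching resolutions, 1 for "yes" and 0 for "no".
    NOTE: this is the plain Brier score; no difficulty adjustment is applied.
    """
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Always answering 0.5 scores 0.25; a perfect forecaster scores 0.
    print(brier_score([0.5, 0.5], [1, 0]))
    print(brier_score([1.0, 0.0], [1, 0]))
```

A score of 0 is perfect and an uninformed 50% forecast on every question scores 0.25, which is why lower values on the chart are better.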
- Vertical bars indicate 95% confidence intervals.
- Gray points show non-SOTA models.
- The orange dashed line shows the estimated linear trend in SOTA performance. Interpret it with caution: it is a simple extrapolation from past data, and if actual progress departs from linearity, the projected date of LLM-superforecaster parity will shift.
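The SOTA marking and the trend line described above can be sketched as follows. This is an illustration under stated assumptions, not the chart's actual implementation: the names `flag_sota`, `fit_line`, and `parity_date` are hypothetical, the trend is an ordinary least-squares fit through the SOTA points, and the superforecaster score is passed in as a `target` value.

```python
from datetime import date


def flag_sota(models):
    """Return the models that beat every earlier release.

    models: list of (name, release_date, score) tuples; lower score is better.
    """
    best = float("inf")
    sota = []
    for name, released, score in sorted(models, key=lambda m: m[1]):
        if score < best:  # strictly better than anything released so far
            best = score
            sota.append((name, released, score))
    return sota


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def parity_date(sota, target):
    """Date at which the fitted SOTA trend reaches `target`, or None.

    Assumes at least two SOTA points with distinct release dates.
    Returns None when the trend is not improving (slope >= 0).
    """
    first_day = sota[0][1].toordinal()
    xs = [released.toordinal() - first_day for _, released, _ in sota]
    ys = [score for _, _, score in sota]
    slope, intercept = fit_line(xs, ys)
    if slope >= 0:
        return None
    days = (target - intercept) / slope
    return date.fromordinal(first_day + round(days))


if __name__ == "__main__":
    # Made-up scores for illustration only; not ForecastBench data.
    models = [
        ("model-a", date(2023, 1, 1), 0.20),
        ("model-b", date(2023, 7, 1), 0.22),  # worse than model-a: not SOTA
        ("model-c", date(2024, 1, 1), 0.18),
        ("model-d", date(2025, 1, 1), 0.16),
    ]
    sota = flag_sota(models)
    print([name for name, _, _ in sota])
    print(parity_date(sota, target=0.15))
```

Because the projected date comes from dividing by the fitted slope, small changes in the slope move the date considerably, which is one reason the extrapolation warrants caution.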
Hold Shift and drag to zoom into a region. Press Escape to reset zoom.