State-of-the-art model forecasting performance over time

This interactive visualization charts the evolution of AI forecasting accuracy on ForecastBench. Each point represents a model's difficulty-adjusted Brier score across all questions it forecast (lower is better), plotted by the model's release date.
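For readers unfamiliar with the metric, the sketch below computes a plain Brier score: the mean squared difference between each probability forecast and the 0/1 outcome. It does not reproduce ForecastBench's difficulty adjustment, which this page does not describe. It only shows why lower is better: a perfect forecaster scores 0, and always answering 0.5 scores 0.25. The function name and sample data are hypothetical.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Plain (unadjusted) Brier score: mean squared error between
    probability forecasts in [0, 1] and binary outcomes (1 = yes, 0 = no)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical example: three questions, two of which resolved "yes".
print(round(brier_score([0.9, 0.2, 0.6], [1, 0, 1]), 4))  # 0.07

# A forecaster who always says 0.5 gets 0.25 regardless of outcomes.
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25
```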

  • Orange points mark models that were state of the art (SOTA) when released, meaning they had the best benchmark performance of any model released up to that date.
  • Vertical bars indicate 95% confidence intervals.
  • Gray points show non-SOTA models.
  • The orange dashed line shows the estimated linear trend in SOTA performance. Interpret it with caution: it is a simple extrapolation from past data, and if actual progress deviates from linearity, the projected LLM-superforecaster parity date would shift accordingly.
