
State-of-the-art model forecasting performance over time

This interactive visualization charts the evolution of AI forecasting accuracy on ForecastBench. Each point shows a model's difficulty-adjusted Brier score (lower is better), computed across all questions the model forecast and plotted against the model's release date.
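
To make the scoring concrete, here is a minimal Python sketch of a Brier score for binary questions, averaged over a model's questions, with a 95% confidence interval. It rests on assumptions that do not come from this page: the interval uses a simple normal approximation that treats questions as independent, and `difficulty_adjusted` is a hypothetical illustration that subtracts each question's mean score across forecasters. ForecastBench's actual difficulty adjustment and interval method may differ.

```python
import math
from statistics import mean, stdev


def brier(prob, outcome):
    """Brier score for one binary question: squared error between the
    forecast probability and the 0/1 outcome (lower is better)."""
    return (prob - outcome) ** 2


def score_with_ci(probs, outcomes, z=1.96):
    """Mean Brier score over questions plus a normal-approximation 95% CI.

    Assumption: questions are treated as independent draws; the benchmark's
    own confidence intervals may be computed differently.
    """
    scores = [brier(p, o) for p, o in zip(probs, outcomes)]
    m = mean(scores)
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return m, (m - half_width, m + half_width)


def difficulty_adjusted(model_scores, question_means):
    """HYPOTHETICAL difficulty adjustment, not ForecastBench's method.

    model_scores: {question_id: this model's Brier score on that question}
    question_means: {question_id: mean Brier score of all forecasters on it}
    Subtracting each question's mean keeps a model that happened to forecast
    harder questions from looking worse; negative means better than average.
    """
    return mean(s - question_means[q] for q, s in model_scores.items())
```

For example, forecasts of 0.9, 0.2, 0.7, and 0.5 on questions that resolved 1, 0, 1, 0 give a mean Brier score of 0.0975.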

Orange points mark models that were state of the art (SOTA) when released; that is, each achieved the best benchmark score of any model released up to that date.
Vertical bars indicate 95% confidence intervals.
Gray points show non-SOTA models.
The orange dashed line shows the estimated linear trend in SOTA performance. Interpret it with caution: it is a simple extrapolation from past data, and if actual progress deviates from a straight line, the projected LLM-superforecaster parity date will shift accordingly.
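
The SOTA marking and trend projection described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the benchmark's code: `sota_models` keeps each model whose score beats every model released before it, the trend is an ordinary least-squares line of score against release date (in days), and `projected_parity` (a hypothetical helper) solves for the date where that line reaches a target score such as a superforecaster benchmark. The model names and scores in the usage example are made up.

```python
from datetime import date


def sota_models(models):
    """Return the models that were SOTA when released.

    models: list of (name, release_date, score) tuples, lower score is better.
    A model is kept if its score beats every model released before it.
    """
    frontier, best = [], float("inf")
    for name, released, score in sorted(models, key=lambda m: m[1]):
        if score < best:
            frontier.append((name, released, score))
            best = score
    return frontier


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def projected_parity(frontier, target):
    """Date at which the linear SOTA trend reaches `target`, or None.

    Assumes a straight-line trend in score per day of release date.
    """
    xs = [released.toordinal() for _, released, _ in frontier]
    ys = [score for _, _, score in frontier]
    intercept, slope = linear_fit(xs, ys)
    if slope >= 0:
        return None  # scores are not improving, so the line never crosses
    return date.fromordinal(round((target - intercept) / slope))


# Hypothetical data: model "B" is not SOTA because "A" scored better earlier.
models = [
    ("A", date(2023, 1, 1), 0.20),
    ("B", date(2023, 7, 1), 0.22),
    ("C", date(2024, 1, 1), 0.16),
    ("D", date(2024, 7, 1), 0.12),
]
frontier = sota_models(models)
parity = projected_parity(frontier, target=0.08)
```

Fitting only the SOTA points tracks the performance frontier rather than the average model; as noted above, the projected date is only as reliable as the linearity assumption.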

[Interactive chart: difficulty-adjusted Brier score versus model release date. Score views: Overall, Dataset, Market. Comparison groups: Public, Superforecaster, Always 0.5, Imputed Forecaster, Naive Forecaster, Always 0, Random Uniform. Optional overlays: tournament models, legend, projected LLM-superforecaster parity.]
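
The simplest comparisons listed for the chart (Always 0.5, Always 0, and Random Uniform) have easily derived Brier scores, which the Python sketch below reproduces. It assumes these names mean what they say: a constant forecast of 0.5, a constant forecast of 0, and a probability drawn uniformly from [0, 1] for each question. The scores are raw rather than difficulty-adjusted, and the Imputed and Naive Forecaster baselines are not reproduced because their definitions are not given here.

```python
import random


def brier_mean(probs, outcomes):
    """Mean Brier score of probability forecasts against 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def baseline_scores(outcomes, seed=0):
    """Raw Brier scores of three naive baselines (assumed definitions)."""
    rng = random.Random(seed)
    n = len(outcomes)
    return {
        # (0.5 - o)^2 = 0.25 whether the question resolves 0 or 1.
        "Always 0.5": brier_mean([0.5] * n, outcomes),
        # (0 - o)^2 = o, so the score equals the fraction resolving "yes".
        "Always 0": brier_mean([0.0] * n, outcomes),
        # E[(U - o)^2] = 1/3 for U ~ Uniform(0, 1) and either outcome.
        "Random Uniform": brier_mean([rng.random() for _ in range(n)], outcomes),
    }
```

With half of the questions resolving "yes", Always 0.5 scores exactly 0.25, Always 0 scores 0.5, and Random Uniform lands near 1/3, so any useful forecaster should score well below 0.25.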

Related Work

Forecasting Research Institute
Longitudinal Expert AI Panel

Contact

forecastbench@forecastingresearch.org

© 2024–2026 Forecasting Research Institute
Content licensed under CC BY-SA 4.0