Reader's guide

If you're new to ForecastBench, we recommend you start by reading the latest version of the paper to understand how it works. To see the changes we've made since the paper was last updated, refer to the Changelog. If you'd like to take a deep dive into our scoring methodology and stability tests, see the addendum to the paper. Finally, you can view the codebase and learn more about the ForecastBench architecture on our GitHub repository.

The ICLR 2025 version of the paper is provided as a reference.

Latest version of paper

To better understand how ForecastBench works, see the latest version of our paper on arXiv. It has been updated since the ICLR 2025 version. Last updated: 28 Feb 2025.

arXiv Paper

Changelog

There have been several notable changes to the benchmark since the latest version of the paper. You can find a brief overview on the changelog. Last updated: 6 Oct 2025.

Changelog

Addendum to the paper

Our technical report presents the difficulty-adjusted Brier score, the ranking methodology that enables fair comparisons when forecasters predict on different questions. The report details simulation results showing this scoring rule outperforms alternatives and includes our stability analyses demonstrating that rankings become stable within 50 days of a new model participating on ForecastBench. Last updated: 3 Oct 2025.

Technical Report (PDF)
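
For readers unfamiliar with Brier scoring, the sketch below computes ordinary Brier scores and one plausible form of difficulty adjustment (centering each forecaster's score on the per-question mean across all forecasters). The function names and the particular adjustment rule here are illustrative assumptions, not the leaderboard's definition; the exact scoring rule is the one specified in the technical report.

import numpy as np

def brier_score(probability, outcome):
    # Squared error between a forecast probability and the binary outcome (0 or 1).
    return (probability - outcome) ** 2

def difficulty_adjusted_scores(forecasts, outcomes):
    # forecasts: array of shape (n_forecasters, n_questions) with probabilities,
    #            NaN where a forecaster did not answer a question.
    # outcomes:  array of shape (n_questions,) with resolved values in {0, 1}.
    raw = (forecasts - outcomes) ** 2                 # per-forecaster, per-question Brier scores
    question_difficulty = np.nanmean(raw, axis=0)     # mean score on each question, a crude difficulty proxy
    adjusted = raw - question_difficulty              # center scores so forecasters aren't penalized for hard questions
    return np.nanmean(adjusted, axis=1)               # one adjusted score per forecaster (lower is better)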

Codebase

The ForecastBench codebase is open-sourced under an MIT license and available on GitHub. The repository contains the full pipeline for generating forecasting questions from time-series data, evaluating LLM and human forecasts, computing difficulty-adjusted Brier scores, and maintaining the leaderboard.

GitHub Repository

Architecture

ForecastBench is an automated benchmark. Every night at 00:00 UTC the system ingests new questions and resolution data, runs validation and category tagging, refreshes metadata, resolves forecasts, and updates the leaderboard. Every two weeks it samples balanced question sets for LLMs (1,000 questions) and humans (200 questions) and starts a new forecasting round. To better understand how ForecastBench updates its question bank, creates question sets, and resolves forecasts, see the dedicated page on the wiki.

GitHub Wiki
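
As a rough illustration of the round-creation step described above, the sketch below samples a fixed-size question set balanced across question sources. The data layout, field names, and sampling rule are assumptions for illustration only; the actual logic lives in the repository's question-set generation code.

import random
from collections import defaultdict

def sample_balanced_question_set(question_bank, n_questions, seed=0):
    # question_bank: list of dicts, each with at least a "source" field
    # (e.g. the market or dataset a question comes from).
    # Returns up to n_questions questions, spread as evenly as possible across sources.
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for q in question_bank:
        by_source[q["source"]].append(q)
    per_source = max(1, n_questions // len(by_source))
    sampled = []
    for source, questions in by_source.items():
        rng.shuffle(questions)
        sampled.extend(questions[:per_source])
    rng.shuffle(sampled)
    return sampled[:n_questions]

# Hypothetical usage mirroring the biweekly rounds:
# llm_set = sample_balanced_question_set(question_bank, 1000)
# human_set = sample_balanced_question_set(question_bank, 200)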

ICLR 2025

The benchmark was accepted to ICLR 2025 as a poster. On the ICLR page, you can view the poster, slides, and the ICLR version of the paper. Last updated: 22 Jan 2025.

ICLR Page
@inproceedings{karger2025forecastbench,
  title={ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities},
  author={Ezra Karger and Houtan Bastani and Chen Yueh-Han and Zachary Jacobs and Danny Halawi and Fred Zhang and Philip E. Tetlock},
  year={2025},
  booktitle={International Conference on Learning Representations (ICLR)},
  url={https://iclr.cc/virtual/2025/poster/28507}
}

Leaderboards

The leaderboards from the ICLR 2025 paper are available both here and in the forecastbench-datasets repo (601f6d9).