Human Combo Leaderboard: overall

Updated Feb. 17, 2025

Ranking: The position of the forecaster in the leaderboard, ordered by Overall Score (lower is better)
Organization: The group responsible for the model or forecasts
Model: The LLM and prompt type, or the human group and forecast aggregation method
- zero shot: used a zero-shot prompt
- scratchpad: used a scratchpad prompt with instructions outlining a procedure the model should follow to reason about the question
- with freeze values: for questions from market sources, the prompt was supplemented with the aggregate human forecast from the relevant platform on the day the question set was generated
- with news: the prompt was supplemented with relevant news summaries obtained through an automated process
Dataset Score: The average Brier score across all questions sourced from datasets
Market Score (resolved): The average Brier score across all resolved questions sourced from prediction markets and forecast aggregation platforms
Market Score (unresolved): The average Brier score across all unresolved questions sourced from prediction markets and forecast aggregation platforms
Market Score (overall): The average Brier score across all questions sourced from prediction markets and forecast aggregation platforms
Overall Resolved Score: The average of the Dataset Score and the Market Score (resolved) columns
Overall Score: The average of the Dataset Score and the Market Score (overall) columns
Overall Score 95% CI: The 95% confidence interval for the Overall Score
Pairwise p-value comparing to No. 1 (bootstrapped): The p-value calculated by bootstrapping the differences in Overall Score between each forecaster and the best forecaster (rank 1), under the null hypothesis that there is no difference
Pct. more accurate than No. 1: The percent of questions on which this forecaster scored better than the best forecaster (rank 1)
Pct. imputed: The percent of questions for which this forecaster did not provide a forecast and hence had a forecast value imputed (0.5 for dataset questions and the aggregate human forecast on the forecast due date for questions sourced from prediction markets or forecast aggregation platforms)
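
As a rough illustration of how the score columns defined above fit together, the Python sketch below computes a Dataset Score, a Market Score, and an Overall Score from per-question forecasts. The record layout (`source`, `forecast`, `outcome`, `human_aggregate`) and the function names are hypothetical, not ForecastBench's own code. The sketch also takes each question's `outcome` as given, because the definitions above do not say what unresolved questions are scored against.

```python
from statistics import mean


def brier(forecast: float, outcome: float) -> float:
    """Brier score for one question: squared error of the forecast."""
    return (forecast - outcome) ** 2


def impute(forecast, source, human_aggregate=None):
    """Fill in a missing forecast following the 'Pct. imputed' rule.

    Dataset questions fall back to 0.5; market questions fall back to the
    aggregate human forecast (passed in here as a hypothetical field).
    """
    if forecast is not None:
        return forecast
    if source == "dataset":
        return 0.5
    return human_aggregate


def overall_score(questions):
    """Return (dataset_score, market_score, overall_score).

    The overall score is the plain average of the two column averages,
    not a pooled mean over all questions.
    """
    per_source = {"dataset": [], "market": []}
    for q in questions:
        p = impute(q["forecast"], q["source"], q.get("human_aggregate"))
        per_source[q["source"]].append(brier(p, q["outcome"]))
    dataset_score = mean(per_source["dataset"])
    market_score = mean(per_source["market"])
    return dataset_score, market_score, (dataset_score + market_score) / 2


# Made-up example records, for illustration only.
example = [
    {"source": "dataset", "forecast": 0.9, "outcome": 1.0},
    {"source": "dataset", "forecast": None, "outcome": 0.0},
    {"source": "market", "forecast": None, "outcome": 1.0, "human_aggregate": 0.8},
]
print(overall_score(example))
```

Because the Overall Score averages the two columns, dataset and market questions carry equal weight regardless of how many questions each contributes. For the top row of the table, (0.091 + 0.062) / 2 = 0.0765, in line with the reported 0.076 up to rounding, whereas a per-question mean over N=1,754 dataset and N=296 market questions would be about 0.087.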

Ranking	Organization	Model	Dataset Score (N=1,754)	Market Score (resolved) (N=193)	Market Score (unresolved) (N=103)	Market Score (overall) (N=296)	Overall Resolved Score (N=1,947)	Overall Score (N=2,050)	Overall Score 95% CI	Pairwise p-value comparing to No. 1 (bootstrapped)	Pct. more accurate than No. 1	Pct. imputed
1	ForecastBench	Superforecaster median forecast	0.091	0.062	0.062	0.062	0.076	0.076	[0.067, 0.086]		0%	0%
2	ForecastBench	Public median forecast	0.119	0.092	0.035	0.072	0.105	0.096	[0.086, 0.105]	<0.001	23%	0%
3	OpenAI	GPT-4o (scratchpad with freeze values)	0.175	0.107	0.043	0.085	0.141	0.130	[0.119, 0.141]	<0.001	24%	0%
4	Anthropic	Claude-3-5-Sonnet-20240620 (scratchpad with freeze values)	0.154	0.144	0.038	0.107	0.149	0.131	[0.118, 0.143]	<0.001	24%	0%
5	OpenAI	GPT-4-Turbo-2024-04-09 (scratchpad with freeze values)	0.164	0.126	0.055	0.101	0.145	0.133	[0.121, 0.145]	<0.001	23%	0%
6	OpenAI	GPT-4o (scratchpad with news with freeze values)	0.171	0.132	0.051	0.104	0.151	0.137	[0.125, 0.149]	<0.001	20%	0%
7	Google	Gemini-1.5-Pro (scratchpad with freeze values)	0.152	0.158	0.077	0.130	0.155	0.141	[0.13, 0.152]	<0.001	21%	0%
8	Google	Gemini-1.5-Pro (scratchpad with news with freeze values)	0.154	0.164	0.075	0.133	0.159	0.143	[0.133, 0.154]	<0.001	21%	1%
9	Anthropic	Claude-3-5-Sonnet-20240620 (scratchpad with news with freeze values)	0.160	0.158	0.078	0.130	0.159	0.145	[0.132, 0.158]	<0.001	20%	0%
10	Anthropic	Claude-3-5-Sonnet-20240620 (zero shot with freeze values)	0.174	0.148	0.064	0.119	0.161	0.146	[0.133, 0.16]	<0.001	22%	0%
11	Google	Gemini-1.5-Pro (scratchpad)	0.152	0.172	0.089	0.143	0.162	0.148	[0.137, 0.158]	<0.001	20%	1%
12	OpenAI	GPT-4-Turbo-2024-04-09 (scratchpad)	0.164	0.154	0.091	0.132	0.159	0.148	[0.138, 0.158]	<0.001	17%	0%
13	Google	Gemini-1.5-Pro (scratchpad with news)	0.154	0.165	0.102	0.143	0.159	0.148	[0.137, 0.16]	<0.001	21%	1%
14	Anthropic	Claude-3-5-Sonnet-20240620 (scratchpad)	0.154	0.166	0.099	0.143	0.160	0.149	[0.137, 0.16]	<0.001	20%	0%
15	OpenAI	GPT-4o (scratchpad)	0.175	0.144	0.082	0.122	0.159	0.149	[0.138, 0.159]	<0.001	19%	1%
16	Anthropic	Claude-3-Opus-20240229 (zero shot with freeze values)	0.173	0.162	0.054	0.124	0.167	0.149	[0.135, 0.162]	<0.001	21%	0%
17	OpenAI	GPT-4o (scratchpad with news)	0.171	0.146	0.092	0.127	0.159	0.149	[0.138, 0.16]	<0.001	18%	0%
18	OpenAI	GPT-4-Turbo-2024-04-09 (zero shot with freeze values)	0.200	0.125	0.052	0.100	0.163	0.150	[0.138, 0.162]	<0.001	24%	0%
19	Qwen	Qwen1.5-110B-Chat (scratchpad with freeze values)	0.171	0.160	0.078	0.131	0.165	0.151	[0.14, 0.162]	<0.001	16%	1%
20	Anthropic	Claude-3-5-Sonnet-20240620 (scratchpad with news)	0.160	0.174	0.103	0.149	0.167	0.154	[0.143, 0.166]	<0.001	19%	0%
21	ForecastBench	Imputed Forecaster	0.250	0.073	0.034	0.059	0.161	0.155	[0.147, 0.163]	<0.001	22%	100%
22	OpenAI	GPT-4 (zero shot with freeze values)	0.213	0.125	0.052	0.099	0.169	0.156	[0.144, 0.168]	<0.001	21%	0%
23	Google	Gemini-1.5-Pro (zero shot with freeze values)	0.205	0.125	0.082	0.110	0.165	0.157	[0.144, 0.171]	<0.001	20%	15%
24	Anthropic	Claude-3-5-Sonnet-20240620 (superforecaster with news 3)	0.167	0.181	0.088	0.149	0.174	0.158	[0.146, 0.169]	<0.001	17%	2%
25	OpenAI	GPT-4 (scratchpad with freeze values)	0.190	0.158	0.064	0.125	0.174	0.158	[0.145, 0.171]	<0.001	19%	1%
26	Anthropic	Claude-3-Opus-20240229 (scratchpad with freeze values)	0.185	0.161	0.084	0.134	0.173	0.159	[0.148, 0.171]	<0.001	18%	0%
27	ForecastBench	LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news	0.241	0.097	0.050	0.080	0.169	0.161	[0.153, 0.168]	<0.001	18%	76%
28	OpenAI	GPT-4-Turbo-2024-04-09 (scratchpad with news with freeze values)	0.209	0.132	0.081	0.114	0.170	0.161	[0.149, 0.173]	<0.001	20%	0%
29	ForecastBench	LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news	0.242	0.100	0.050	0.083	0.171	0.162	[0.155, 0.17]	<0.001	18%	76%
30	ForecastBench	LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news	0.243	0.099	0.050	0.082	0.171	0.162	[0.155, 0.17]	<0.001	18%	76%
31	Google	Gemini-1.5-Pro (superforecaster with news 3)	0.176	0.176	0.106	0.151	0.176	0.164	[0.153, 0.175]	<0.001	19%	1%
32	Mistral AI	Mistral-Large-Latest (scratchpad with freeze values)	0.185	0.166	0.100	0.143	0.176	0.164	[0.154, 0.175]	<0.001	16%	0%
33	OpenAI	GPT-4 (scratchpad)	0.190	0.169	0.086	0.140	0.180	0.165	[0.156, 0.174]	<0.001	15%	1%
34	Qwen	Qwen1.5-110B-Chat (scratchpad)	0.171	0.187	0.111	0.161	0.179	0.166	[0.156, 0.175]	<0.001	15%	0%
35	Google	Gemini-1.5-Pro (zero shot)	0.205	0.136	0.113	0.128	0.171	0.167	[0.154, 0.179]	<0.001	19%	15%
36	Google	Gemini-1.5-Flash (scratchpad with freeze values)	0.179	0.182	0.103	0.154	0.180	0.167	[0.153, 0.18]	<0.001	18%	0%
37	Anthropic	Claude-2.1 (scratchpad)	0.228	0.117	0.084	0.105	0.172	0.167	[0.157, 0.177]	<0.001	20%	24%
38	Meta	Llama-3-70b-Chat-Hf (zero shot with freeze values)	0.205	0.168	0.065	0.132	0.186	0.168	[0.155, 0.182]	<0.001	19%	0%
39	Meta	Llama-3-70b-Chat-Hf (scratchpad with freeze values)	0.208	0.156	0.078	0.129	0.182	0.169	[0.158, 0.179]	<0.001	17%	0%
40	Anthropic	Claude-3-Opus-20240229 (zero shot)	0.173	0.201	0.098	0.165	0.187	0.169	[0.156, 0.183]	<0.001	18%	0%
41	Google	Gemini-1.5-Flash (zero shot with freeze values)	0.217	0.150	0.069	0.122	0.183	0.169	[0.155, 0.183]	<0.001	23%	0%
42	OpenAI	GPT-4-Turbo-2024-04-09 (zero shot)	0.200	0.160	0.100	0.139	0.180	0.169	[0.157, 0.182]	<0.001	19%	1%
43	OpenAI	GPT-4-Turbo-2024-04-09 (scratchpad with news)	0.209	0.149	0.096	0.131	0.179	0.170	[0.159, 0.18]	<0.001	17%	0%
44	Google	Gemini-1.5-Flash (scratchpad)	0.179	0.185	0.115	0.161	0.182	0.170	[0.159, 0.181]	<0.001	16%	0%
45	OpenAI	GPT-4-Turbo-2024-04-09 (superforecaster with news 3)	0.202	0.163	0.093	0.139	0.183	0.170	[0.16, 0.181]	<0.001	17%	9%
46	Qwen	Qwen1.5-110B-Chat (scratchpad with news with freeze values)	0.198	0.172	0.097	0.146	0.185	0.172	[0.161, 0.183]	<0.001	16%	0%
47	Anthropic	Claude-3-5-Sonnet-20240620 (zero shot)	0.174	0.197	0.122	0.171	0.185	0.172	[0.158, 0.187]	<0.001	17%	1%
48	Mistral AI	Mistral-Large-Latest (zero shot with freeze values)	0.203	0.184	0.071	0.145	0.194	0.174	[0.16, 0.188]	<0.001	19%	0%
49	Anthropic	Claude-2.1 (scratchpad with freeze values)	0.228	0.147	0.070	0.120	0.187	0.174	[0.162, 0.186]	<0.001	20%	18%
50	OpenAI	GPT-4o (superforecaster with news 3)	0.206	0.169	0.100	0.145	0.188	0.175	[0.165, 0.186]	<0.001	16%	7%
51	Anthropic	Claude-3-Opus-20240229 (scratchpad)	0.185	0.206	0.106	0.171	0.195	0.178	[0.167, 0.189]	<0.001	17%	0%
52	Mistral AI	Mistral-Large-Latest (scratchpad)	0.185	0.197	0.122	0.171	0.191	0.178	[0.168, 0.188]	<0.001	15%	0%
53	Anthropic	Claude-3-5-Sonnet-20240620 (scratchpad with SECOND news)	0.217	0.165	0.094	0.140	0.191	0.178	[0.167, 0.19]	<0.001	15%	1%
54	OpenAI	GPT-4o (scratchpad with SECOND news)	0.232	0.139	0.101	0.126	0.185	0.179	[0.168, 0.19]	<0.001	15%	11%
55	Qwen	Qwen1.5-110B-Chat (superforecaster with news 1)	0.203	0.184	0.101	0.155	0.194	0.179	[0.168, 0.191]	<0.001	17%	15%
56	Mistral AI	Mixtral-8x22B-Instruct-V0.1 (scratchpad)	0.198	0.195	0.098	0.161	0.197	0.180	[0.169, 0.19]	<0.001	16%	0%
57	Mistral AI	Mixtral-8x22B-Instruct-V0.1 (scratchpad with freeze values)	0.198	0.189	0.113	0.162	0.193	0.180	[0.167, 0.193]	<0.001	16%	0%
58	Anthropic	Claude-3-5-Sonnet-20240620 (superforecaster with news 1)	0.203	0.190	0.100	0.159	0.196	0.181	[0.168, 0.193]	<0.001	18%	17%
59	OpenAI	GPT-4o (superforecaster with news 1)	0.200	0.194	0.102	0.162	0.197	0.181	[0.167, 0.195]	<0.001	19%	0%
60	Qwen	Qwen1.5-110B-Chat (scratchpad with news)	0.198	0.191	0.113	0.164	0.194	0.181	[0.171, 0.191]	<0.001	16%	0%
61	OpenAI	GPT-4o (zero shot with freeze values)	0.210	0.175	0.111	0.153	0.192	0.181	[0.165, 0.197]	<0.001	21%	1%
62	Google	Gemini-1.5-Flash (scratchpad with news with freeze values)	0.213	0.174	0.105	0.150	0.194	0.182	[0.169, 0.195]	<0.001	16%	0%
63	Meta	Llama-3-70b-Chat-Hf (zero shot)	0.205	0.188	0.113	0.162	0.196	0.183	[0.172, 0.194]	<0.001	16%	0%
64	OpenAI	GPT-4o (zero shot)	0.210	0.196	0.085	0.157	0.203	0.183	[0.17, 0.197]	<0.001	18%	3%
65	Anthropic	Claude-3-Opus-20240229 (scratchpad with news with freeze values)	0.203	0.198	0.106	0.166	0.200	0.184	[0.172, 0.196]	<0.001	16%	0%
66	Anthropic	Claude-3-Opus-20240229 (superforecaster with news 1)	0.197	0.211	0.103	0.173	0.204	0.185	[0.173, 0.197]	<0.001	16%	8%
67	Anthropic	Claude-3-5-Sonnet-20240620 (superforecaster with news 2)	0.200	0.202	0.111	0.171	0.201	0.185	[0.173, 0.198]	<0.001	17%	1%
68	Google	Gemini-1.5-Flash (superforecaster with news 2)	0.222	0.161	0.128	0.150	0.191	0.186	[0.174, 0.197]	<0.001	18%	11%
69	Anthropic	Claude-3-Opus-20240229 (superforecaster with news 3)	0.195	0.206	0.119	0.176	0.201	0.186	[0.174, 0.198]	<0.001	16%	5%
70	Anthropic	Claude-3-Opus-20240229 (scratchpad with news)	0.203	0.201	0.115	0.171	0.202	0.187	[0.176, 0.198]	<0.001	16%	0%
71	Qwen	Qwen1.5-110B-Chat (superforecaster with news 3)	0.213	0.191	0.104	0.161	0.202	0.187	[0.178, 0.197]	<0.001	16%	4%
72	Google	Gemini-1.5-Flash (scratchpad with news)	0.213	0.187	0.117	0.162	0.200	0.188	[0.177, 0.199]	<0.001	16%	0%
73	Qwen	Qwen1.5-110B-Chat (zero shot with freeze values)	0.218	0.196	0.090	0.159	0.207	0.189	[0.174, 0.203]	<0.001	16%	1%
74	Mistral AI	Mixtral-8x22B-Instruct-V0.1 (zero shot with freeze values)	0.207	0.197	0.123	0.171	0.202	0.189	[0.174, 0.204]	<0.001	19%	0%
75	Meta	Llama-3-8b-Chat-Hf (zero shot with freeze values)	0.224	0.159	0.148	0.155	0.191	0.189	[0.175, 0.204]	<0.001	18%	0%
76	Anthropic	Claude-2.1 (scratchpad with news)	0.234	0.179	0.092	0.148	0.206	0.191	[0.179, 0.203]	<0.001	18%	14%
77	Meta	Llama-3-70b-Chat-Hf (scratchpad)	0.208	0.206	0.116	0.175	0.207	0.192	[0.183, 0.2]	<0.001	16%	0%
78	Anthropic	Claude-2.1 (scratchpad with news with freeze values)	0.234	0.182	0.090	0.150	0.208	0.192	[0.18, 0.204]	<0.001	18%	10%
79	Google	Gemini-1.5-Pro (superforecaster with news 1)	0.209	0.217	0.104	0.177	0.213	0.193	[0.18, 0.206]	<0.001	18%	4%
80	OpenAI	GPT-4-Turbo-2024-04-09 (superforecaster with news 1)	0.216	0.205	0.107	0.171	0.210	0.193	[0.18, 0.206]	<0.001	16%	0%
81	Qwen	Qwen1.5-110B-Chat (zero shot)	0.218	0.203	0.104	0.169	0.211	0.193	[0.182, 0.204]	<0.001	15%	1%
82	OpenAI	GPT-4 (zero shot)	0.213	0.212	0.111	0.177	0.213	0.195	[0.184, 0.206]	<0.001	16%	0%
83	Mistral AI	Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 3)	0.228	0.197	0.099	0.163	0.213	0.196	[0.186, 0.205]	<0.001	14%	13%
84	Mistral AI	Mixtral-8x22B-Instruct-V0.1 (scratchpad with news)	0.229	0.195	0.103	0.163	0.212	0.196	[0.187, 0.205]	<0.001	15%	0%
85	Mistral AI	Mistral-Large-Latest (zero shot)	0.203	0.216	0.142	0.190	0.209	0.196	[0.184, 0.209]	<0.001	16%	0%
86	Mistral AI	Mixtral-8x7B-Instruct-V0.1 (scratchpad)	0.236	0.179	0.117	0.158	0.208	0.197	[0.185, 0.208]	<0.001	19%	12%
87	Anthropic	Claude-3-Opus-20240229 (superforecaster with news 2)	0.207	0.218	0.127	0.187	0.213	0.197	[0.184, 0.21]	<0.001	16%	1%
88	Google	Gemini-1.5-Flash (superforecaster with news 3)	0.225	0.197	0.121	0.170	0.211	0.198	[0.187, 0.209]	<0.001	15%	9%
89	Mistral AI	Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 1)	0.234	0.183	0.123	0.163	0.209	0.198	[0.185, 0.211]	<0.001	17%	19%
90	Mistral AI	Mixtral-8x22B-Instruct-V0.1 (scratchpad with news with freeze values)	0.229	0.210	0.099	0.171	0.219	0.200	[0.189, 0.211]	<0.001	16%	0%
91	Mistral AI	Mixtral-8x22B-Instruct-V0.1 (zero shot)	0.207	0.222	0.140	0.193	0.214	0.200	[0.186, 0.214]	<0.001	16%	0%
92	Meta	Llama-3-8b-Chat-Hf (zero shot)	0.224	0.194	0.164	0.184	0.209	0.204	[0.188, 0.219]	<0.001	18%	0%
93	Mistral AI	Mixtral-8x7B-Instruct-V0.1 (zero shot with freeze values)	0.260	0.181	0.099	0.153	0.220	0.206	[0.19, 0.222]	<0.001	23%	0%
94	Google	Gemini-1.5-Flash (zero shot)	0.217	0.212	0.166	0.196	0.214	0.206	[0.192, 0.221]	<0.001	18%	1%
95	Mistral AI	Mixtral-8x7B-Instruct-V0.1 (scratchpad with freeze values)	0.236	0.203	0.130	0.178	0.220	0.207	[0.193, 0.221]	<0.001	18%	11%
96	Google	Gemini-1.5-Pro (superforecaster with news 2)	0.242	0.210	0.115	0.177	0.226	0.210	[0.195, 0.225]	<0.001	18%	0%
97	Anthropic	Claude-2.1 (superforecaster with news 2)	0.265	0.189	0.099	0.157	0.227	0.211	[0.197, 0.225]	<0.001	20%	21%
98	Mistral AI	Mistral-Large-Latest (scratchpad with news with freeze values)	0.242	0.209	0.130	0.181	0.226	0.212	[0.2, 0.223]	<0.001	15%	0%
99	OpenAI	GPT-4-Turbo-2024-04-09 (superforecaster with news 2)	0.246	0.210	0.119	0.178	0.228	0.212	[0.198, 0.226]	<0.001	18%	8%
100	Mistral AI	Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 3)	0.270	0.168	0.129	0.154	0.219	0.212	[0.201, 0.223]	<0.001	18%	17%
101	Mistral AI	Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 2)	0.246	0.216	0.111	0.179	0.231	0.213	[0.203, 0.223]	<0.001	16%	1%
102	Qwen	Qwen1.5-110B-Chat (superforecaster with news 2)	0.243	0.224	0.115	0.186	0.233	0.214	[0.203, 0.225]	<0.001	16%	4%
103	Mistral AI	Mistral-Large-Latest (scratchpad with news)	0.242	0.218	0.127	0.186	0.230	0.214	[0.203, 0.225]	<0.001	15%	0%
104	Anthropic	Claude-2.1 (zero shot with freeze values)	0.244	0.214	0.132	0.186	0.229	0.215	[0.198, 0.231]	<0.001	18%	0%
105	OpenAI	GPT-4o (superforecaster with news 2)	0.257	0.210	0.112	0.176	0.234	0.216	[0.203, 0.23]	<0.001	18%	5%
106	Anthropic	Claude-2.1 (superforecaster with news 3)	0.257	0.218	0.104	0.178	0.237	0.217	[0.205, 0.23]	<0.001	18%	10%
107	Anthropic	Claude-2.1 (superforecaster with news 1)	0.274	0.199	0.103	0.165	0.237	0.220	[0.206, 0.233]	<0.001	19%	24%
108	Google	Gemini-1.5-Flash (superforecaster with news 1)	0.237	0.234	0.144	0.203	0.236	0.220	[0.206, 0.234]	<0.001	15%	20%
109	Mistral AI	Mistral-Large-Latest (superforecaster with news 2)	0.231	0.244	0.159	0.215	0.237	0.223	[0.209, 0.236]	<0.001	15%	9%
110	Mistral AI	Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 1)	0.295	0.175	0.123	0.157	0.235	0.226	[0.212, 0.239]	<0.001	18%	17%
111	Mistral AI	Mixtral-8x7B-Instruct-V0.1 (scratchpad with news with freeze values)	0.291	0.170	0.144	0.161	0.231	0.226	[0.214, 0.238]	<0.001	18%	14%
112	Mistral AI	Mistral-Large-Latest (superforecaster with news 1)	0.248	0.232	0.159	0.207	0.240	0.227	[0.213, 0.242]	<0.001	15%	25%
113	Mistral AI	Mixtral-8x7B-Instruct-V0.1 (scratchpad with news)	0.291	0.186	0.123	0.164	0.238	0.228	[0.215, 0.24]	<0.001	17%	15%
114	Anthropic	Claude-2.1 (zero shot)	0.244	0.240	0.158	0.211	0.242	0.228	[0.215, 0.24]	<0.001	15%	0%
115	Mistral AI	Mistral-Large-Latest (superforecaster with news 3)	0.267	0.217	0.137	0.189	0.242	0.228	[0.216, 0.239]	<0.001	15%	5%
116	ForecastBench	Always 0.5	0.250	0.250	0.133	0.209	0.250	0.230	[0.225, 0.234]	<0.001	14%	0%
117	Meta	Llama-2-70b-Chat-Hf (scratchpad with freeze values)	0.262	0.234	0.134	0.199	0.248	0.230	[0.22, 0.241]	<0.001	16%	1%
118	Mistral AI	Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 2)	0.289	0.197	0.129	0.173	0.243	0.231	[0.218, 0.244]	<0.001	20%	20%
119	Meta	Llama-3-8b-Chat-Hf (scratchpad with freeze values)	0.272	0.222	0.137	0.192	0.247	0.232	[0.22, 0.244]	<0.001	15%	0%
120	Mistral AI	Mixtral-8x7B-Instruct-V0.1 (zero shot)	0.260	0.234	0.159	0.208	0.247	0.234	[0.216, 0.251]	<0.001	19%	0%
121	Anthropic	Claude-3-Haiku-20240307 (zero shot with freeze values)	0.280	0.226	0.123	0.190	0.253	0.235	[0.222, 0.247]	<0.001	15%	1%
122	Meta	Llama-2-70b-Chat-Hf (scratchpad)	0.262	0.249	0.138	0.211	0.255	0.236	[0.226, 0.246]	<0.001	16%	1%
123	Anthropic	Claude-3-Haiku-20240307 (scratchpad with freeze values)	0.270	0.248	0.140	0.210	0.259	0.240	[0.229, 0.252]	<0.001	14%	1%
124	Meta	Llama-3-8b-Chat-Hf (scratchpad)	0.272	0.246	0.149	0.212	0.259	0.242	[0.231, 0.252]	<0.001	15%	0%
125	Anthropic	Claude-3-Haiku-20240307 (scratchpad)	0.270	0.262	0.137	0.218	0.266	0.244	[0.235, 0.254]	<0.001	15%	1%
126	Anthropic	Claude-3-Haiku-20240307 (superforecaster with news 2)	0.263	0.271	0.140	0.226	0.267	0.244	[0.233, 0.256]	<0.001	15%	5%
127	ForecastBench	Always 0	0.269	0.218	0.231	0.222	0.243	0.246	[0.221, 0.27]	<0.001	46%	0%
128	Anthropic	Claude-3-Haiku-20240307 (zero shot)	0.280	0.253	0.137	0.212	0.266	0.246	[0.236, 0.256]	<0.001	15%	1%
129	Anthropic	Claude-3-Haiku-20240307 (superforecaster with news 3)	0.287	0.258	0.156	0.223	0.272	0.255	[0.245, 0.264]	<0.001	14%	18%
130	Anthropic	Claude-3-Haiku-20240307 (scratchpad with news with freeze values)	0.305	0.252	0.122	0.207	0.278	0.256	[0.246, 0.266]	<0.001	14%	0%
131	OpenAI	GPT-3.5-Turbo-0125 (scratchpad with freeze values)	0.288	0.269	0.166	0.233	0.278	0.261	[0.25, 0.271]	<0.001	14%	0%
132	Anthropic	Claude-3-Haiku-20240307 (scratchpad with news)	0.305	0.262	0.135	0.218	0.283	0.261	[0.251, 0.271]	<0.001	15%	1%
133	OpenAI	GPT-3.5-Turbo-0125 (scratchpad)	0.288	0.291	0.162	0.246	0.289	0.267	[0.256, 0.278]	<0.001	15%	0%
134	Meta	Llama-2-70b-Chat-Hf (zero shot with freeze values)	0.304	0.288	0.189	0.254	0.296	0.279	[0.262, 0.296]	<0.001	17%	0%
135	Anthropic	Claude-3-Haiku-20240307 (superforecaster with news 1)	0.290	0.321	0.168	0.268	0.306	0.279	[0.265, 0.292]	<0.001	15%	6%
136	Meta	Llama-2-70b-Chat-Hf (zero shot)	0.304	0.337	0.186	0.284	0.320	0.294	[0.281, 0.307]	<0.001	16%	1%
137	ForecastBench	Random Uniform	0.340	0.311	0.191	0.269	0.325	0.304	[0.287, 0.322]	<0.001	18%	0%
138	OpenAI	GPT-3.5-Turbo-0125 (zero shot with freeze values)	0.446	0.257	0.172	0.227	0.351	0.337	[0.318, 0.355]	<0.001	20%	0%
139	OpenAI	GPT-3.5-Turbo-0125 (zero shot)	0.446	0.320	0.200	0.279	0.383	0.362	[0.346, 0.378]	<0.001	17%	0%
140	ForecastBench	Always 1	0.731	0.782	0.536	0.696	0.757	0.714	[0.688, 0.74]	<0.001	20%	0%
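
The two comparison columns in the table above (the bootstrapped pairwise p-value and "Pct. more accurate than No. 1") could be approximated along the following lines. This is only a sketch under stated assumptions: the leaderboard does not spell out its exact resampling procedure, so the code uses a generic paired bootstrap in which per-question score differences are recentred to impose the null hypothesis of no difference. The function names, the one-sided test, and the pooling of all questions into a single list are simplifications introduced here, not the leaderboard's documented method.

```python
import random


def bootstrap_p_value(diffs, n_resamples=10_000, seed=0):
    """One-sided bootstrap p-value for mean(diffs) > 0.

    diffs[i] is (forecaster's Brier score - best forecaster's Brier score)
    on question i, so positive values mean the forecaster did worse.
    """
    rng = random.Random(seed)
    n = len(diffs)
    observed = sum(diffs) / n
    # Recentre so the resampled population has mean 0 (the null hypothesis).
    centred = [d - observed for d in diffs]
    hits = 0
    for _ in range(n_resamples):
        resample = rng.choices(centred, k=n)  # sample with replacement
        if sum(resample) / n >= observed:
            hits += 1
    return hits / n_resamples


def pct_more_accurate(forecaster_scores, best_scores):
    """Percent of questions where the forecaster's Brier score is strictly
    lower (better) than the best forecaster's score on the same question."""
    wins = sum(f < b for f, b in zip(forecaster_scores, best_scores))
    return 100 * wins / len(forecaster_scores)


# Made-up per-question scores, for illustration only.
print(pct_more_accurate([0.1, 0.3, 0.2, 0.5], [0.2, 0.2, 0.3, 0.1]))
```

With a finite number of resamples, the smallest nonzero p-value this sketch can return is `1 / n_resamples`, which is one reason a table might report very small values as `<0.001` rather than as an exact figure.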