Updated Dec. 22, 2024
Ranking | Organization | Model | Dataset Score (N=4,122) | Market Score (resolved) (N=270) | Market Score (unresolved) (N=637) | Market Score (overall) (N=907) | Overall Resolved Score (N=4,392) | Overall Score (N=5,029) | Overall Score 95% CI | Pairwise p-value comparing to No. 1 (bootstrapped) | Pct. more accurate than No. 1 | Pct. Imputed |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Anthropic | Claude-3-5-Sonnet-20240620 (scratchpad with freeze values) | 0.172 | 0.121 | 0.048 | 0.069 | 0.147 | 0.121 | [0.115, 0.127] | | 0% | 0% |
2 | OpenAI | GPT-4-Turbo-2024-04-09 (scratchpad with freeze values) | 0.173 | 0.116 | 0.053 | 0.072 | 0.144 | 0.122 | [0.116, 0.128] | 0.265 | 43% | 0% |
3 | OpenAI | GPT-4o (scratchpad with freeze values) | 0.192 | 0.096 | 0.047 | 0.062 | 0.144 | 0.127 | [0.121, 0.132] | <0.01 | 43% | 0% |
4 | Google | Gemini-1.5-Pro (scratchpad with freeze values) | 0.163 | 0.146 | 0.076 | 0.097 | 0.155 | 0.130 | [0.124, 0.136] | <0.001 | 35% | 0% |
5 | OpenAI | GPT-4o (scratchpad with news with freeze values) | 0.193 | 0.119 | 0.055 | 0.074 | 0.156 | 0.134 | [0.128, 0.14] | <0.001 | 39% | 0% |
6 | Google | Gemini-1.5-Pro (scratchpad with news with freeze values) | 0.167 | 0.150 | 0.081 | 0.101 | 0.159 | 0.134 | [0.129, 0.14] | <0.001 | 35% | 1% |
7 | Anthropic | Claude-3-Opus-20240229 (zero shot with freeze values) | 0.189 | 0.147 | 0.056 | 0.083 | 0.168 | 0.136 | [0.129, 0.142] | <0.001 | 41% | 0% |
8 | Qwen | Qwen1.5-110B-Chat (scratchpad with freeze values) | 0.177 | 0.157 | 0.075 | 0.099 | 0.167 | 0.138 | [0.132, 0.144] | <0.001 | 31% | 1% |
9 | OpenAI | GPT-4-Turbo-2024-04-09 (scratchpad) | 0.173 | 0.152 | 0.085 | 0.105 | 0.163 | 0.139 | [0.134, 0.144] | <0.001 | 32% | 0% |
10 | Anthropic | Claude-3-5-Sonnet-20240620 (scratchpad with news with freeze values) | 0.189 | 0.144 | 0.068 | 0.091 | 0.167 | 0.140 | [0.134, 0.146] | <0.001 | 32% | 0% |
11 | OpenAI | GPT-4-Turbo-2024-04-09 (zero shot with freeze values) | 0.205 | 0.133 | 0.051 | 0.076 | 0.169 | 0.141 | [0.134, 0.147] | <0.001 | 41% | 0% |
12 | Anthropic | Claude-3-5-Sonnet-20240620 (scratchpad) | 0.172 | 0.158 | 0.089 | 0.109 | 0.165 | 0.141 | [0.135, 0.147] | <0.001 | 10% | 0% |
13 | Google | Gemini-1.5-Pro (scratchpad) | 0.163 | 0.172 | 0.096 | 0.119 | 0.168 | 0.141 | [0.135, 0.147] | <0.001 | 32% | 0% |
14 | Anthropic | Claude-3-5-Sonnet-20240620 (zero shot with freeze values) | 0.198 | 0.142 | 0.060 | 0.085 | 0.170 | 0.141 | [0.134, 0.148] | <0.001 | 41% | 0% |
15 | Google | Gemini-1.5-Pro (scratchpad with news) | 0.167 | 0.161 | 0.099 | 0.118 | 0.164 | 0.143 | [0.137, 0.149] | <0.001 | 32% | 1% |
16 | OpenAI | GPT-4 (scratchpad with freeze values) | 0.195 | 0.133 | 0.072 | 0.090 | 0.164 | 0.143 | [0.137, 0.149] | <0.001 | 36% | 0% |
17 | ForecastBench | Imputed Forecaster | 0.250 | 0.058 | 0.033 | 0.040 | 0.154 | 0.145 | [0.142, 0.149] | <0.001 | 47% | 100% |
18 | OpenAI | GPT-4-Turbo-2024-04-09 (scratchpad with news with freeze values) | 0.211 | 0.115 | 0.067 | 0.081 | 0.163 | 0.146 | [0.14, 0.152] | <0.001 | 35% | 0% |
19 | Google | Gemini-1.5-Pro (zero shot with freeze values) | 0.220 | 0.107 | 0.060 | 0.074 | 0.164 | 0.147 | [0.141, 0.153] | <0.001 | 39% | 17% |
20 | OpenAI | GPT-4o (scratchpad) | 0.192 | 0.161 | 0.078 | 0.103 | 0.177 | 0.147 | [0.142, 0.153] | <0.001 | 31% | 0% |
21 | OpenAI | GPT-4o (scratchpad with news) | 0.193 | 0.136 | 0.088 | 0.103 | 0.165 | 0.148 | [0.142, 0.154] | <0.001 | 31% | 0% |
22 | ForecastBench | LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news | 0.240 | 0.084 | 0.049 | 0.060 | 0.162 | 0.150 | [0.146, 0.153] | <0.001 | 38% | 76% |
23 | OpenAI | GPT-4 (zero shot with freeze values) | 0.222 | 0.139 | 0.053 | 0.079 | 0.180 | 0.150 | [0.144, 0.156] | <0.001 | 38% | 0% |
24 | Qwen | Qwen1.5-110B-Chat (scratchpad) | 0.177 | 0.178 | 0.101 | 0.124 | 0.177 | 0.150 | [0.145, 0.155] | <0.001 | 29% | 0% |
25 | Meta | Llama-3-70b-Chat-Hf (zero shot with freeze values) | 0.209 | 0.164 | 0.061 | 0.092 | 0.186 | 0.150 | [0.144, 0.157] | <0.001 | 33% | 0% |
26 | Anthropic | Claude-3-5-Sonnet-20240620 (scratchpad with news) | 0.189 | 0.157 | 0.094 | 0.112 | 0.173 | 0.151 | [0.145, 0.156] | <0.001 | 30% | 0% |
27 | ForecastBench | LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news | 0.241 | 0.089 | 0.051 | 0.063 | 0.165 | 0.152 | [0.148, 0.155] | <0.001 | 38% | 76% |
28 | ForecastBench | LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news | 0.242 | 0.088 | 0.052 | 0.062 | 0.165 | 0.152 | [0.148, 0.156] | <0.001 | 37% | 76% |
29 | Mistral AI | Mistral-Large-Latest (scratchpad with freeze values) | 0.200 | 0.158 | 0.082 | 0.105 | 0.179 | 0.152 | [0.147, 0.158] | <0.001 | 26% | 0% |
30 | OpenAI | GPT-4 (scratchpad) | 0.195 | 0.163 | 0.090 | 0.112 | 0.179 | 0.153 | [0.149, 0.158] | <0.001 | 29% | 0% |
31 | Anthropic | Claude-3-Opus-20240229 (scratchpad with freeze values) | 0.205 | 0.162 | 0.076 | 0.102 | 0.184 | 0.154 | [0.148, 0.16] | <0.001 | 28% | 0% |
32 | Google | Gemini-1.5-Pro (superforecaster with news 3) | 0.188 | 0.162 | 0.103 | 0.121 | 0.175 | 0.154 | [0.148, 0.16] | <0.001 | 32% | 0% |
33 | Google | Gemini-1.5-Flash (scratchpad with freeze values) | 0.193 | 0.165 | 0.096 | 0.117 | 0.179 | 0.155 | [0.148, 0.162] | <0.001 | 32% | 0% |
34 | Google | Gemini-1.5-Pro (zero shot) | 0.220 | 0.122 | 0.076 | 0.090 | 0.171 | 0.155 | [0.149, 0.161] | <0.001 | 36% | 17% |
35 | Anthropic | Claude-3-5-Sonnet-20240620 (superforecaster with news 3) | 0.193 | 0.171 | 0.095 | 0.118 | 0.182 | 0.155 | [0.149, 0.161] | <0.001 | 29% | 2% |
36 | OpenAI | GPT-4-Turbo-2024-04-09 (zero shot) | 0.205 | 0.159 | 0.084 | 0.107 | 0.182 | 0.156 | [0.149, 0.162] | <0.001 | 32% | 0% |
37 | OpenAI | GPT-4-Turbo-2024-04-09 (scratchpad with news) | 0.211 | 0.143 | 0.087 | 0.104 | 0.177 | 0.157 | [0.152, 0.163] | <0.001 | 28% | 0% |
38 | OpenAI | GPT-4-Turbo-2024-04-09 (superforecaster with news 3) | 0.209 | 0.154 | 0.086 | 0.106 | 0.181 | 0.157 | [0.152, 0.163] | <0.001 | 28% | 8% |
39 | Anthropic | Claude-3-Opus-20240229 (zero shot) | 0.189 | 0.191 | 0.100 | 0.127 | 0.190 | 0.158 | [0.151, 0.165] | <0.001 | 34% | 0% |
40 | Meta | Llama-3-70b-Chat-Hf (scratchpad with freeze values) | 0.217 | 0.150 | 0.077 | 0.098 | 0.184 | 0.158 | [0.152, 0.163] | <0.001 | 26% | 0% |
41 | Qwen | Qwen1.5-110B-Chat (scratchpad with news with freeze values) | 0.208 | 0.156 | 0.092 | 0.111 | 0.182 | 0.160 | [0.154, 0.165] | <0.001 | 26% | 0% |
42 | Mistral AI | Mistral-Large-Latest (zero shot with freeze values) | 0.209 | 0.193 | 0.078 | 0.112 | 0.201 | 0.160 | [0.153, 0.168] | <0.001 | 31% | 0% |
43 | Anthropic | Claude-3-5-Sonnet-20240620 (zero shot) | 0.198 | 0.198 | 0.092 | 0.124 | 0.198 | 0.161 | [0.154, 0.168] | <0.001 | 34% | 1% |
44 | Google | Gemini-1.5-Flash (zero shot with freeze values) | 0.232 | 0.153 | 0.063 | 0.090 | 0.193 | 0.161 | [0.154, 0.168] | <0.001 | 40% | 0% |
45 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (scratchpad with freeze values) | 0.211 | 0.166 | 0.089 | 0.112 | 0.189 | 0.162 | [0.155, 0.168] | <0.001 | 30% | 0% |
46 | OpenAI | GPT-4o (superforecaster with news 3) | 0.212 | 0.142 | 0.100 | 0.112 | 0.177 | 0.162 | [0.157, 0.167] | <0.001 | 28% | 5% |
47 | Anthropic | Claude-2.1 (scratchpad) | 0.236 | 0.122 | 0.079 | 0.092 | 0.179 | 0.164 | [0.158, 0.169] | <0.001 | 38% | 22% |
48 | Google | Gemini-1.5-Flash (scratchpad) | 0.193 | 0.175 | 0.117 | 0.134 | 0.184 | 0.164 | [0.157, 0.17] | <0.001 | 28% | 0% |
49 | Meta | Llama-3-70b-Chat-Hf (zero shot) | 0.209 | 0.178 | 0.094 | 0.119 | 0.193 | 0.164 | [0.158, 0.17] | <0.001 | 28% | 0% |
50 | OpenAI | GPT-4o (superforecaster with news 1) | 0.215 | 0.178 | 0.093 | 0.118 | 0.196 | 0.167 | [0.16, 0.174] | <0.001 | 31% | 0% |
51 | Anthropic | Claude-3-5-Sonnet-20240620 (superforecaster with news 1) | 0.212 | 0.168 | 0.103 | 0.122 | 0.190 | 0.167 | [0.161, 0.173] | <0.001 | 30% | 16% |
52 | OpenAI | GPT-4o (zero shot with freeze values) | 0.227 | 0.150 | 0.089 | 0.107 | 0.188 | 0.167 | [0.16, 0.175] | <0.001 | 36% | 1% |
53 | Anthropic | Claude-3-Opus-20240229 (scratchpad) | 0.205 | 0.191 | 0.105 | 0.130 | 0.198 | 0.168 | [0.162, 0.174] | <0.001 | 27% | 0% |
54 | Qwen | Qwen1.5-110B-Chat (zero shot with freeze values) | 0.228 | 0.181 | 0.078 | 0.109 | 0.204 | 0.168 | [0.161, 0.175] | <0.001 | 31% | 1% |
55 | OpenAI | GPT-4-Turbo-2024-04-09 (superforecaster with news 1) | 0.217 | 0.191 | 0.089 | 0.119 | 0.204 | 0.168 | [0.162, 0.175] | <0.001 | 29% | 0% |
56 | Anthropic | Claude-2.1 (scratchpad with freeze values) | 0.236 | 0.140 | 0.084 | 0.101 | 0.188 | 0.168 | [0.162, 0.175] | <0.001 | 36% | 17% |
57 | OpenAI | GPT-4o (zero shot) | 0.227 | 0.196 | 0.075 | 0.111 | 0.212 | 0.169 | [0.162, 0.176] | <0.001 | 30% | 3% |
58 | Mistral AI | Mistral-Large-Latest (scratchpad) | 0.200 | 0.198 | 0.114 | 0.139 | 0.199 | 0.169 | [0.164, 0.175] | <0.001 | 25% | 0% |
59 | Meta | Llama-3-8b-Chat-Hf (zero shot with freeze values) | 0.221 | 0.149 | 0.107 | 0.119 | 0.185 | 0.170 | [0.163, 0.178] | <0.001 | 43% | 0% |
60 | Qwen | Qwen1.5-110B-Chat (scratchpad with news) | 0.208 | 0.181 | 0.112 | 0.133 | 0.194 | 0.170 | [0.165, 0.176] | <0.001 | 25% | 0% |
61 | OpenAI | GPT-4o (scratchpad with SECOND news) | 0.237 | 0.128 | 0.093 | 0.104 | 0.183 | 0.171 | [0.165, 0.176] | <0.001 | 28% | 6% |
62 | Google | Gemini-1.5-Flash (scratchpad with news with freeze values) | 0.221 | 0.165 | 0.103 | 0.122 | 0.193 | 0.171 | [0.165, 0.178] | <0.001 | 28% | 0% |
63 | Anthropic | Claude-3-5-Sonnet-20240620 (scratchpad with SECOND news) | 0.225 | 0.167 | 0.101 | 0.121 | 0.196 | 0.173 | [0.167, 0.179] | <0.001 | 28% | 1% |
64 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (scratchpad) | 0.211 | 0.204 | 0.108 | 0.137 | 0.208 | 0.174 | [0.169, 0.18] | <0.001 | 27% | 0% |
65 | Qwen | Qwen1.5-110B-Chat (superforecaster with news 1) | 0.212 | 0.198 | 0.116 | 0.141 | 0.205 | 0.176 | [0.17, 0.183] | <0.001 | 30% | 15% |
66 | Qwen | Qwen1.5-110B-Chat (superforecaster with news 3) | 0.218 | 0.187 | 0.114 | 0.136 | 0.202 | 0.177 | [0.172, 0.182] | <0.001 | 24% | 3% |
67 | Meta | Llama-3-70b-Chat-Hf (scratchpad) | 0.217 | 0.196 | 0.112 | 0.137 | 0.206 | 0.177 | [0.172, 0.182] | <0.001 | 23% | 0% |
68 | Anthropic | Claude-2.1 (scratchpad with news) | 0.242 | 0.169 | 0.088 | 0.112 | 0.206 | 0.177 | [0.171, 0.183] | <0.001 | 34% | 17% |
69 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (zero shot with freeze values) | 0.216 | 0.203 | 0.112 | 0.139 | 0.210 | 0.178 | [0.17, 0.186] | <0.001 | 29% | 0% |
70 | Anthropic | Claude-3-Opus-20240229 (superforecaster with news 1) | 0.217 | 0.213 | 0.107 | 0.139 | 0.215 | 0.178 | [0.172, 0.184] | <0.001 | 27% | 9% |
71 | OpenAI | GPT-4 (zero shot) | 0.222 | 0.200 | 0.108 | 0.135 | 0.211 | 0.178 | [0.172, 0.184] | <0.001 | 26% | 0% |
72 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (zero shot) | 0.216 | 0.215 | 0.110 | 0.141 | 0.215 | 0.179 | [0.172, 0.186] | <0.001 | 27% | 0% |
73 | Anthropic | Claude-3-Opus-20240229 (scratchpad with news with freeze values) | 0.226 | 0.185 | 0.109 | 0.132 | 0.205 | 0.179 | [0.173, 0.185] | <0.001 | 24% | 0% |
74 | Anthropic | Claude-2.1 (scratchpad with news with freeze values) | 0.242 | 0.177 | 0.090 | 0.116 | 0.209 | 0.179 | [0.173, 0.185] | <0.001 | 33% | 14% |
75 | Google | Gemini-1.5-Pro (superforecaster with news 1) | 0.223 | 0.188 | 0.116 | 0.137 | 0.205 | 0.180 | [0.173, 0.187] | <0.001 | 30% | 4% |
76 | Google | Gemini-1.5-Flash (scratchpad with news) | 0.221 | 0.193 | 0.118 | 0.140 | 0.207 | 0.181 | [0.175, 0.187] | <0.001 | 26% | 0% |
77 | Google | Gemini-1.5-Flash (superforecaster with news 2) | 0.243 | 0.144 | 0.110 | 0.120 | 0.194 | 0.182 | [0.176, 0.188] | <0.001 | 28% | 12% |
78 | Mistral AI | Mistral-Large-Latest (zero shot) | 0.209 | 0.227 | 0.125 | 0.155 | 0.218 | 0.182 | [0.175, 0.189] | <0.001 | 27% | 0% |
79 | Qwen | Qwen1.5-110B-Chat (zero shot) | 0.228 | 0.207 | 0.109 | 0.138 | 0.217 | 0.183 | [0.177, 0.189] | <0.001 | 27% | 1% |
80 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (scratchpad with news with freeze values) | 0.232 | 0.208 | 0.104 | 0.135 | 0.220 | 0.184 | [0.178, 0.19] | <0.001 | 26% | 0% |
81 | Anthropic | Claude-3-Opus-20240229 (superforecaster with news 3) | 0.219 | 0.201 | 0.127 | 0.149 | 0.210 | 0.184 | [0.178, 0.19] | <0.001 | 27% | 5% |
82 | Meta | Llama-3-8b-Chat-Hf (zero shot) | 0.221 | 0.184 | 0.132 | 0.148 | 0.203 | 0.184 | [0.176, 0.193] | <0.001 | 41% | 0% |
83 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 1) | 0.232 | 0.195 | 0.111 | 0.136 | 0.214 | 0.184 | [0.177, 0.191] | <0.001 | 30% | 18% |
84 | Anthropic | Claude-3-Opus-20240229 (scratchpad with news) | 0.226 | 0.198 | 0.123 | 0.145 | 0.212 | 0.186 | [0.18, 0.191] | <0.001 | 24% | 0% |
85 | Anthropic | Claude-3-Opus-20240229 (superforecaster with news 2) | 0.225 | 0.210 | 0.119 | 0.146 | 0.217 | 0.186 | [0.179, 0.192] | <0.001 | 26% | 1% |
86 | Anthropic | Claude-3-5-Sonnet-20240620 (superforecaster with news 2) | 0.224 | 0.208 | 0.123 | 0.148 | 0.216 | 0.186 | [0.179, 0.193] | <0.001 | 26% | 1% |
87 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (scratchpad with news) | 0.232 | 0.206 | 0.114 | 0.141 | 0.219 | 0.187 | [0.182, 0.192] | <0.001 | 25% | 0% |
88 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 3) | 0.232 | 0.195 | 0.120 | 0.142 | 0.214 | 0.187 | [0.182, 0.192] | <0.001 | 26% | 10% |
89 | Google | Gemini-1.5-Flash (superforecaster with news 3) | 0.231 | 0.207 | 0.127 | 0.151 | 0.219 | 0.191 | [0.185, 0.197] | <0.001 | 25% | 8% |
90 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (scratchpad with freeze values) | 0.251 | 0.187 | 0.109 | 0.133 | 0.219 | 0.192 | [0.185, 0.199] | <0.001 | 28% | 10% |
91 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (scratchpad) | 0.251 | 0.185 | 0.111 | 0.133 | 0.218 | 0.192 | [0.186, 0.198] | <0.001 | 28% | 11% |
92 | Anthropic | Claude-2.1 (superforecaster with news 2) | 0.269 | 0.168 | 0.092 | 0.115 | 0.219 | 0.192 | [0.185, 0.199] | <0.001 | 37% | 23% |
93 | Google | Gemini-1.5-Pro (superforecaster with news 2) | 0.242 | 0.181 | 0.127 | 0.143 | 0.212 | 0.193 | [0.185, 0.201] | <0.001 | 29% | 0% |
94 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (zero shot with freeze values) | 0.267 | 0.175 | 0.096 | 0.119 | 0.221 | 0.193 | [0.185, 0.202] | <0.001 | 37% | 0% |
95 | Anthropic | Claude-2.1 (zero shot with freeze values) | 0.245 | 0.195 | 0.124 | 0.145 | 0.220 | 0.195 | [0.187, 0.203] | <0.001 | 32% | 0% |
96 | Qwen | Qwen1.5-110B-Chat (superforecaster with news 2) | 0.240 | 0.209 | 0.130 | 0.154 | 0.224 | 0.197 | [0.191, 0.203] | <0.001 | 24% | 4% |
97 | Google | Gemini-1.5-Flash (zero shot) | 0.232 | 0.224 | 0.136 | 0.162 | 0.228 | 0.197 | [0.189, 0.205] | <0.001 | 31% | 1% |
98 | OpenAI | GPT-4o (superforecaster with news 2) | 0.262 | 0.186 | 0.111 | 0.133 | 0.224 | 0.197 | [0.19, 0.204] | <0.001 | 28% | 4% |
99 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 3) | 0.275 | 0.150 | 0.109 | 0.121 | 0.213 | 0.198 | [0.193, 0.204] | <0.001 | 31% | 18% |
100 | OpenAI | GPT-4-Turbo-2024-04-09 (superforecaster with news 2) | 0.251 | 0.195 | 0.127 | 0.147 | 0.223 | 0.199 | [0.192, 0.207] | <0.001 | 28% | 7% |
101 | Anthropic | Claude-2.1 (superforecaster with news 1) | 0.268 | 0.183 | 0.109 | 0.131 | 0.226 | 0.200 | [0.193, 0.207] | <0.001 | 33% | 23% |
102 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 2) | 0.248 | 0.217 | 0.127 | 0.154 | 0.233 | 0.201 | [0.195, 0.207] | <0.001 | 24% | 2% |
103 | Mistral AI | Mistral-Large-Latest (scratchpad with news with freeze values) | 0.254 | 0.222 | 0.122 | 0.152 | 0.238 | 0.203 | [0.197, 0.209] | <0.001 | 22% | 0% |
104 | Mistral AI | Mistral-Large-Latest (superforecaster with news 1) | 0.250 | 0.219 | 0.134 | 0.159 | 0.234 | 0.205 | [0.197, 0.212] | <0.001 | 26% | 24% |
105 | Mistral AI | Mistral-Large-Latest (scratchpad with news) | 0.254 | 0.217 | 0.135 | 0.159 | 0.236 | 0.207 | [0.201, 0.213] | <0.001 | 22% | 0% |
106 | Mistral AI | Mistral-Large-Latest (superforecaster with news 2) | 0.239 | 0.234 | 0.151 | 0.176 | 0.236 | 0.207 | [0.2, 0.214] | <0.001 | 26% | 9% |
107 | Anthropic | Claude-2.1 (zero shot) | 0.245 | 0.242 | 0.150 | 0.178 | 0.244 | 0.211 | [0.205, 0.218] | <0.001 | 26% | 1% |
108 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (scratchpad with news) | 0.287 | 0.182 | 0.117 | 0.136 | 0.235 | 0.212 | [0.206, 0.218] | <0.001 | 28% | 15% |
109 | Anthropic | Claude-2.1 (superforecaster with news 3) | 0.265 | 0.238 | 0.126 | 0.159 | 0.251 | 0.212 | [0.205, 0.219] | <0.001 | 27% | 10% |
110 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (scratchpad with news with freeze values) | 0.287 | 0.178 | 0.121 | 0.138 | 0.233 | 0.213 | [0.206, 0.219] | <0.001 | 28% | 15% |
111 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 1) | 0.302 | 0.163 | 0.110 | 0.126 | 0.232 | 0.214 | [0.207, 0.22] | <0.001 | 32% | 19% |
112 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (zero shot) | 0.267 | 0.209 | 0.141 | 0.161 | 0.238 | 0.214 | [0.205, 0.223] | <0.001 | 33% | 1% |
113 | Google | Gemini-1.5-Flash (superforecaster with news 1) | 0.249 | 0.248 | 0.155 | 0.182 | 0.248 | 0.215 | [0.208, 0.223] | <0.001 | 25% | 17% |
114 | ForecastBench | Always 0.5 | 0.250 | 0.250 | 0.152 | 0.181 | 0.250 | 0.216 | [0.213, 0.219] | <0.001 | 25% | 0% |
115 | Meta | Llama-2-70b-Chat-Hf (scratchpad with freeze values) | 0.264 | 0.218 | 0.148 | 0.169 | 0.241 | 0.216 | [0.211, 0.222] | <0.001 | 22% | 0% |
116 | Anthropic | Claude-3-Haiku-20240307 (zero shot with freeze values) | 0.280 | 0.219 | 0.126 | 0.154 | 0.249 | 0.217 | [0.21, 0.223] | <0.001 | 24% | 1% |
117 | Meta | Llama-3-8b-Chat-Hf (scratchpad with freeze values) | 0.273 | 0.227 | 0.132 | 0.160 | 0.250 | 0.217 | [0.21, 0.223] | <0.001 | 23% | 0% |
118 | Meta | Llama-2-70b-Chat-Hf (scratchpad) | 0.264 | 0.227 | 0.149 | 0.172 | 0.245 | 0.218 | [0.212, 0.223] | <0.001 | 22% | 0% |
119 | Mistral AI | Mistral-Large-Latest (superforecaster with news 3) | 0.272 | 0.228 | 0.138 | 0.165 | 0.250 | 0.219 | [0.213, 0.225] | <0.001 | 22% | 4% |
120 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 2) | 0.296 | 0.210 | 0.122 | 0.148 | 0.253 | 0.222 | [0.215, 0.229] | <0.001 | 31% | 19% |
121 | Anthropic | Claude-3-Haiku-20240307 (scratchpad with freeze values) | 0.278 | 0.224 | 0.147 | 0.170 | 0.251 | 0.224 | [0.218, 0.23] | <0.001 | 20% | 1% |
122 | Anthropic | Claude-3-Haiku-20240307 (superforecaster with news 2) | 0.266 | 0.269 | 0.156 | 0.190 | 0.268 | 0.228 | [0.222, 0.234] | <0.001 | 22% | 5% |
123 | Anthropic | Claude-3-Haiku-20240307 (zero shot) | 0.280 | 0.248 | 0.148 | 0.178 | 0.264 | 0.229 | [0.223, 0.234] | <0.001 | 24% | 1% |
124 | ForecastBench | Always 0 | 0.268 | 0.170 | 0.200 | 0.191 | 0.219 | 0.229 | [0.217, 0.242] | <0.001 | 64% | 0% |
125 | Meta | Llama-3-8b-Chat-Hf (scratchpad) | 0.273 | 0.251 | 0.158 | 0.186 | 0.262 | 0.230 | [0.223, 0.236] | <0.001 | 22% | 0% |
126 | Anthropic | Claude-3-Haiku-20240307 (scratchpad) | 0.278 | 0.274 | 0.159 | 0.193 | 0.276 | 0.236 | [0.23, 0.241] | <0.001 | 20% | 1% |
127 | Anthropic | Claude-3-Haiku-20240307 (scratchpad with news with freeze values) | 0.306 | 0.249 | 0.149 | 0.179 | 0.278 | 0.243 | [0.237, 0.248] | <0.001 | 20% | 1% |
128 | Anthropic | Claude-3-Haiku-20240307 (superforecaster with news 3) | 0.288 | 0.282 | 0.174 | 0.206 | 0.285 | 0.247 | [0.241, 0.253] | <0.001 | 22% | 16% |
129 | Anthropic | Claude-3-Haiku-20240307 (scratchpad with news) | 0.306 | 0.271 | 0.161 | 0.193 | 0.289 | 0.250 | [0.244, 0.255] | <0.001 | 20% | 1% |
130 | OpenAI | GPT-3.5-Turbo-0125 (scratchpad with freeze values) | 0.299 | 0.289 | 0.170 | 0.206 | 0.294 | 0.252 | [0.246, 0.258] | <0.001 | 20% | 0% |
131 | OpenAI | GPT-3.5-Turbo-0125 (scratchpad) | 0.299 | 0.304 | 0.168 | 0.209 | 0.301 | 0.254 | [0.248, 0.26] | <0.001 | 20% | 0% |
132 | Meta | Llama-2-70b-Chat-Hf (zero shot with freeze values) | 0.305 | 0.280 | 0.188 | 0.215 | 0.293 | 0.260 | [0.251, 0.269] | <0.001 | 26% | 0% |
133 | Anthropic | Claude-3-Haiku-20240307 (superforecaster with news 1) | 0.300 | 0.333 | 0.197 | 0.237 | 0.317 | 0.269 | [0.261, 0.276] | <0.001 | 21% | 6% |
134 | Meta | Llama-2-70b-Chat-Hf (zero shot) | 0.305 | 0.328 | 0.206 | 0.242 | 0.316 | 0.274 | [0.266, 0.281] | <0.001 | 25% | 1% |
135 | ForecastBench | Random Uniform | 0.341 | 0.349 | 0.216 | 0.256 | 0.345 | 0.299 | [0.289, 0.308] | <0.001 | 28% | 0% |
136 | OpenAI | GPT-3.5-Turbo-0125 (zero shot with freeze values) | 0.453 | 0.266 | 0.167 | 0.196 | 0.359 | 0.324 | [0.314, 0.334] | <0.001 | 26% | 0% |
137 | OpenAI | GPT-3.5-Turbo-0125 (zero shot) | 0.453 | 0.364 | 0.237 | 0.275 | 0.408 | 0.364 | [0.355, 0.373] | <0.001 | 20% | 0% |
138 | ForecastBench | Always 1 | 0.732 | 0.830 | 0.604 | 0.671 | 0.781 | 0.702 | [0.687, 0.716] | <0.001 | 22% | 0% |
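
The table does not say how its score columns relate to one another. The rows look consistent with Overall Score being the unweighted mean of Dataset Score and Market Score (overall), and with Overall Resolved Score being the unweighted mean of Dataset Score and Market Score (resolved). The sketch below checks that reading against four rows copied from the table. The relationship is inferred from the numbers, not a stated definition. The names `ROWS`, `TOLERANCE`, `mean2`, and `check_row` are illustrative, and the tolerance is an assumption that allows for three-decimal rounding.

```python
# Values copied from the table, in this order:
# (dataset, market_resolved, market_overall, overall_resolved, overall)
ROWS = {
    "rank 1": (0.172, 0.121, 0.069, 0.147, 0.121),
    "rank 2": (0.173, 0.116, 0.072, 0.144, 0.122),
    "rank 17": (0.250, 0.058, 0.040, 0.154, 0.145),
    "rank 138": (0.732, 0.830, 0.671, 0.781, 0.702),
}

# Assumption: published values are rounded to three decimals, so a
# recomputed mean may differ from the printed one by about 0.001.
TOLERANCE = 0.0011


def mean2(a: float, b: float) -> float:
    """Unweighted mean of two scores."""
    return (a + b) / 2


def check_row(row: tuple, tol: float = TOLERANCE) -> bool:
    """Return True if both inferred relationships hold within tol."""
    dataset, market_resolved, market_overall, overall_resolved, overall = row
    overall_ok = abs(mean2(dataset, market_overall) - overall) <= tol
    resolved_ok = abs(mean2(dataset, market_resolved) - overall_resolved) <= tol
    return overall_ok and resolved_ok


results = {name: check_row(row) for name, row in ROWS.items()}
for name, ok in results.items():
    print(name, "consistent" if ok else "inconsistent")
```

If this reading is right, the two question types carry equal weight in the overall score even though the dataset questions (N=4,122) far outnumber the market questions (N=907).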