🏆 Model Evaluation Leaderboard

Comprehensive Performance Analysis
Generated: 2025-10-14 08:32:49

Total Models: 72
Average Score: 53.36
Best Score: 83.25
Total Games: 2219

📊 Performance Rankings
| Rank | Model | Final Score | Code | Screenshot | Video | Score Distribution (counts as shown) | Total Games |
|---|---|---|---|---|---|---|---|
| 1 | gpt-5 | 83.25 | 96.64 | 77.89 | 75.21 | 1958 / 125 / 124 | 2218 |
| 2 | gpt-5-mini | 80.78 | 96.71 | 74.31 | 71.32 | 1797 / 208 / 176 | 2219 |
| 3 | claude-sonnet-4-20250514-thinking | 79.86 | 90.35 | 75.95 | 73.29 | 1600 / 477 / 121 | 2219 |
| 4 | claude-sonnet-4-20250514 | 78.55 | 87.83 | 75.03 | 72.80 | 1380 / 694 | 2219 |
| 5 | claude-sonnet-4-5-20250929 | 78.10 | 87.64 | 74.48 | 72.20 | 1463 / 587 / 148 | 2217 |
| 6 | Qwen3-Coder-480B-A35B-Instruct | 76.44 | 85.29 | 73.05 | 70.98 | 1130 / 903 / 128 | 2219 |
| 7 | gpt-4.1-2025-04-14 | 76.01 | 91.76 | 69.68 | 66.60 | 1455 / 452 / 254 | 2219 |
| 8 | qwen3-coder-plus | 75.86 | 84.70 | 72.36 | 70.52 | 1116 / 890 / 133 | 2204 |
| 9 | DeepSeek-R1-0528 | 75.62 | 87.52 | 72.33 | 67.01 | 1456 / 436 / 221 | 2219 |
| 10 | o3 | 75.44 | 92.26 | 68.10 | 65.95 | 1214 / 729 / 122 / 154 | 2219 |
| 11 | gemini-2.5-pro | 74.62 | 89.14 | 67.98 | 66.75 | 1200 / 713 / 224 | 2219 |
| 12 | qwen3-max | 74.53 | 85.05 | 70.38 | 68.15 | 1022 / 951 / 155 | 2219 |
| 13 | deepseek-v3.2-exp | 73.75 | 84.28 | 69.57 | 67.39 | 907 / 1046 / 120 / 145 | 2218 |
| 14 | Qwen3-235B-A22B-Thinking-2507 | 73.05 | 84.57 | 68.01 | 66.57 | 796 / 1140 / 126 / 156 | 2218 |
| 15 | Qwen3-Coder-30B-A3B-Instruct | 72.78 | 83.80 | 69.03 | 65.52 | 896 / 987 / 163 / 173 | 2219 |
| 16 | GLM-4.5 | 72.18 | 84.20 | 67.68 | 64.66 | 1022 / 851 / 244 | 2219 |
| 17 | o4-mini | 71.97 | 87.83 | 64.99 | 63.10 | 888 / 948 / 184 / 199 | 2219 |
| 18 | gpt-oss-120b | 71.19 | 90.07 | 63.05 | 60.45 | 949 / 827 / 176 / 267 | 2219 |
| 19 | gemini-2.5-flash | 70.90 | 92.91 | 61.44 | 58.35 | 1025 / 715 / 182 / 296 | 2218 |
| 20 | Qwen3-235B-A22B-Instruct-2507 | 70.53 | 85.39 | 64.23 | 61.97 | 843 / 953 / 129 / 294 | 2219 |
| 21 | qwen3-next-80b-a3b-instruct | 69.67 | 83.95 | 63.66 | 61.40 | 801 / 978 / 146 / 294 | 2219 |
| 22 | DeepSeek-V3-0324 | 69.28 | 83.03 | 64.18 | 60.62 | 660 / 1100 / 207 / 252 | 2219 |
| 23 | Moonshot-Kimi-K2-Instruct | 69.11 | 84.14 | 62.59 | 60.62 | 781 / 983 / 137 / 318 | 2219 |
| 24 | Qwen3-235B-A22B | 68.44 | 81.36 | 62.16 | 61.79 | 420 / 1338 / 277 / 184 | 2219 |
| 25 | chatgpt-4o-latest | 68.41 | 82.62 | 62.58 | 60.04 | 550 / 1175 / 252 / 241 | 2218 |
| 26 | gpt-oss-20b | 68.26 | 88.83 | 58.77 | 57.17 | 765 / 904 / 212 / 338 | 2219 |
| 27 | Qwen3-30B-A3B-Thinking-2507 | 67.65 | 80.83 | 61.68 | 60.44 | 458 / 1264 / 265 / 232 | 2219 |
| 28 | GLM-4.5-Air | 67.47 | 84.80 | 60.79 | 56.80 | 740 / 927 / 180 / 372 | 2219 |
| 29 | qwen3-next-80b-a3b-thinking | 66.96 | 77.68 | 62.10 | 61.10 | 355 / 1352 / 293 / 219 | 2219 |
| 30 | Qwen3-32B | 66.36 | 81.73 | 59.34 | 58.02 | 404 / 1236 / 306 / 273 | 2219 |
| 31 | Qwen3-30B-A3B-Instruct-2507 | 65.87 | 81.43 | 59.07 | 57.13 | 560 / 1072 / 205 / 382 | 2219 |
| 32 | DeepSeek-R1 | 64.64 | 80.46 | 57.56 | 55.88 | 406 / 1162 / 296 / 355 | 2219 |
| 33 | Qwen3-30B-A3B | 62.72 | 78.57 | 55.66 | 53.93 | 233 / 1242 / 418 / 326 | 2219 |
| 34 | QwQ-32B | 62.56 | 79.18 | 54.84 | 53.65 | 318 / 1196 / 290 / 415 | 2219 |
| 35 | Qwen3-14B | 61.90 | 79.22 | 54.60 | 51.86 | 262 / 1170 / 395 / 392 | 2219 |
| 36 | gpt-4o-2024-11-20 | 60.65 | 76.85 | 53.23 | 51.87 | 244 / 1136 / 415 / 424 | 2219 |
| 37 | Zhihu-ai-Zhi-Create-Qwen3-32B | 58.63 | 75.95 | 52.14 | 47.80 | 429 / 864 / 233 / 562 | 2088 |
| 38 | DeepSeek-V3 | 58.07 | 73.06 | 51.44 | 49.72 | 149 / 1129 / 464 / 477 | 2219 |
| 39 | Qwen3-8B | 57.63 | 76.32 | 49.25 | 47.32 | 158 / 1064 / 484 / 513 | 2219 |
| 40 | o3-mini-2025-01-31 | 56.00 | 89.32 | 40.03 | 38.65 | 574 / 563 / 996 | 2218 |
| 41 | DeepSeek-R1-Distill-Llama-70B | 55.13 | 73.73 | 47.03 | 44.64 | 137 / 1042 / 398 / 642 | 2219 |
| 42 | Qwen3-4B-Thinking-2507 | 52.96 | 70.20 | 45.19 | 43.50 | 989 / 497 / 666 | 2218 |
| 43 | Qwen2.5-72B-Instruct | 52.86 | 72.86 | 43.87 | 41.86 | 119 / 992 / 354 / 753 | 2218 |
| 44 | Qwen3-4B | 52.44 | 72.95 | 42.88 | 41.48 | 910 / 501 / 712 | 2218 |
| 45 | gpt-4o-mini-2024-07-18 | 51.94 | 70.56 | 43.58 | 41.70 | 952 / 484 / 724 | 2219 |
| 46 | Seed-Coder-8B-Instruct | 51.32 | 73.37 | 41.35 | 39.23 | 893 / 442 / 783 | 2219 |
| 47 | DeepSeek-R1-Distill-Qwen-32B | 51.30 | 71.63 | 42.47 | 39.82 | 116 / 958 / 323 / 822 | 2219 |
| 48 | Qwen2.5-Coder-32B-Instruct | 50.86 | 74.03 | 40.26 | 38.28 | 139 / 880 / 330 / 870 | 2219 |
| 49 | Qwen2.5-32B-Instruct | 47.99 | 65.89 | 39.76 | 38.32 | 822 / 479 / 870 | 2219 |
| 50 | Llama-3.1-8B-Instruct | 43.91 | 62.49 | 35.90 | 33.33 | 678 / 507 / 1009 | 2218 |
| 51 | Qwen2.5-14B-Instruct | 43.87 | 66.12 | 33.77 | 31.70 | 657 / 403 / 1024 | 2123 |
| 52 | Qwen2.5-Coder-14B-Instruct | 43.73 | 68.28 | 32.94 | 29.99 | 688 / 358 / 1122 | 2219 |
| 53 | Qwen2.5-Coder-7B-Instruct | 37.35 | 64.06 | 25.42 | 22.59 | 454 / 373 / 1369 | 2216 |
| 54 | DeepSeek-R1-Distill-Qwen-14B | 36.95 | 64.76 | 24.25 | 21.83 | 512 / 255 / 1428 | 2219 |
| 55 | Qwen2.5-7B-Instruct | 36.21 | 60.18 | 25.30 | 23.17 | 428 / 379 / 1398 | 2219 |
| 56 | Qwen3-1.7B | 33.07 | 57.01 | 22.56 | 19.65 | 325 / 366 / 1440 | 2138 |
| 57 | Llama-3.2-3B-Instruct | 32.90 | 55.07 | 23.82 | 19.83 | 340 / 379 / 1487 | 2219 |
| 58 | deepseek-coder-7b-instruct-v1.5 | 32.21 | 53.85 | 22.60 | 20.17 | 317 / 361 / 1531 | 2219 |
| 59 | OpenCoder-8B-Instruct | 25.40 | 48.61 | 16.00 | 11.58 | 145 / 296 / 1771 | 2219 |
| 60 | Qwen2.5-Coder-3B-Instruct | 24.28 | 53.38 | 12.50 | 6.95 | 136 / 200 / 1879 | 2219 |
| 61 | Hunyuan-7B-Instruct | 23.93 | 58.04 | 7.87 | 5.89 | 121 / 1977 | 2219 |
| 62 | Qwen2.5-3B-Instruct | 23.25 | 46.67 | 12.97 | 10.09 | 116 / 232 / 1849 | 2201 |
| 63 | Qwen2.5-Coder-1.5B-Instruct | 19.19 | 46.38 | 8.88 | 2.31 | 142 / 2045 | 2219 |
| 64 | Llama-3.2-1B-Instruct | 17.18 | 39.50 | 8.28 | 3.75 | 2075 | 2219 |
| 65 | Qwen2.5-1.5B-Instruct | 16.79 | 39.89 | 7.10 | 3.39 | 2087 | 2219 |
| 66 | DeepSeek-R1-Distill-Llama-8B | 15.94 | 41.70 | 3.74 | 2.37 | 2130 | 2219 |
| 67 | DeepSeek-R1-0528-Qwen3-8B | 14.17 | 40.54 | 1.08 | 0.88 | 2184 | 2219 |
| 68 | Qwen3-0.6B | 14.14 | 34.84 | 6.04 | 1.52 | 2127 | 2219 |
| 69 | Qwen2.5-Coder-0.5B-Instruct | 13.07 | 34.75 | 4.32 | 0.16 | 2172 | 2219 |
| 70 | DeepSeek-R1-Distill-Qwen-7B | 12.32 | 35.85 | 0.77 | 0.33 | 2204 | 2218 |
| 71 | Qwen2.5-0.5B-Instruct | 11.13 | 31.14 | 2.03 | 0.22 | 2198 | 2218 |
| 72 | DeepSeek-R1-Distill-Qwen-1.5B | 8.63 | 25.85 | 0.04 | 0.00 | 2219 | 2219 |
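
The page does not say how the Final Score is derived. In the rows checked, it appears consistent with the unweighted mean of the Code, Screenshot, and Video scores, up to rounding. The sketch below checks that reading against a few rows copied from the rankings above. The function names and the tolerance are illustrative assumptions, and the averaging rule itself is an inference from the published numbers, not a documented definition.

```python
# Rows copied from the rankings: (model, final, code, screenshot, video).
ROWS = [
    ("gpt-5", 83.25, 96.64, 77.89, 75.21),
    ("gpt-5-mini", 80.78, 96.71, 74.31, 71.32),
    ("o3-mini-2025-01-31", 56.00, 89.32, 40.03, 38.65),
    ("DeepSeek-R1-Distill-Qwen-1.5B", 8.63, 25.85, 0.04, 0.00),
]

# Assumption: published values are rounded to two decimals, so the
# recomputed mean may differ from the published score by up to 0.01.
TOLERANCE = 0.01 + 1e-9


def mean_of_components(code: float, screenshot: float, video: float) -> float:
    """Unweighted mean of the three component scores (hypothetical rule)."""
    return (code + screenshot + video) / 3


def matches_final(final: float, code: float, screenshot: float, video: float) -> bool:
    """True if the recomputed mean agrees with the published Final Score."""
    return abs(mean_of_components(code, screenshot, video) - final) <= TOLERANCE


if __name__ == "__main__":
    for model, final, code, screenshot, video in ROWS:
        recomputed = mean_of_components(code, screenshot, video)
        status = "ok" if matches_final(final, code, screenshot, video) else "MISMATCH"
        print(f"{model}: published={final:.2f} recomputed={recomputed:.2f} {status}")
```

A mismatch on other rows would suggest the leaderboard weights the three components differently or aggregates per game before averaging.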