🏆 Model Evaluation Leaderboard

V-GameGym Benchmark Performance

Total models: 70
Average score: 32.38
Highest score: 44.99
Total games: 2219

The source lists score distribution buckets as Excellent/Good/Pass/Poor; the Distribution column gives only the counts present for each row, in their original order.

| Rank | Model | Total | Code | Screenshot | Video | Distribution (counts as listed) | Games |
|---|---|---|---|---|---|---|---|
| 🥇 | gpt-5 | 44.99 | 96.58 | 17.64 | 20.74 | 288 / 676 / 1172 | 2219 |
| 🥈 | o3 | 44.76 | 92.26 | 20.17 | 21.85 | 341 / 686 / 1127 | 2219 |
| 🥉 | gpt-5-mini | 43.48 | 96.69 | 15.71 | 18.02 | 236 / 655 / 1267 | 2219 |
| 4 | gemini-2.5-pro | 43.47 | 89.11 | 19.12 | 22.17 | 337 / 672 / 1165 | 2219 |
| 5 | gpt-oss-120b | 43.36 | 90.09 | 19.70 | 20.28 | 324 / 634 / 1209 | 2219 |
| 6 | o4-mini | 42.99 | 87.77 | 19.79 | 21.39 | 313 / 682 / 1188 | 2219 |
| 7 | gpt-4.1-2025-04-14 | 42.49 | 91.78 | 17.58 | 18.11 | 263 / 641 / 1268 | 2219 |
| 8 | Qwen3-235B-A22B-Thinking-2507 | 42.28 | 84.47 | 20.00 | 22.38 | 322 / 695 / 1180 | 2219 |
| 9 | gpt-oss-20b | 42.19 | 88.75 | 18.63 | 19.17 | 299 / 626 / 1263 | 2219 |
| 10 | grok-4 | 42.01 | 83.85 | 19.79 | 22.39 | 327 / 650 / 1221 | 2219 |
| 11 | gemini-2.5-flash | 41.99 | 92.79 | 16.51 | 16.67 | 252 / 634 / 1304 | 2218 |
| 12 | Qwen3-Coder-480B-A35B-Instruct | 41.35 | 85.26 | 18.32 | 20.46 | 287 / 644 / 1268 | 2219 |
| 13 | chatgpt-4o-latest | 41.24 | 82.51 | 19.90 | 21.33 | 305 / 613 / 1276 | 2219 |
| 14 | Qwen3-235B-A22B | 41.23 | 81.27 | 19.79 | 22.64 | 302 / 668 / 1235 | 2219 |
| 15 | DeepSeek-V3-0324 | 41.15 | 83.65 | 19.26 | 20.53 | 311 / 638 / 1248 | 2219 |
| 16 | Qwen3-235B-A22B-Instruct-2507 | 41.08 | 85.34 | 18.21 | 19.70 | 308 / 589 / 1306 | 2219 |
| 17 | DeepSeek-V3.1 | 40.85 | 83.08 | 19.27 | 20.20 | 296 / 611 / 1287 | 2219 |
| 18 | claude-sonnet-4-20250514-thinking | 40.51 | 90.27 | 14.38 | 16.88 | 204 / 624 / 1355 | 2219 |
| 19 | Qwen3-32B | 40.38 | 81.58 | 18.92 | 20.64 | 274 / 642 / 1295 | 2219 |
| 20 | Seed-OSS-36B-Instruct | 40.31 | 88.28 | 16.42 | 16.23 | 234 / 585 / 1375 | 2219 |
| 21 | claude-sonnet-4-20250514 | 40.24 | 87.68 | 15.66 | 17.38 | 207 / 616 / 1360 | 2219 |
| 22 | DeepSeek-R1 | 40.10 | 81.00 | 19.22 | 20.07 | 278 / 607 / 1319 | 2219 |
| 23 | GLM-4.5 | 40.01 | 84.72 | 16.98 | 18.32 | 216 / 661 / 1311 | 2219 |
| 24 | Qwen3-30B-A3B-Thinking-2507 | 39.99 | 80.65 | 18.93 | 20.39 | 279 / 589 / 1338 | 2219 |
| 25 | QwQ-32B | 39.60 | 79.69 | 18.55 | 20.57 | 268 / 610 / 1331 | 2219 |
| 26 | Qwen3-30B-A3B | 39.57 | 78.36 | 19.66 | 20.68 | 274 / 597 / 1339 | 2219 |
| 27 | GLM-4.5-Air | 39.41 | 85.41 | 16.31 | 16.52 | 230 / 533 / 1433 | 2219 |
| 28 | Qwen3-Coder-30B-A3B-Instruct | 39.02 | 83.78 | 16.59 | 16.70 | 226 / 543 / 1428 | 2219 |
| 29 | Qwen3-14B | 38.78 | 79.09 | 18.41 | 18.84 | 245 / 567 / 1398 | 2219 |
| 30 | DeepSeek-R1-0528 | 38.71 | 88.07 | 13.44 | 14.61 | 174 / 544 / 1469 | 2219 |
| 31 | Qwen3-30B-A3B-Instruct-2507 | 38.57 | 81.36 | 16.54 | 17.83 | 223 / 571 / 1414 | 2219 |
| 32 | o3-mini-2025-01-31 | 38.18 | 89.29 | 11.91 | 13.34 | 204 / 417 / 1572 | 2219 |
| 33 | gpt-4o-2024-11-20 | 37.59 | 76.62 | 17.51 | 18.64 | 224 / 531 / 1452 | 2219 |
| 34 | Qwen3-8B | 36.91 | 76.17 | 17.24 | 17.31 | 187 / 546 / 1480 | 2218 |
| 35 | DeepSeek-V3 | 36.65 | 73.37 | 17.65 | 18.94 | 204 / 564 / 1447 | 2218 |
| 36 | DeepSeek-R1-Distill-Llama-70B | 35.28 | 74.06 | 15.76 | 16.04 | 188 / 448 / 1579 | 2219 |
| 37 | Zhihu-ai-Zhi-Create-Qwen3-32B | 35.15 | 75.83 | 15.19 | 14.42 | 184 / 426 / 1606 | 2219 |
| 38 | Qwen2.5-72B-Instruct | 34.58 | 73.22 | 14.66 | 15.87 | 174 / 449 / 1593 | 2219 |
| 39 | Qwen2.5-Coder-32B-Instruct | 34.42 | 74.55 | 13.77 | 14.94 | 167 / 425 / 1618 | 2219 |
| 40 | Qwen3-4B | 34.41 | 72.66 | 15.11 | 15.46 | 144 / 464 / 1610 | 2219 |
| 41 | Qwen3-4B-Thinking-2507 | 34.28 | 69.98 | 16.13 | 16.75 | 168 / 465 / 1584 | 2219 |
| 42 | gpt-4o-mini-2024-07-18 | 33.90 | 70.36 | 15.52 | 15.81 | 134 / 459 / 1622 | 2219 |
| 43 | Seed-Coder-8B-Instruct | 33.87 | 73.22 | 13.98 | 14.41 | 137 / 422 / 1656 | 2219 |
| 44 | DeepSeek-R1-Distill-Qwen-32B | 33.40 | 71.87 | 14.39 | 13.95 | 145 / 411 / 1663 | 2219 |
| 45 | Qwen2.5-32B-Instruct | 31.84 | 66.38 | 14.04 | 15.11 | 127 / 389 / 1701 | 2219 |
| 46 | Qwen2.5-14B-Instruct | 30.27 | 66.42 | 11.43 | 12.97 | 348 / 1779 | 2219 |
| 47 | Qwen2.5-Coder-14B-Instruct | 30.18 | 68.50 | 10.89 | 11.17 | 327 / 1804 | 2218 |
| 48 | Llama-3.1-8B-Instruct | 29.39 | 62.86 | 12.96 | 12.34 | 319 / 1815 | 2219 |
| 49 | Qwen2.5-Coder-7B-Instruct | 27.58 | 63.91 | 9.09 | 9.73 | 250 / 1899 | 2218 |
| 50 | DeepSeek-R1-Distill-Qwen-14B | 27.43 | 65.33 | 8.68 | 8.28 | 198 / 1943 | 2219 |
| 51 | Qwen2.5-7B-Instruct | 26.09 | 59.81 | 9.22 | 9.24 | 230 / 1937 | 2219 |
| 52 | Qwen3-1.7B | 25.03 | 57.32 | 9.09 | 8.69 | 216 / 1959 | 2218 |
| 53 | Llama-3.2-3B-Instruct | 24.50 | 55.51 | 9.49 | 8.50 | 210 / 1963 | 2219 |
| 54 | deepseek-coder-7b-instruct-v1.5 | 24.00 | 53.78 | 9.02 | 9.21 | 229 / 1952 | 2219 |
| 55 | kimi-k2-0905-preview | 23.47 | 66.26 | 1.95 | 2.21 | 2135 | 2215 |
| 56 | Hunyuan-7B-Instruct | 20.96 | 57.80 | 2.84 | 2.25 | 2148 | 2219 |
| 57 | Qwen2.5-Coder-3B-Instruct | 20.36 | 53.81 | 4.37 | 2.89 | 2124 | 2219 |
| 58 | OpenCoder-8B-Instruct | 20.11 | 49.02 | 6.55 | 4.75 | 152 / 2060 | 2219 |
| 59 | Qwen2.5-3B-Instruct | 18.70 | 46.92 | 5.04 | 4.14 | 2116 | 2219 |
| 60 | Qwen2.5-Coder-1.5B-Instruct | 17.26 | 46.59 | 3.95 | 1.24 | 2159 | 2219 |
| 61 | OpenCoder-1.5B-Instruct | 16.86 | 43.09 | 5.61 | 1.87 | 2108 | 2219 |
| 62 | Llama-3.2-1B-Instruct | 15.51 | 40.28 | 4.33 | 1.92 | 2158 | 2219 |
| 63 | Qwen2.5-1.5B-Instruct | 15.19 | 40.07 | 3.73 | 1.78 | 2154 | 2219 |
| 64 | DeepSeek-R1-Distill-Llama-8B | 15.08 | 42.45 | 1.74 | 1.04 | 2187 | 2219 |
| 65 | DeepSeek-R1-0528-Qwen3-8B | 14.02 | 41.29 | 0.43 | 0.33 | 2207 | 2218 |
| 66 | Qwen3-0.6B | 13.67 | 35.11 | 4.82 | 1.07 | 2160 | 2219 |
| 67 | Qwen2.5-Coder-0.5B-Instruct | 12.75 | 34.64 | 3.57 | 0.04 | 2179 | 2218 |
| 68 | DeepSeek-R1-Distill-Qwen-7B | 12.07 | 35.78 | 0.30 | 0.12 | 2216 | 2219 |
| 69 | Qwen2.5-0.5B-Instruct | 10.86 | 30.81 | 1.73 | 0.04 | 2200 | 2216 |
| 70 | DeepSeek-R1-Distill-Qwen-1.5B | 8.48 | 25.40 | 0.04 | 0.00 | 2218 | 2219 |
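
The leaderboard does not say how the total score is derived from the code, screenshot, and video scores. The published numbers look consistent with an unweighted mean of the three, to within rounding. The sketch below checks that assumption on a few rows copied from the leaderboard; the function names are illustrative, and the averaging rule is an inference from the data, not a documented formula.

```python
# Rows copied from the leaderboard: (model, total, code, screenshot, video).
ROWS = [
    ("gpt-5", 44.99, 96.58, 17.64, 20.74),
    ("o3", 44.76, 92.26, 20.17, 21.85),
    ("gpt-5-mini", 43.48, 96.69, 15.71, 18.02),
    ("DeepSeek-R1-Distill-Qwen-1.5B", 8.48, 25.40, 0.04, 0.00),
]


def mean_of_components(code: float, screenshot: float, video: float) -> float:
    """Unweighted mean of the three component scores (assumed rule)."""
    return (code + screenshot + video) / 3


def max_gap(rows) -> float:
    """Largest absolute difference between a listed total and the assumed mean."""
    return max(
        abs(total - mean_of_components(code, shot, video))
        for _, total, code, shot, video in rows
    )


if __name__ == "__main__":
    for model, total, code, shot, video in ROWS:
        estimate = mean_of_components(code, shot, video)
        print(f"{model}: listed {total:.2f}, mean of components {estimate:.2f}")
    print(f"largest gap: {max_gap(ROWS):.4f}")
```

Small gaps on the order of 0.01 are expected if the listed component scores were rounded before publication.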