V-GameGym Benchmark Performance
Rank | Model | Total Score | Code Score | Screenshot Score | Video Score | Games | Score Distribution (Excellent/Good/Pass/Poor) |
---|---|---|---|---|---|---|---|
🥇 | gpt-5 | 44.99 | 96.58 | 17.64 | 20.74 | 2219 | |
🥈 | o3 | 44.76 | 92.26 | 20.17 | 21.85 | 2219 | |
🥉 | gpt-5-mini | 43.48 | 96.69 | 15.71 | 18.02 | 2219 | |
4 | gemini-2.5-pro | 43.47 | 89.11 | 19.12 | 22.17 | 2219 | |
5 | gpt-oss-120b | 43.36 | 90.09 | 19.70 | 20.28 | 2219 | |
6 | o4-mini | 42.99 | 87.77 | 19.79 | 21.39 | 2219 | |
7 | gpt-4.1-2025-04-14 | 42.49 | 91.78 | 17.58 | 18.11 | 2219 | |
8 | Qwen3-235B-A22B-Thinking-2507 | 42.28 | 84.47 | 20.00 | 22.38 | 2219 | |
9 | gpt-oss-20b | 42.19 | 88.75 | 18.63 | 19.17 | 2219 | |
10 | grok-4 | 42.01 | 83.85 | 19.79 | 22.39 | 2219 | |
11 | gemini-2.5-flash | 41.99 | 92.79 | 16.51 | 16.67 | 2218 | |
12 | Qwen3-Coder-480B-A35B-Instruct | 41.35 | 85.26 | 18.32 | 20.46 | 2219 | |
13 | chatgpt-4o-latest | 41.24 | 82.51 | 19.90 | 21.33 | 2219 | |
14 | Qwen3-235B-A22B | 41.23 | 81.27 | 19.79 | 22.64 | 2219 | |
15 | DeepSeek-V3-0324 | 41.15 | 83.65 | 19.26 | 20.53 | 2219 | |
16 | Qwen3-235B-A22B-Instruct-2507 | 41.08 | 85.34 | 18.21 | 19.70 | 2219 | |
17 | DeepSeek-V3.1 | 40.85 | 83.08 | 19.27 | 20.20 | 2219 | |
18 | claude-sonnet-4-20250514-thinking | 40.51 | 90.27 | 14.38 | 16.88 | 2219 | |
19 | Qwen3-32B | 40.38 | 81.58 | 18.92 | 20.64 | 2219 | |
20 | Seed-OSS-36B-Instruct | 40.31 | 88.28 | 16.42 | 16.23 | 2219 | |
21 | claude-sonnet-4-20250514 | 40.24 | 87.68 | 15.66 | 17.38 | 2219 | |
22 | DeepSeek-R1 | 40.10 | 81.00 | 19.22 | 20.07 | 2219 | |
23 | GLM-4.5 | 40.01 | 84.72 | 16.98 | 18.32 | 2219 | |
24 | Qwen3-30B-A3B-Thinking-2507 | 39.99 | 80.65 | 18.93 | 20.39 | 2219 | |
25 | QwQ-32B | 39.60 | 79.69 | 18.55 | 20.57 | 2219 | |
26 | Qwen3-30B-A3B | 39.57 | 78.36 | 19.66 | 20.68 | 2219 | |
27 | GLM-4.5-Air | 39.41 | 85.41 | 16.31 | 16.52 | 2219 | |
28 | Qwen3-Coder-30B-A3B-Instruct | 39.02 | 83.78 | 16.59 | 16.70 | 2219 | |
29 | Qwen3-14B | 38.78 | 79.09 | 18.41 | 18.84 | 2219 | |
30 | DeepSeek-R1-0528 | 38.71 | 88.07 | 13.44 | 14.61 | 2219 | |
31 | Qwen3-30B-A3B-Instruct-2507 | 38.57 | 81.36 | 16.54 | 17.83 | 2219 | |
32 | o3-mini-2025-01-31 | 38.18 | 89.29 | 11.91 | 13.34 | 2219 | |
33 | gpt-4o-2024-11-20 | 37.59 | 76.62 | 17.51 | 18.64 | 2219 | |
34 | Qwen3-8B | 36.91 | 76.17 | 17.24 | 17.31 | 2218 | |
35 | DeepSeek-V3 | 36.65 | 73.37 | 17.65 | 18.94 | 2218 | |
36 | DeepSeek-R1-Distill-Llama-70B | 35.28 | 74.06 | 15.76 | 16.04 | 2219 | |
37 | Zhihu-ai-Zhi-Create-Qwen3-32B | 35.15 | 75.83 | 15.19 | 14.42 | 2219 | |
38 | Qwen2.5-72B-Instruct | 34.58 | 73.22 | 14.66 | 15.87 | 2219 | |
39 | Qwen2.5-Coder-32B-Instruct | 34.42 | 74.55 | 13.77 | 14.94 | 2219 | |
40 | Qwen3-4B | 34.41 | 72.66 | 15.11 | 15.46 | 2219 | |
41 | Qwen3-4B-Thinking-2507 | 34.28 | 69.98 | 16.13 | 16.75 | 2219 | |
42 | gpt-4o-mini-2024-07-18 | 33.90 | 70.36 | 15.52 | 15.81 | 2219 | |
43 | Seed-Coder-8B-Instruct | 33.87 | 73.22 | 13.98 | 14.41 | 2219 | |
44 | DeepSeek-R1-Distill-Qwen-32B | 33.40 | 71.87 | 14.39 | 13.95 | 2219 | |
45 | Qwen2.5-32B-Instruct | 31.84 | 66.38 | 14.04 | 15.11 | 2219 | |
46 | Qwen2.5-14B-Instruct | 30.27 | 66.42 | 11.43 | 12.97 | 2219 | |
47 | Qwen2.5-Coder-14B-Instruct | 30.18 | 68.50 | 10.89 | 11.17 | 2218 | |
48 | Llama-3.1-8B-Instruct | 29.39 | 62.86 | 12.96 | 12.34 | 2219 | |
49 | Qwen2.5-Coder-7B-Instruct | 27.58 | 63.91 | 9.09 | 9.73 | 2218 | |
50 | DeepSeek-R1-Distill-Qwen-14B | 27.43 | 65.33 | 8.68 | 8.28 | 2219 | |
51 | Qwen2.5-7B-Instruct | 26.09 | 59.81 | 9.22 | 9.24 | 2219 | |
52 | Qwen3-1.7B | 25.03 | 57.32 | 9.09 | 8.69 | 2218 | |
53 | Llama-3.2-3B-Instruct | 24.50 | 55.51 | 9.49 | 8.50 | 2219 | |
54 | deepseek-coder-7b-instruct-v1.5 | 24.00 | 53.78 | 9.02 | 9.21 | 2219 | |
55 | kimi-k2-0905-preview | 23.47 | 66.26 | 1.95 | 2.21 | 2215 | |
56 | Hunyuan-7B-Instruct | 20.96 | 57.80 | 2.84 | 2.25 | 2219 | |
57 | Qwen2.5-Coder-3B-Instruct | 20.36 | 53.81 | 4.37 | 2.89 | 2219 | |
58 | OpenCoder-8B-Instruct | 20.11 | 49.02 | 6.55 | 4.75 | 2219 | |
59 | Qwen2.5-3B-Instruct | 18.70 | 46.92 | 5.04 | 4.14 | 2219 | |
60 | Qwen2.5-Coder-1.5B-Instruct | 17.26 | 46.59 | 3.95 | 1.24 | 2219 | |
61 | OpenCoder-1.5B-Instruct | 16.86 | 43.09 | 5.61 | 1.87 | 2219 | |
62 | Llama-3.2-1B-Instruct | 15.51 | 40.28 | 4.33 | 1.92 | 2219 | |
63 | Qwen2.5-1.5B-Instruct | 15.19 | 40.07 | 3.73 | 1.78 | 2219 | |
64 | DeepSeek-R1-Distill-Llama-8B | 15.08 | 42.45 | 1.74 | 1.04 | 2219 | |
65 | DeepSeek-R1-0528-Qwen3-8B | 14.02 | 41.29 | 0.43 | 0.33 | 2218 | |
66 | Qwen3-0.6B | 13.67 | 35.11 | 4.82 | 1.07 | 2219 | |
67 | Qwen2.5-Coder-0.5B-Instruct | 12.75 | 34.64 | 3.57 | 0.04 | 2218 | |
68 | DeepSeek-R1-Distill-Qwen-7B | 12.07 | 35.78 | 0.30 | 0.12 | 2219 | |
69 | Qwen2.5-0.5B-Instruct | 10.86 | 30.81 | 1.73 | 0.04 | 2216 | |
70 | DeepSeek-R1-Distill-Qwen-1.5B | 8.48 | 25.40 | 0.04 | 0.00 | 2219 | |
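
The total score column is consistent with an unweighted mean of the code, screenshot, and video scores (e.g., for gpt-5, (96.58 + 17.64 + 20.74) / 3 ≈ 44.99). The Python sketch below checks this relationship for a few rows taken from the table; the `check_total` helper and the hard-coded rows are illustrative assumptions, not part of any official evaluation script.

```python
# Minimal sketch: verify that each reported total score matches the mean of
# the three sub-scores (code, screenshot, video) within rounding tolerance.
# Rows are copied from the leaderboard above; the helper name is hypothetical.

ROWS = [
    # (model, total, code, screenshot, video)
    ("gpt-5",          44.99, 96.58, 17.64, 20.74),
    ("o3",             44.76, 92.26, 20.17, 21.85),
    ("gemini-2.5-pro", 43.47, 89.11, 19.12, 22.17),
]


def check_total(total: float, code: float, screenshot: float, video: float,
                tol: float = 0.02) -> bool:
    """Return True if `total` equals the mean of the sub-scores within `tol`.

    A small tolerance absorbs rounding, since the published sub-scores are
    already rounded to two decimal places.
    """
    return abs(total - (code + screenshot + video) / 3) <= tol


for model, total, code, screenshot, video in ROWS:
    mean = (code + screenshot + video) / 3
    print(f"{model}: reported {total:.2f}, mean of sub-scores {mean:.2f}, "
          f"consistent={check_total(total, code, screenshot, video)}")
```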