AI Benchmark Leaderboard
Every score, every source.
Official-source benchmark data for frontier AI models. Every number links to the source used. No estimates, no unsourced figures.
Last updated: 2026-05-12
Models tracked: 13
Benchmarks tracked: 12
| Model | Provider | Input ($/M tokens) | Output ($/M tokens) | SWE-bench Pro | Terminal-Bench 2.0 | MCP-Atlas | Toolathlon | OSWorld-Verified | BrowseComp | GPQA Diamond | FrontierMath T1-3 | ARC-AGI-2 | Finance Agent v1.1 | GDPval | CyberGym |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.5 | OpenAI | $5.00 | $30.00 | ||||||||||||
| GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | — | — | — | — | — | — | — | — | — | |||
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | — | |||||||||||
| GPT-5.4 | OpenAI | $2.50 | $15.00 | ||||||||||||
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | — | — | ||||||||||
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 | — | — | — | — | — | — | — | — | — | — | — | — |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | — | — | — | — | — | — | — | — | — | — | — | — |
| DeepSeek V4 Pro | DeepSeek | $0.43 | $0.87 | — | — | — | — | — | — | — | — | — | — | — | — |
| Grok 4 | xAI | $3.00 | $15.00 | — | — | — | — | — | — | — | — | — | — | — | — |
| Grok 4 Fast | xAI | $0.20 | $0.50 | — | — | — | — | — | — | — | — | — | — | — | — |
| Llama 4 Maverick | Meta via Groq | $0.50 | $0.77 | — | — | — | — | — | — | — | — | — | — | — | — |
| Kimi K2.6 | Moonshot AI | $0.95 | $4.00 | — | — | — | — | — | — | — | — | — | — | — | — |
| GLM 5 | Z.ai | $1.00 | $3.20 | — | — | — | — | — | — | — | — | — | — | — | — |
Official source URL
Comparative official source
Cells marked "—" mean no official source was found. We do not estimate.
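To make the two price columns easier to compare, here is a minimal sketch of how a per-request cost could be estimated from them. It assumes "$/M" means US dollars per million tokens and that billing is linear in token count, with no caching or batch discounts. The helper name `request_cost` and the token counts are illustrative only; the prices are copied from the table above.

```python
# Prices copied from the table above: (input, output) in USD per million tokens.
# Assumption: "$/M" means per million tokens, billed linearly.
PRICES_PER_MILLION = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "DeepSeek V4 Flash": (0.14, 0.28),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative workload: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.5f}")
```

Because output tokens are priced several times higher than input tokens for most models listed, the output column tends to dominate the cost of generation-heavy workloads.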
Editorial policy
Every score must cite an official provider, model-release, or benchmark-owner URL. When official data is missing, the cell is omitted rather than filled with an estimate. Prices use official provider pages where available; hosted open-weight pricing is sourced to the official host.
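The policy above can be sketched as a simple publishing rule: a score is shown only when it carries a source URL, and otherwise the cell falls back to a dash. This is an illustration, not the site's actual implementation. The names `ScoreCell`, `is_publishable`, and `render` are hypothetical, and the example model, benchmark, score, and URL are placeholders rather than real data. The check only confirms that a well-formed http(s) URL is present; deciding whether a source counts as official would still need human review.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class ScoreCell:
    """One table cell: a score plus the URL it was taken from (hypothetical)."""
    model: str
    benchmark: str
    score: Optional[float] = None
    source_url: Optional[str] = None


def is_publishable(cell: ScoreCell) -> bool:
    """True only if the cell has a score and a well-formed http(s) source URL."""
    if cell.score is None or not cell.source_url:
        return False
    parsed = urlparse(cell.source_url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def render(cell: ScoreCell) -> str:
    """Render a sourced score, or a dash when no usable source is on file."""
    return f"{cell.score:g}" if is_publishable(cell) else "—"


if __name__ == "__main__":
    # Placeholder values, not real benchmark results.
    sourced = ScoreCell("Example Model", "Example Bench", 50.0,
                        "https://example.com/release-notes")
    unsourced = ScoreCell("Example Model", "Example Bench", 50.0)
    print(render(sourced), render(unsourced))
```

Keeping the URL on the same record as the score means an unsourced number cannot be rendered by accident, which matches the "no estimates" rule.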