AI Benchmark Leaderboard

Every score, every source.

Official-source benchmark data for frontier AI models. Every number links to the source used. No estimates, no unsourced figures.

Last updated: 2026-05-12
Models tracked: 13
Benchmarks tracked: 12
Benchmark columns: SWE-bench Pro, Terminal-Bench 2.0, MCP-Atlas, Toolathlon, OSWorld-Verified, BrowseComp, GPQA Diamond, FrontierMath T1-3, ARC-AGI-2, Finance Agent v1.1, GDPval, CyberGym

| Model | Provider | Input $/M | Output $/M |
| --- | --- | --- | --- |
| GPT-5.5 | OpenAI | $5.00 | $30.00 |
| GPT-5.5 Pro | OpenAI | $30.00 | $180.00 |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 |
| GPT-5.4 | OpenAI | $2.50 | $15.00 |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 |
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 |
| DeepSeek V4 Pro | DeepSeek | $0.43 | $0.87 |
| Grok 4 | xAI | $3.00 | $15.00 |
| Grok 4 Fast | xAI | $0.20 | $0.50 |
| Llama 4 Maverick | Meta via Groq | $0.50 | $0.77 |
| Kimi K2.6 | Moonshot AI | $0.95 | $4.00 |
| GLM 5 | Z.ai | $1.00 | $3.20 |
Legend: Official source URL; Comparative official source
Empty cells mean no official source was found. We do not estimate.
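
To make the price columns concrete, here is a minimal sketch of how a per-request cost could be derived from them. It assumes that "$/M" means US dollars per one million tokens, which this page does not define explicitly. The function name `request_cost` and the token counts are hypothetical; only the two prices come from the table.

```python
from decimal import Decimal

# Assumption: "$/M" is dollars per 1,000,000 tokens.
TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: str,
    output_price_per_m: str,
) -> Decimal:
    """Return the dollar cost of one request (hypothetical helper)."""
    input_cost = Decimal(input_tokens) * Decimal(input_price_per_m)
    output_cost = Decimal(output_tokens) * Decimal(output_price_per_m)
    return (input_cost + output_cost) / TOKENS_PER_MILLION


# GPT-5.5 list prices from the table; the token counts are made up.
# 1,000,000 input tokens cost 5.00 and 100,000 output tokens cost 3.00,
# so the total is 8.00 dollars.
example = request_cost(1_000_000, 100_000, "5.00", "30.00")
```

Prices are passed as strings and handled with `Decimal` so that values such as 0.14 are not distorted by binary floating point.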

Editorial policy

Every score must cite an official provider, model-release, or benchmark-owner URL. When official data is missing, the cell is left empty rather than filled with an estimate. Prices come from official provider pages where available; pricing for hosted open-weight models (for example, Llama 4 Maverick via Groq) is sourced from the official host.
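
The policy above can be read as a simple rule on each cell: a score may appear only with a source URL, and a missing score stays empty. The sketch below is one way to express that rule. The `ScoreCell` schema and the `is_publishable` name are hypothetical and are not the site's actual data model. The check only confirms that a URL is present and well formed; whether the source is official still needs human review.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class ScoreCell:
    """One leaderboard cell (hypothetical schema for illustration)."""
    model: str
    benchmark: str
    score: Optional[float]      # None means the cell is left empty
    source_url: Optional[str]   # required whenever score is set


def is_publishable(cell: ScoreCell) -> bool:
    """Return True if the cell follows the sourcing rule."""
    if cell.score is None:
        # Empty cells are allowed; nothing is estimated.
        return True
    if not cell.source_url:
        # A score without a source must not be shown.
        return False
    parsed = urlparse(cell.source_url)
    # Structural check only: an http(s) URL with a host.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

For example, `is_publishable(ScoreCell("Example Model", "Example Bench", 50.0, None))` is rejected, while the same cell with `score=None` is accepted as an empty cell. The model name, benchmark name, and score here are placeholders, not leaderboard data.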

Related deep dives

Full comparison posts built on this data.