Every score, every source.
Benchmark data for frontier models, drawn from official sources. The current view hides superseded models and older versions. The archive keeps the history.
| Model | Provider | Input $/M | Output $/M | SWE-bench Pro | Terminal-Bench 2.1 | MCP-Atlas | Toolathlon | AutomationBench | OSWorld-Verified | BrowseComp | GPQA Diamond | Humanity's Last Exam | Humanity's Last Exam with tools | FrontierMath T1-3 | ARC-AGI-2 | Finance Agent v2 | GDPval-AA | CyberGym |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.5 | OpenAI | $5.00 | $30.00 | | | | | | | | | | | | | | | |
| GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | - | - | - | - | - | - | - | - | - | - | - | - | - | | |
| Claude Opus 4.8 | Anthropic | $5.00 | $25.00 | - | - | - | - | | | | | | | | | | | |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | - | - | - | - | - | - | | | | | | | | | |
| Gemini 3.1 Pro Preview | | $2.00 | $12.00 | - | | | | | | | | | | | | | | |
| Gemini 3.1 Flash-Lite | | $0.25 | $1.50 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Gemini 3 Flash Preview | | $0.50 | $3.00 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| DeepSeek V4 Pro | DeepSeek | $0.43 | $0.87 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Grok 4.3 | xAI | $1.25 | $2.50 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| GPT-OSS 120B | OpenAI via Groq | $0.15 | $0.60 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Kimi K2.6 | Moonshot AI | $0.95 | $4.00 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| GLM-5.1 | Z.ai | $1.40 | $4.40 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Editorial policy
Every score must cite a URL from an official provider, a model release, or the benchmark owner. When official data is missing, the cell is omitted rather than filled with an estimate.
The default leaderboard shows only current models and current benchmark versions. The archive view keeps superseded models and benchmark versions, clearly marked, so that older articles and historical comparisons remain traceable.
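The policy above could be expressed as a small data check. The following is only a sketch under stated assumptions: the `Score` record, its field names, and the `example.com` URLs are hypothetical and are not taken from this site's actual data model. It also checks only that a source URL is present and well formed; deciding whether a URL is "official" would need something like a maintained allowlist, which is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Score:
    """Hypothetical record for one leaderboard cell."""
    model: str
    benchmark: str
    value: Optional[float]      # None: no official figure exists
    source_url: Optional[str]   # required whenever a value is shown
    archived: bool = False      # superseded model or benchmark version


def is_publishable(score: Score) -> bool:
    """True only if the score has a value and a well-formed http(s) source URL."""
    if score.value is None or not score.source_url:
        return False
    parsed = urlparse(score.source_url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def current_view(scores: List[Score]) -> List[Score]:
    """Default leaderboard: sourced scores for current models and versions."""
    return [s for s in scores if is_publishable(s) and not s.archived]


def archive_view(scores: List[Score]) -> List[Score]:
    """Archive: sourced scores kept only for historical comparison."""
    return [s for s in scores if is_publishable(s) and s.archived]


# Placeholder data, not real benchmark results.
sample = [
    Score("model-a", "bench-x", 50.0, "https://example.com/release"),
    Score("model-a", "bench-y", None, None),              # no official data
    Score("model-b", "bench-x", 40.0, "https://example.com/old", archived=True),
    Score("model-c", "bench-x", 60.0, None),              # value without a source
]

print([s.model for s in current_view(sample)])
print([s.model for s in archive_view(sample)])
```

In this sketch an unsourced value is dropped rather than displayed, which mirrors the rule that a cell is omitted instead of being filled with an estimate.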
Related analyses
Full comparisons built on this data.