Short version: Checked May 10, 2026: if you need one default, start with the model that matches the job. Claude Opus 4.7 is the safest pick for difficult coding and long-running professional work. GPT-5.2 is the current official OpenAI comparison point, not GPT-5.5. Gemini 3.1 Pro is strongest when research, multimodal input, and long-context reasoning matter. DeepSeek V4 Flash is the cost default for 1M-context API work, while V4 Pro is worth testing for harder tasks during its temporary discount.
- Anthropic says Claude Opus 4.7 launched April 16, 2026 and keeps Opus 4.6 pricing: $5 input and $25 output per 1M tokens
- OpenAI's official pricing page lists GPT-5.2 at $1.75 input, $0.175 cached input, and $14 output per 1M tokens
- OpenAI's official pricing page does not list GPT-5.5 as of this update
- Google lists Gemini 3.1 Pro Preview at $2/$12 per 1M tokens up to 200K prompts, and $4/$18 above 200K prompts
- Google's Gemini 3.1 Pro model page lists 1M input tokens, 64K output tokens, and strong GPQA, BrowseComp, SWE-Bench, and Terminal-Bench results
- DeepSeek V4 Flash is listed at $0.14 cache-miss input and $0.28 output per 1M tokens
- DeepSeek V4 Pro is discounted 75% through May 31, 2026: $0.435 cache-miss input and $0.87 output per 1M tokens
- DeepSeek says deepseek-chat and deepseek-reasoner retire after July 24, 2026 at 15:59 UTC
This page keeps the old slug because that is the URL Google already knows. The content needed a correction: OpenAI's current public pricing page does not list GPT-5.5, so this update uses GPT-5.2 for the official OpenAI comparison.
I checked official pages from Anthropic, OpenAI, Google, and DeepSeek. Where a provider makes benchmark claims, I treat them as provider claims, not independent proof. The goal here is not to crown a universal winner. The goal is to tell you which model to test first for the job in front of you.
Short answer
Pick based on workload, not brand.
Best model by use case
| Use case | First model to test | Reason |
|---|---|---|
| Difficult coding and sustained software work | Claude Opus 4.7 | Anthropic positions it as the stronger Opus model for coding, agents, and professional tasks |
| OpenAI ecosystem or Codex-style workflows | GPT-5.2 or GPT-5.2 Codex | Current official OpenAI pricing page lists GPT-5.2 and GPT-5.2 Codex, not GPT-5.5 |
| Research, multimodal reasoning, and Google tools | Gemini 3.1 Pro | Google lists strong GPQA, BrowseComp, SWE-Bench, and 1M-context support |
| Lowest-cost 1M-context API work | DeepSeek V4 Flash | DeepSeek's official pricing is far lower than the Western flagship options |
| Open-weight DeepSeek experiments | DeepSeek V4 Pro or Flash | DeepSeek links open weights from the official V4 release note |
Why this uses GPT-5.2, not GPT-5.5
The old article overcommitted to a model name the official page does not currently list.
The previous version talked about GPT-5.5 pricing and benchmarks as if they were settled official facts. That is not safe anymore. OpenAI's official pricing page currently lists GPT-5.2, GPT-5.1, GPT-5, GPT-5.2 Codex, GPT-5.1 Codex, and related models. It does not list GPT-5.5.
Correction made
The URL still contains gpt-5-5, but the article now uses GPT-5.2 as the current official OpenAI reference. That avoids publishing unsupported GPT-5.5 price claims while preserving the existing URL.
Not sure which AI model to use?
12 models · Personalized picks · 60 seconds
Current pricing snapshot
Official public pages, checked May 10, 2026.
API pricing per 1M tokens
| Model | Input | Cached input | Output | Notes |
|---|---|---|---|---|
| Claude Opus 4.7 | $5.00 | Not listed in announcement | $25.00 | Anthropic says same pricing as Opus 4.6 |
| GPT-5.2 | $1.75 | $0.175 | $14.00 | OpenAI official pricing page |
| GPT-5.2 Pro | $21.00 | - | $168.00 | OpenAI premium Pro tier |
| Gemini 3.1 Pro Preview, <=200K prompt | $2.00 | $0.20 context cache | $12.00 | Google Gemini API standard pricing |
| Gemini 3.1 Pro Preview, >200K prompt | $4.00 | $0.40 context cache | $18.00 | Google Gemini API standard pricing |
| DeepSeek V4 Flash | $0.0028 cache hit / $0.14 cache miss | - | $0.28 | DeepSeek official pricing |
| DeepSeek V4 Pro discount | $0.003625 cache hit / $0.435 cache miss | - | $0.87 | 75% discount through May 31, 2026 |
For pure API cost, DeepSeek V4 Flash wins by a lot. For mainstream hosted frontier work, GPT-5.2 is cheaper than Opus 4.7 on list price. Gemini 3.1 Pro becomes more expensive once prompts cross 200K tokens, but it brings the Google stack, 1M input, and strong official benchmark claims.
Benchmark claims
Useful, but still provider-reported.
Officially listed strengths
| Model | Provider-reported strengths | Caution |
|---|---|---|
| Claude Opus 4.7 | Anthropic describes gains in coding, vision, multi-step work, instruction following, and professional tasks | Run a prompt migration test because stricter instruction following can change behavior |
| GPT-5.2 | OpenAI pricing page confirms current public availability and Codex variants | Use official OpenAI evals for benchmarks; do not carry over old GPT-5.5 claims |
| Gemini 3.1 Pro | Google lists GPQA Diamond 94.3%, SWE-Bench Verified 80.6%, Terminal-Bench 68.5%, BrowseComp 85.9% | Preview model behavior and pricing can change |
| DeepSeek V4 Pro | DeepSeek says V4 Pro leads current open models in world knowledge and beats current open models in Math/STEM/Coding | Detailed text benchmark tables are limited on the official release page |
| DeepSeek V4 Flash | DeepSeek says Flash closely approaches V4 Pro reasoning and matches it on simple agent tasks | Test on your own tasks before replacing a higher-end model |
Coding and agents
Claude for hard coding, DeepSeek for cheap worker calls, GPT-5.2 for OpenAI workflows.
For difficult coding, I would still start with Claude Opus 4.7 if budget allows. Anthropic's release is explicitly aimed at advanced software engineering, long-running tasks, better instruction following, and professional work. That makes it the safest high-end first test for large codebases.
If your workflow already sits inside OpenAI tooling, use GPT-5.2 or GPT-5.2 Codex as the current official reference. The old GPT-5.5 claims should not drive buying or routing decisions unless OpenAI publishes that model in official docs.
If you are building a routing system, DeepSeek V4 Flash is the obvious cheap worker candidate. Use it for implementation-layer tasks, long-context extraction, and high-volume agent calls. Escalate to DeepSeek V4 Pro or a Western flagship when the task needs stronger planning, judgment, or reliability.
Research and long context
Gemini and DeepSeek both matter, for different reasons.
Gemini 3.1 Pro has the strongest official research-style profile in this group: Google lists 94.3% on GPQA Diamond, 85.9% on BrowseComp with search and Python, 1M input tokens, and 64K output tokens. That makes it a strong first test for research, multimodal, and Google-integrated workloads.
DeepSeek V4 is the long-context cost outlier. If your workload is mostly "read a lot, extract, summarize, transform," V4 Flash deserves the first benchmark run because the price difference is large enough to change product economics.
Official sources checked
Primary sources only for the factual refresh.
- Anthropic: Introducing Claude Opus 4.7
- Anthropic: Claude Opus 4.7 model page
- OpenAI API pricing
- Google DeepMind: Gemini 3.1 Pro
- Google Gemini API pricing
- DeepSeek V4 Preview Release
- DeepSeek Models & Pricing
The bottom line
There is no universal winner.
If you need a simple rule: Claude Opus 4.7 for hard coding and professional work, GPT-5.2 for current OpenAI workflows, Gemini 3.1 Pro for research and multimodal long-context tasks, and DeepSeek V4 Flash for low-cost 1M-context API volume.
The old GPT-5.5 framing was the main risk in this post. That is now corrected. For the DeepSeek-only migration details, read the DeepSeek V4 guide. For a broader workflow picker, use our AI Model Picker. To compare monthly spend across these models, use the AI cost calculator.
Keep Reading
Stay ahead of the AI curve
We test new AI tools every week and share honest results. Join our newsletter.