There is no single best open-source coding model in 2026, and anyone who hands you one number is hiding something. DeepSeek V4 is the cheapest serious option with a 1M context window and an official "state-of-the-art agentic coding" claim. MiniMax M3 reports the highest raw coding score (59% on SWE-Bench Pro), but that number is vendor-run and its weights only just shipped. Kimi K2.7-Code is purpose-built for coding agents and is the cleanest to self-host, under a Modified MIT license. The catch: all three report different benchmarks, so you cannot rank them on one score. This guide uses official sources only.
- DeepSeek V4 (April 24, 2026) ships as V4 Pro and V4 Flash, with a 1M context window, open weights on Hugging Face, and dual Thinking / Non-Thinking modes.
- DeepSeek's official release calls V4 Pro open-source state-of-the-art in agentic coding, but publishes charts rather than a single reproducible score.
- MiniMax M3 (June 1, 2026) is the first open-weight model to combine frontier coding, up to 1M context, and native multimodality; MiniMax claims 59.0% on SWE-Bench Pro.
- MiniMax M3's open weights began rolling out in mid-June 2026, and its benchmarks are run on MiniMax's own infrastructure.
- Kimi K2.7-Code (June 12, 2026) is a 1-trillion-parameter / 32B-active coding model built on Kimi K2.6, with a 256K context window and a Modified MIT license.
- Kimi reports K2.7-Code scores 62.0 on its own Kimi Code Bench v2 and cuts thinking-token use about 30% versus K2.6.
- The only benchmark two of the three share is MCP-Atlas (MiniMax 74.2, Kimi 76.0); DeepSeek does not report it.
Everyone wants one answer: the best open-source coding model. The honest answer is that the three models worth your time in mid-2026 cannot be lined up on a single chart, because each vendor reports a different benchmark.
This guide uses official sources only. I checked DeepSeek's release notes, MiniMax's announcement, and Moonshot's Kimi K2.7-Code model card directly. I did not use leaked benchmark slides, third-party leaderboards, or "I tested it" threads. Where a number is the vendor's own claim, I say so.
For the full open-model ranking across every use case, see our best open-source AI models guide. This article is only about coding.
Three Models, Three Different Benchmarks
This is why "best coding model" is the wrong question
Here is the problem in one paragraph. DeepSeek says V4 Pro is open-source state-of-the-art in agentic coding, but its release note shows charts, not a single number you can reproduce. MiniMax says M3 scores 59.0% on SWE-Bench Pro. Kimi says K2.7-Code scores 62.0 on Kimi Code Bench v2, a benchmark Moonshot runs itself.
Those are three different tests. A SWE-Bench Pro percentage and a Kimi Code Bench v2 percentage are not the same measurement, so "MiniMax 59 vs Kimi 62" means nothing. The only benchmark any two of them share is MCP-Atlas, a tool-orchestration test (MiniMax 74.2, Kimi 76.0), and that measures agentic tool use, not raw code quality.
What this means for you
Choose by fit, not by a single leaderboard number. The right question is "which one matches my context window, budget, license, and workflow," not "which one has the bigger percentage." The rest of this guide answers the first question.
Head-to-Head: The Official Numbers
Everything each vendor states, side by side
DeepSeek V4 vs MiniMax M3 vs Kimi K2.7-Code
| DeepSeek V4 | MiniMax M3 | Kimi K2.7-Code | |
|---|---|---|---|
| Released | April 24, 2026 | June 1, 2026 | June 12, 2026 |
| Open weights | Yes (Hugging Face) | Yes (rolled out mid-June) | Yes (Hugging Face) |
| License | Open-weight (check terms) | Open-weight (check terms) | Modified MIT |
| Context window | 1M | Up to 1M | 256K |
| Architecture | V4 Pro 1.6T / 49B active; V4 Flash 284B / 13B active (MoE) | Sparse MoE with MSA (params not disclosed) | 1T total / 32B active (MoE), built on K2.6 |
| Multimodal | Text, with Thinking / Non-Thinking modes | Native image and video input | Vision via MoonViT encoder |
| Coding benchmark the vendor reports | Claims SOTA agentic coding (no single number) | SWE-Bench Pro 59.0% (vendor-run) | Kimi Code Bench v2 62.0 (own benchmark) |
| API price (input / output per 1M) | Flash $0.14 / $0.28; Pro $0.435 / $0.87 | ~$0.30 / $1.20 (smaller-context tier) | $0.95 / $4.00 |
1. DeepSeek V4: Cheapest Long-Context Coding
The value pick, and the one with the most momentum
DeepSeek V4 Preview went live on April 24, 2026. DeepSeek's official release note lists two models, both with a 1M context window and both available as open weights on Hugging Face.
DeepSeek V4 Official Model Split
| Model | Parameters | Positioning |
|---|---|---|
| DeepSeek V4 Pro | 1.6T total / 49B active | Flagship V4 model for reasoning, world knowledge, and agentic coding |
| DeepSeek V4 Flash | 284B total / 13B active | Fast and economical V4 model |
Source: DeepSeek V4 Preview Release
On coding, DeepSeek's release note is confident but vague. It says V4 Pro shows "open-source state-of-the-art in agentic coding benchmarks" and "beats all current open models in Math, STEM, and Coding, rivaling top closed-source models." It backs this with benchmark charts rather than a single reproducible figure, so treat it as a strong claim, not a measured number.
Two things make it the value pick. Price first: V4 Flash is the cheapest serious coding model here, at $0.14 input and $0.28 output per million tokens. Then the ecosystem: DeepSeek says V4 plugs straight into Claude Code, OpenClaw, and OpenCode, so it drops into existing agent setups. For a deeper breakdown, see our DeepSeek V4 guide and the DeepSeek V4 vs Qwen comparison.
Best for
High-volume coding at the lowest cost, whole-repo work that needs a 1M context window, and teams that want open weights plus a published API in one model.
2. MiniMax M3: Highest Coding Claim, Newest Weights
Promising numbers you should verify yourself
MiniMax announced M3 on June 1, 2026, and calls it the first open-weight model to combine frontier coding, up to 1M-token context, and native multimodality in one model. It uses a new sparse attention architecture MiniMax calls MSA, which the company says cuts per-token compute at 1M context to roughly one-twentieth of its previous generation.
Not sure which AI model to use?
12 models · Personalized picks · 60 seconds
MiniMax M3 Official Claims
| Item | What MiniMax states |
|---|---|
| Coding (SWE-Bench Pro) | 59.0% (surpasses GPT-5.5 and Gemini 3.1 Pro on this test, per MiniMax) |
| Terminal-Bench 2.1 | 66.0% |
| MCP-Atlas (tool use) | 74.2% |
| Context | Up to 1M tokens via MiniMax Sparse Attention (MSA) |
| Multimodal | Native image and video input; can operate a desktop computer |
| Weights | Open weights released over roughly 10 days from the June 1 announcement |
Source: MiniMax M3 official announcement
Read the asterisk
MiniMax M3's benchmark numbers are the company's own, run on MiniMax infrastructure, and are not independently verified at the time of writing. The open weights only began shipping in mid-June, so real-world, self-hosted reports are still thin. The 59% SWE-Bench Pro figure is strong enough to test, not strong enough to crown.
If the SWE-Bench Pro number holds up under independent testing, M3 could be the most interesting open-weight coding model of the year, and the native multimodality is a genuine edge if your workflow feeds screenshots or video into the model. For now, put it on a branch and benchmark it on your own repository before you trust it in production.
3. Kimi K2.7-Code: Built for Coding Agents
The cleanest license and the strongest tool-use score
Moonshot released Kimi K2.7-Code on June 12, 2026, as a coding-focused model built on Kimi K2.6. Its official Hugging Face model card lists a 1-trillion-parameter Mixture-of-Experts design with 32B active parameters, a 256K context window, and a 400M-parameter MoonViT vision encoder, released under a Modified MIT license.
Kimi K2.7-Code Official Details
| Item | Official detail |
|---|---|
| Built on | Kimi K2.6, tuned for real-world long-horizon coding |
| Architecture | 1T total / 32B active Mixture-of-Experts |
| Context | 256K tokens |
| License | Modified MIT, with open weights on Hugging Face |
| Coding (own benchmark) | Kimi Code Bench v2 62.0, up from 50.9 on K2.6 |
| MCP-Atlas (tool use) | 76.0, the highest tool-use score here |
| Efficiency | Moonshot says it cuts thinking-token usage about 30% versus K2.6 |
Source: Moonshot Kimi K2.7-Code model card (Hugging Face)
K2.7-Code earns its spot for agent builders for two reasons. Its MCP-Atlas score of 76.0 is the highest tool-use number here, which matters when the model has to call tools, read a repo, and edit files in a loop. And the Modified MIT license is the cleanest of the three for commercial self-hosting. The API is OpenAI- and Anthropic-compatible, so it slots into existing tooling. For background on Moonshot's agent approach, see our Kimi agent swarm guide.
Want the raw numbers side by side?
Our live AI benchmark leaderboard tracks coding and agent scores with the official source behind every cell.
Open the benchmark leaderboardOfficial sources only, archived versions kept for history.
Which One Should You Use?
Pick by goal, not by a single percentage
Quick Decision
- 1Want the lowest API cost for high-volume coding? Start with DeepSeek V4 Flash at $0.14 / $0.28 per 1M tokens.
- 2Need the biggest context for whole-repo work? DeepSeek V4 and MiniMax M3 both list up to 1M tokens.
- 3Chasing the highest reported coding score? Test MiniMax M3, but treat its 59% SWE-Bench Pro as an unverified vendor claim.
- 4Building coding agents with heavy tool use? Kimi K2.7-Code is purpose-built for it and scores highest on MCP-Atlas (76.0).
- 5Need image or video input in your coding workflow? MiniMax M3 is natively multimodal; Kimi adds a vision encoder.
- 6Want the cleanest license to self-host commercially? Kimi K2.7-Code ships under a Modified MIT license.
Best Pick by Goal
| Your goal | Best pick | Why |
|---|---|---|
| Cheapest coding API | DeepSeek V4 Flash | $0.14 / $0.28 per 1M tokens with a 1M context window |
| Whole-repo / long context | DeepSeek V4 or MiniMax M3 | Both list up to 1M tokens in official docs |
| Highest reported coding score | MiniMax M3 | 59.0% SWE-Bench Pro, vendor-run and unverified |
| Coding agents and tool use | Kimi K2.7-Code | Built for agentic coding; 76.0 on MCP-Atlas |
| Cleanest self-host license | Kimi K2.7-Code | Modified MIT with open weights on Hugging Face |
| Multimodal coding input | MiniMax M3 | Native image and video input |
My practical pick
If I had to start one project today: DeepSeek V4 Flash for cheap, long-context coding; Kimi K2.7-Code when the job is an agent that uses tools and edits a repo; and MiniMax M3 on a test branch until independent benchmarks confirm the SWE-Bench Pro number. Do not pick a "winner" off a single leaderboard screenshot.
If you also want closed models in the mix, run your shortlist through the AI Model Picker, or compare prices in our AI coding tools pricing guide.
Read This Before You Trust Any Number
Four honest caveats
What the official data does not prove
All three benchmark sets are vendor-reported, and none are independently reproduced here. MiniMax M3's weights only began shipping in mid-June, so self-hosting reports are still early. The three models report different benchmarks, so an "X beats Y by N percent" claim is not possible from official data. And licenses differ: only Kimi states a Modified MIT license outright, so confirm DeepSeek's and MiniMax's terms before commercial use.
The practical takeaway is simple. Treat every number above as a starting hypothesis, then benchmark the two or three finalists on your own codebase. Your repository is the only leaderboard that matters.
Official Sources Used
- DeepSeek V4 Preview Release
- DeepSeek Models & Pricing
- MiniMax M3 official announcement
- Moonshot Kimi K2.7-Code model card (Hugging Face)
- Moonshot API platform
FAQ
What is the best open-source coding model in 2026?
There is no single winner, because the three leading open-weight coding models report different benchmarks. Based on official sources: DeepSeek V4 Flash is the cheapest with a 1M context window, MiniMax M3 reports the highest raw coding score (59% on SWE-Bench Pro, vendor-run), and Kimi K2.7-Code is purpose-built for coding agents and ships under a Modified MIT license.
Is MiniMax M3 better than DeepSeek V4 for coding?
You cannot prove it from official data. MiniMax reports 59% on SWE-Bench Pro, but that test ran on MiniMax's own infrastructure and is not independently verified. DeepSeek calls V4 Pro open-source state-of-the-art in agentic coding but publishes no single comparable number. They report different benchmarks, so a clean head-to-head is not possible yet.
Which open-source coding model is cheapest?
DeepSeek V4 Flash, at $0.14 per million input tokens and $0.28 per million output tokens on DeepSeek's official pricing page, with cache discounts and a 1M context window. MiniMax M3 lists roughly $0.30 input and $1.20 output for the smaller-context tier, and Kimi K2.7-Code lists $0.95 input and $4.00 output.
Can I self-host DeepSeek V4, MiniMax M3, and Kimi K2.7-Code?
Yes. All three publish open weights. DeepSeek V4 and Kimi K2.7-Code weights are on Hugging Face (Kimi under a Modified MIT license), and MiniMax M3 weights began rolling out in mid-June 2026. Check each provider's license before commercial deployment.
Which open coding model has the biggest context window?
DeepSeek V4 and MiniMax M3 both list up to a 1 million token context window in their official docs. Kimi K2.7-Code lists a 256K token context window on its model card.
Want official-source AI breakdowns like this?
Join the newsletter for plain-English model comparisons, pricing checks, and no-hype analysis built on official sources only.
Keep Reading
Stay ahead of the AI curve
We test new AI tools every week and share honest results. Join our newsletter.



