

Two years ago, the narrative was simple: proprietary models were better, open models were a curiosity, and if you wanted state-of-the-art performance you paid for an API key. OpenAI, Anthropic, and Google held the frontier. Open-source was for tinkerers.
That narrative is dead.
In 2026, open-weight models routinely match or beat proprietary offerings on specific tasks. The Speciale variant of DeepSeek V3.2 outperforms GPT-5 on reasoning benchmarks. Google’s Gemma 3 27B beats Gemini 1.5 Pro on several evaluations despite being fully self-hostable. Alibaba’s Qwen 3.5 competes with frontier proprietary models on coding and long-context reasoning. These aren’t cherry-picked results on toy benchmarks. These are production-relevant capabilities that organizations are deploying today.
The bottleneck has shifted. Model access is no longer the constraint. Infrastructure, governance, and the ability to swap models without re-architecting your stack — that’s what separates organizations that benefit from open AI from those still stuck evaluating options.
Here’s a practical look at the five model families that matter most right now, and when to use each one.
DeepSeek came out of nowhere and rewrote the conversation. V3.2 is a 685-billion-parameter model released under the MIT license, one of the most permissive open-source licenses in common use. No restrictions on commercial deployment. No usage clauses to lawyer over.
The Speciale variant is the headline: it beats GPT-5 on reasoning benchmarks. Not “approaches.” Not “competitive with.” Beats. For organizations that need strong reasoning — code generation, mathematical problem-solving, multi-step analysis — DeepSeek V3.2 is a legitimate alternative to paying per-token for proprietary APIs.
The trade-off is size. 685B parameters is not something you run on a MacBook. You need serious GPU infrastructure — multiple A100s or H100s for full-precision inference, or aggressive quantization to fit on smaller setups. But for organizations already running GPU clusters for training or other workloads, the marginal cost of hosting DeepSeek is dramatically lower than perpetual API fees at scale.
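The size trade-off can be made concrete with a rough back-of-the-envelope estimate. The sketch below counts weight memory only and assumes 80 GB accelerators; both are simplifying assumptions, since real deployments also need room for the KV cache, activations, and framework overhead. The helper names are illustrative, not part of any library.

```python
import math

# Bytes per parameter at common precisions (weights only).
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


def min_gpus(params_billions: float, bytes_per_param: float,
             gpu_memory_gb: float = 80.0) -> int:
    """Lower bound on GPU count; ignores KV cache, activations, and overhead."""
    return math.ceil(weight_memory_gb(params_billions, bytes_per_param) / gpu_memory_gb)


for precision, width in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(685, width)
    print(f"{precision}: {gb:.0f} GB of weights, at least {min_gpus(685, width)} x 80 GB GPUs")
```

Treat the GPU counts as floors rather than sizing recommendations: serving real traffic with long contexts typically needs noticeably more headroom than the weights alone suggest.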
Best for: Reasoning-heavy workloads, code generation, mathematical and scientific analysis, any scenario where MIT licensing matters for legal or compliance reasons.
Meta’s Llama family remains the gravitational center of the open-source model ecosystem. Llama 4 moved to a Mixture-of-Experts architecture, with configurations written as 16x17B and 128x17B: 16 or 128 experts, with roughly 17B parameters active per token. The MoE design means only a fraction of parameters activate per token, making inference significantly cheaper per query than a dense model of equivalent quality.
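To illustrate why only a fraction of parameters activate per token, here is a toy top-k routing sketch in plain Python. It is not Llama 4's actual router: the experts are stand-in functions, the router scores are hard-coded rather than learned, and names such as `route_top_k` and `moe_forward` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_scores, k=1):
    """Run only the selected experts and combine their outputs by weight."""
    chosen = route_top_k(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in chosen)
    return output, [i for i, _ in chosen]


# Sixteen toy "experts"; each just scales its input by a different factor.
experts = [lambda x, scale=i + 1: x * scale for i in range(16)]

# Hard-coded router scores for one token; a real router computes these.
scores = [0.0] * 16
scores[5], scores[9] = 3.0, 2.0

output, used = moe_forward(2.0, experts, scores, k=2)
print(f"experts used: {used} of {len(experts)}, output: {output:.3f}")
```

Only the selected experts are ever called, so per-token compute scales with `k` rather than with the total number of experts. Memory is a different matter: all experts still have to be loaded.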
But the real story with Llama isn’t any single model — it’s the ecosystem. Thousands of fine-tuned variants exist. If you need a Llama model tuned for medical text, legal analysis, customer support, code review, or any other niche, someone has probably built one. The community tooling is the most mature of any open model family: quantized versions for every hardware tier, LoRA adapters for every domain, and deployment templates for every major inference framework.
The licensing is Meta’s custom Llama license — not true open source by OSI standards, but permissive enough for most commercial use cases under 700 million monthly active users (which covers roughly everyone who isn’t Meta-sized).
Best for: Teams that need ecosystem breadth, fine-tuned domain variants, mature deployment tooling, and MoE efficiency for cost-sensitive inference at scale.
Google’s Gemma 3 family is the strongest argument that you don’t always need hundreds of billions of parameters. The lineup spans from 270M to 27B parameters, and the flagship 27B model punches absurdly above its weight: 42.4 on GPQA Diamond, 69.0 on MATH, beating Google’s own Gemini 1.5 Pro on several benchmarks.
Read that again. A 27B self-hostable model outperforming a proprietary frontier model from the same company on standardized evaluations.
Gemma 3 is also natively multimodal — vision and text — which matters for teams building applications that process images, documents, or screenshots alongside text. You don’t need a separate vision pipeline.
The practical advantage of the Gemma family is accessibility. The 27B model runs comfortably on a single high-end GPU. The smaller variants can run on consumer hardware or even edge devices. For teams that need capable AI on constrained infrastructure — on-premise deployments, air-gapped environments, edge computing — Gemma is the first model family to make that genuinely practical without painful quality compromises.
Best for: Resource-constrained deployments, edge and on-premise scenarios, multimodal applications (vision + text), teams that need strong performance per parameter.
Mistral has always punched above its weight, and Large 3 continues the tradition. It’s a MoE architecture — their first since Mixtral — with 41B active parameters out of 675B total, released under Apache 2.0. That’s a real open-source license with no asterisks.
Where Mistral Large 3 distinguishes itself is multilingual performance. If your organization operates across languages — European markets, global customer support, multilingual document processing — this is the open model to benchmark against. It leads the pack on cross-lingual tasks and handles code-switching (mixing languages within a conversation) better than any other open alternative.
The Apache 2.0 license also makes it the safest choice from a legal perspective for organizations with cautious legal teams. No custom license clauses, no usage restrictions, no ambiguity.
Best for: Multilingual workloads, organizations with strict open-source licensing requirements, European language processing, and teams that need MoE efficiency with strong general capabilities.
Alibaba’s Qwen family doesn’t get the attention it deserves in Western markets, but the numbers don’t lie. Qwen 3.5 delivers strong reasoning and coding performance, and Qwen3 Max is competitive with frontier proprietary models on standard benchmarks.
The standout capability is long-context processing. Qwen 3.5 handles 2-hour video analysis — that’s not a typo. For organizations working with long documents, video transcripts, meeting recordings, or any workflow involving extended context, Qwen’s context window and long-context performance are best-in-class among open models.
The trade-off is ecosystem maturity. Qwen doesn’t have the community breadth of Llama or the deployment tooling depth of Mistral. But the raw model quality is there, and the gap is closing fast.
Best for: Long-context applications, video and document analysis, coding tasks, and teams willing to trade ecosystem maturity for raw capability.
Knowing the models exist isn’t enough. The decision that matters is whether to self-host or keep paying for API access.
Self-hosting makes economic sense when at least two of these conditions are true:
Self-hosting does not make sense if you’re a small team with sporadic usage, you don’t have anyone who can manage GPU infrastructure, or you need the absolute frontier model and nothing else will do.
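One way to frame the self-host versus API decision is a monthly break-even calculation. The sketch below is a simplification under stated assumptions: every figure in it (GPU count, hourly rate, operations overhead, API price) is a hypothetical placeholder, not a quoted price, and it treats self-hosting as a fixed monthly cost regardless of volume.

```python
def api_monthly_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Pay-per-token cost for a month of usage."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def self_host_monthly_cost(gpu_count: int, gpu_hourly_rate: float,
                           ops_overhead: float, hours_per_month: float = 730.0) -> float:
    """Fixed monthly cost: GPUs running around the clock plus operations overhead."""
    return gpu_count * gpu_hourly_rate * hours_per_month + ops_overhead


def break_even_tokens(gpu_count: int, gpu_hourly_rate: float, ops_overhead: float,
                      price_per_million_tokens: float) -> float:
    """Monthly token volume above which self-hosting is cheaper than the API."""
    fixed = self_host_monthly_cost(gpu_count, gpu_hourly_rate, ops_overhead)
    return fixed / price_per_million_tokens * 1_000_000


# Hypothetical inputs: 8 GPUs at $2.50/hour, $5,000/month of ops time,
# and an API priced at $10 per million tokens.
threshold = break_even_tokens(8, 2.50, 5_000.0, 10.0)
print(f"self-hosting fixed cost: ${self_host_monthly_cost(8, 2.50, 5_000.0):,.0f}/month")
print(f"break-even volume: {threshold / 1e9:.2f} billion tokens/month")
```

Below the break-even volume the API is cheaper; above it, the fixed cost of the cluster is spread over enough tokens to win. A sporadic workload rarely reaches that threshold, which is why small teams with bursty usage tend to stay on APIs.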
Here’s what most organizations get wrong: they pick a model and build their entire stack around it. Then six months later, a better model drops and they’re stuck re-architecting.
The open-source landscape moves fast. DeepSeek V3.2 didn’t exist a year ago. Gemma 3 changed the small-model calculus overnight. Qwen went from afterthought to contender in one release cycle. Building for a single model is building for obsolescence.
The winning architecture is model-agnostic. Your application layer shouldn’t know or care whether the underlying model is DeepSeek, Llama, Gemma, or a proprietary API. You should be able to swap models per task, per user, per cost threshold — without touching application code.
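A minimal sketch of what model-agnostic can look like in code, assuming a single `complete(prompt)` method as the shared interface. `ChatModel`, `EchoModel`, and `ModelRouter` are hypothetical names for illustration, not Calliope's API or any library's; a real backend would call a self-hosted inference server or a vendor SDK instead of echoing.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class ChatModel(Protocol):
    """Anything with a complete() method can serve as a backend."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Hypothetical stand-in; a real backend would call a model server or API."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class ModelRouter:
    """Maps task names to backends so callers never name a model directly."""

    def __init__(self, default: str) -> None:
        self._default = default
        self._backends: Dict[str, ChatModel] = {}
        self._routes: Dict[str, str] = {}

    def register(self, name: str, backend: ChatModel) -> None:
        self._backends[name] = backend

    def route(self, task: str, backend_name: str) -> None:
        if backend_name not in self._backends:
            raise KeyError(f"unknown backend: {backend_name}")
        self._routes[task] = backend_name

    def complete(self, task: str, prompt: str) -> str:
        backend_name = self._routes.get(task, self._default)
        return self._backends[backend_name].complete(prompt)


router = ModelRouter(default="general")
router.register("general", EchoModel("small-general-model"))
router.register("reasoning", EchoModel("large-reasoning-model"))
router.route("code_review", "reasoning")

print(router.complete("code_review", "Review this diff"))
print(router.complete("summarize", "Summarize this thread"))

# Swapping the model behind "reasoning" requires no change at the call sites.
router.register("reasoning", EchoModel("newer-reasoning-model"))
print(router.complete("code_review", "Review this diff"))
```

The application asks for a task, not a model. Routing by user or cost threshold could follow the same pattern by changing what the lookup in `complete` keys on.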
This is the approach we took with Calliope, supporting 21+ LLM providers with bring-your-own-model capability. Not because we predicted which model would win (nobody can), but because the answer to “which model is best?” changes every few months, and your platform shouldn’t break when it does.
The open-source AI landscape in 2026 is not a consolation prize. It’s the main event for a growing number of organizations.
The practical playbook:
The gap between open and proprietary has closed. For many workloads, it’s disappeared entirely. The organizations that recognize this and build accordingly will spend less, move faster, and own their AI stack. The ones still defaulting to proprietary APIs for everything are leaving money and control on the table.
