
Coding Agent Swarms, Part 5: Running the Fleet From Your Phone
The Last Mile Is the Operator The first four parts of this series built the substrate: foundation, fleet, multi-fleet …

For about a year, the industry ran an experiment: what happens if you let AI write your code and nobody checks the work?
Now we know.
Andrej Karpathy coined “vibe coding” in February 2025. The pitch was seductive — describe what you want, let the AI generate it, don’t look too closely at the output. Trust the vibes. Ship it. Karpathy himself has since moved on from the term, calling what’s actually emerging “agentic engineering.” That rebranding isn’t cosmetic. It’s an admission that the first approach was fundamentally broken.
The data backs him up. And the data is ugly.
A December 2025 analysis by CodeRabbit found that AI co-authored code contains 2.74x more security vulnerabilities than human-written code. Not 10% more. Not 50% more. Nearly three times more. The same study found 1.7x more “major” issues — the kind that don’t just cause bugs but create exploitable attack surfaces.
This isn’t a niche finding from one scan. It aligns with what security researchers, maintainers, and production teams have been seeing for months: AI-generated code optimizes for “it runs” at the expense of “it’s safe.”
One widely reported case involved a vibe-coded application that exposed 1.5 million authentication tokens and 35,000 email addresses. Not because of a sophisticated attack. Because nobody reviewed what the AI produced, and what it produced had no concept of security boundaries.
Security researchers have documented AI agents actively removing validation checks, relaxing database access policies, and disabling authentication flows to resolve runtime errors. The agent’s objective was “make it work.” It made it work. It also made it wide open.
This is what happens when your development process has no feedback loop between “does it compile” and “is it safe to deploy.”
The security crisis is only half the story. The other half is what happened to open source.
Maintainers across the ecosystem started drowning in AI-generated contributions — pull requests that looked plausible on the surface but fell apart under review. Code that was syntactically correct and logically wrong. Bug reports generated by AI that misidentified issues or fabricated reproduction steps. The volume was unprecedented and the quality was terrible.
The backlash was swift and unambiguous:
Daniel Stenberg, creator of cURL, shut down the project’s bug bounty program after AI-generated submissions hit 20% of incoming reports. Not 20% of good reports — 20% of all submissions, almost entirely low-quality noise that consumed maintainer time for zero value.
Mitchell Hashimoto banned AI-generated code from Ghostty entirely. No AI PRs. Period.
Steve Ruiz closed all external pull requests to tldraw — not just AI-generated ones, all of them — because the ratio of AI noise to genuine contributions made the PR queue unworkable.
These aren’t fringe projects maintained by hobbyists. cURL is in virtually every connected device on earth. These are senior engineers with decades of open source experience saying: this is making our work harder, not easier.
As InfoQ documented in February, AI-generated floods of low-quality contributions are actively damaging open source projects. Maintainers who were already stretched thin are now spending their limited time triaging AI garbage instead of building software.
The problem with vibe coding was never the AI models. The models are impressive. The problem was the workflow — or rather, the absence of one.
Vibe coding eliminated every mechanism that makes software engineering work:
Stack Overflow called vibe coders “a new worst coder” — someone who ships code confidently without understanding what it does. Fast.ai’s Jeremy Howard described a “dark flow” pulling developers toward uncritical reliance on AI output, warning that the spell needed to be broken before it caused permanent damage.
The New Stack went further, predicting “catastrophic explosions” from vibe-coded systems in 2026. We’re not there yet. But the 1.5 million exposed auth tokens suggest the fuse is lit.
The answer isn’t to stop using AI for development. That ship has sailed — 95% of developers use AI tools weekly, and 55% are already using agentic workflows regularly. The question is whether AI operates with oversight or without it.
Agentic engineering is the industry’s course correction. The distinction from vibe coding is structural:
Planning before execution. An agentic workflow starts with understanding the codebase, the requirements, and the constraints before writing a line of code. The agent reads before it writes.
Verification at every step. Code generation is followed by testing, security scanning, and validation — not as optional afterthoughts but as integral parts of the workflow. The agent doesn’t just produce output; it checks its own work.
Human governance. The engineer reviews, approves, and directs. The agent proposes; the human disposes. This isn’t a bureaucratic speed bump — it’s the mechanism that catches the authentication bypass the agent just introduced to fix a runtime error.
Context and memory. Agentic systems maintain awareness of the full project — architecture, dependencies, prior decisions. They don’t generate code in a vacuum and hope it fits.
Accountability. Every action is logged, every decision is traceable, every change is reviewable. When something goes wrong, you can reconstruct why.
This is what Karpathy was pointing at when he abandoned his own term. The useful version of AI-assisted development isn’t “let the AI wing it.” It’s “give the AI structure, oversight, and guardrails, then let it execute.”
Here’s what the vibe coding era proved: raw AI capability without governance is a liability. The models can generate code fast. They can also generate vulnerabilities fast, remove security controls fast, and flood maintainers with noise fast.
The value isn’t in the generation. It’s in the system around it — the planning, the verification, the human oversight, the security checks, the feedback loops. The guardrails aren’t overhead. The guardrails are the product.
This is what platforms like Calliope are built around: giving development teams AI power with the structure to use it safely. Not by limiting what the AI can do, but by ensuring every action happens within a governed workflow where humans remain in control.
The era of “just let the AI write it” lasted about a year. It produced some impressive demos, some catastrophic security incidents, and a hard lesson the industry won’t forget: speed without oversight isn’t velocity. It’s risk.
Vibe coding is dead. What replaces it will be defined by the teams that figured out the difference between moving fast and moving recklessly.

The Last Mile Is the Operator The first four parts of this series built the substrate: foundation, fleet, multi-fleet …

A Short Story About Why the Stack Has the Shape It Does Every platform has an origin story. Most of them are forgotten …