
Coding Agent Swarms, Part 5: Running the Fleet From Your Phone
The Last Mile Is the Operator The first four parts of this series built the substrate: foundation, fleet, multi-fleet …

Every platform has an origin story. Most of them are forgotten by the time the platform is mature, which is unfortunate — because the shape of what a platform is now is usually a direct response to a sequence of specific problems the team kept hitting. When you understand the sequence, you understand the architecture.
Calliope is a few years into its life, and the shape it has in mid-2026 — a three-pillar private-AI stack designed for enterprises that need to run AI inside their own cloud — was not the original idea. It is what the original idea grew into through five distinct phases. This is a short walk through those phases, because the pitch of “private AI for the enterprise” only makes sense if you can see how each layer of the stack got there.
The short version: the platform evolved by absorbing each layer that, in retrospect, had to be inside the customer’s perimeter to make the previous layer actually work.
Phase 1              Phase 2              Phase 3
─────────────        ─────────────        ──────────────────
AI Workbench         Free Desktop         Multi-Agent
(cloud-hosted)       Apps Released        Runtimes Shipped
(JupyterHub          (BYOK,               (browser /
 + AI tools)          signed builds)       desktop / terminal
                                           tiers, scuttlebot)
      │                    │                    │
      ▼                    ▼                    ▼
Phase 4              Phase 5 (now)
────────────────     ──────────────────
BYOC Runtime         Three-Pillar Stack
(Astrolift,          (Workbench +
 multi-cloud,         Astrolift +
 in YOUR cloud)       Zentinelle.ai,
                      in YOUR cloud)
Each phase was a response to a real problem from the previous phase. None of them was planned in advance. All of them turned out to be necessary in hindsight. The trajectory only became visible once we had crossed three or four of them.
The earliest Calliope was a cloud-hosted AI workbench: JupyterLab with AI-assisted notebooks, an AI-aware IDE, an AI-aware chat surface, plus the early versions of what is now Chat Studio and DB Loadr. It was useful immediately for individual data scientists, ML engineers, and developers who wanted a single, coherent workbench experience for AI-powered work.
The model: customer logs into a hosted instance, uses the tools, brings their own API keys for whichever AI providers they prefer. BYOK from the start, because nobody wanted us in the middle of their inference bills.
The problem that emerged: customers in regulated industries could not use a hosted workbench at all. Not because of any specific failure on our part — but because their data, by policy, could not leave the perimeter. The workbench was good. Its location was wrong.
The first response was free desktop apps. Calliope AI IDE, Calliope AI Lab, Chat Studio, DB Loadr — each shipped as a downloadable, BYOK, signed (eventually), platform-native desktop application. macOS, Windows, Linux. No account required. No middleman billing. The same multi-provider model as the cloud workbench.
This worked for individuals and small teams. An engineer could install the IDE on their laptop, point it at OpenAI or Anthropic or local Ollama, and have a coherent AI-powered development environment without a SaaS subscription. The desktop apps are still free today; they are part of the workbench pillar.
The problem that emerged: desktop apps solve the individual case, but not the team case. A regulated organization needs more than “every employee installs the app on their laptop.” They need shared infrastructure, central audit, consistent identity, governed policies — things desktop apps cannot provide on their own.
We had built the workbench in two places. Neither of them was where the team could live. That gap pointed us toward the next phase.
While the workbench problem was being solved, a separate trend was reshaping how engineering teams worked: coding agents went from curiosity to default. Cursor, Claude Code, Codex CLI, and a dozen others started landing real productivity in real engineering workflows. Customers asked us a specific question: can you give us a way to run these agents on infrastructure we control, with coordination between agents, accessible from any device?
That became the multi-agent runtime layer. We shipped Calliope agent images in three tiers — terminal, browser, and full desktop — each with VNC-enabled access, each capable of running Claude / Codex / Gemini agents, each integrated with scuttlebot (our coordination hub) so a fleet of agents could collaborate and report through a single chat-style surface.
The agents ran inside JupyterHub spawns. The hub provided identity, routing, and isolation. The whole thing was reachable from any browser, including phones. The five-part agent-swarm series covers what this layer makes possible.
The problem that emerged: the agents had a runtime, but the apps the agents built did not. An agent could write a service in a JupyterHub spawn, but production deployment of that service still depended on whatever PaaS the customer was using — usually Vercel, Render, or Railway, none of which the customer’s data could legally touch. The runtime gap was real and it sat in the next layer down.
Astrolift was the response. A cloud-agnostic PaaS — the in-house Railway or Render alternative that runs inside the customer’s own cloud. Multi-cloud Kubernetes abstraction (AWS, GCP, Azure, vanilla Kubernetes for on-prem and air-gapped). Preview environments per pull request. GitOps delivery. Magic-link mobile approvals. Temporal-backed durable workflows. Deep multi-tenant identity from organization down to individual app.
With Astrolift in place, the picture changed. Now an engineer using the workbench (Phase 1–2) could prototype an AI app, an agent swarm (Phase 3) could iterate on it, and the production deployment (Phase 4) could land in the customer’s own cloud — no Vercel, no Render, no consumer-cloud detour. The data stayed put. The audit trail stayed put. The cost lived where the customer could see it.
The problem that emerged: the platform now had four places where AI was happening (the cloud workbench, the desktop apps, the agents, and the apps running on the runtime), and there was no single layer governing what any of them did. Compliance teams could not answer “what is happening across all this AI?” without a forensic project. Security teams could not enforce policies consistently. The runtime made deployment possible; it did not make AI activity governable.
The fifth phase, and the current shape of the platform, is governance and observability as a first-class layer. Zentinelle, together with the open-source zentinelle-sdk, sits between every agent and every model provider, mediating outbound traffic with inline policy enforcement, content scanning, audit chaining, and real-time observability.
The integration story matters: Zentinelle is not bolted on next to the runtime. It shares the runtime’s identity model. When Astrolift spawns a workload, Zentinelle’s policy gateway is injected by construction. When the workbench makes a model call, that call passes through the same policy layer. Three pillars, one identity model, one audit chain, one cost dashboard.
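To make “mediating outbound traffic” concrete, here is a minimal Python sketch of two of the mechanisms named above: inline content scanning before a call leaves the perimeter, and a hash-chained audit log. It is an illustration under stated assumptions, not Zentinelle’s implementation or the zentinelle-sdk API. `PolicyGateway`, `check`, `verify_chain`, the record fields, and the two regex rules are all hypothetical, and the sketch only decides and records; it does not forward the call to a provider.

```python
import hashlib
import json
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

# Hypothetical content-scanning rules; a real policy would be loaded from configuration.
BLOCKED_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(record: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class PolicyGateway:
    """Hypothetical gateway: scan each outbound prompt, then append a hash-chained audit record."""
    audit_log: List[dict] = field(default_factory=list)

    def check(self, identity: str, provider: str, prompt: str) -> bool:
        violations = [name for name, rx in BLOCKED_PATTERNS.items() if rx.search(prompt)]
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "identity": identity,      # caller identity, e.g. "org/team/app/agent"
            "provider": provider,
            "allowed": not violations,
            "violations": violations,  # rule names only; the prompt itself is not stored
            "prev": self.audit_log[-1]["hash"] if self.audit_log else GENESIS_HASH,
        }
        # The hash covers this record's content plus the previous record's hash,
        # so editing or removing a record is detectable from the record after it.
        record["hash"] = _digest(record)
        self.audit_log.append(record)
        return record["allowed"]

    def verify_chain(self) -> bool:
        """Re-walk the log and confirm every link and every hash."""
        prev = GENESIS_HASH
        for rec in self.audit_log:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if body["prev"] != prev or rec["hash"] != _digest(body):
                return False
            prev = rec["hash"]
        return True


if __name__ == "__main__":
    gw = PolicyGateway()
    print(gw.check("org/acme/app/etl-agent", "provider-a", "Summarize this release note"))  # True
    print(gw.check("org/acme/app/etl-agent", "provider-b", "Use key AKIAABCDEFGHIJKLMNOP"))  # False
    print(gw.verify_chain())  # True
    gw.audit_log[0]["allowed"] = False  # simulate tampering with an earlier record
    print(gw.verify_chain())  # False
```

The chain makes edits or deletions in the middle of the log detectable; truncating the newest records is not, so a real system would typically also anchor the latest hash somewhere outside the log. That is a general observation about hash chains, not a description of how Zentinelle stores its audit trail.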
With Zentinelle in place, the picture is complete. A new compliance framework (EU AI Act, NIST AI RMF, an updated SOC 2 control) maps to evaluators in one place and applies across the entire stack. A regulator’s question — “show me every model call made by your agents in this region in the last 90 days” — is a query that returns in seconds. The customer’s security team operates the same dashboards their CISO already understands.
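The regulator’s question above reduces to a filter over audit records by region and time window. A minimal Python sketch, assuming a hypothetical `ModelCall` record shape (not Zentinelle’s schema) held in memory; a real deployment would run the equivalent query against an indexed audit store. The `calls_in_window` function, field names, regions, and model names are all illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ModelCall:
    """Hypothetical audit record; field names are illustrative only."""
    ts: datetime      # timezone-aware UTC timestamp of the call
    region: str       # cloud region where the calling workload ran
    identity: str     # e.g. "org/team/app/agent" from the shared identity model
    provider: str
    model: str


def calls_in_window(
    records: Iterable[ModelCall],
    region: str,
    days: int = 90,
    now: Optional[datetime] = None,
) -> List[ModelCall]:
    """Return every model call recorded in `region` during the last `days` days, oldest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    hits = [r for r in records if r.region == region and cutoff <= r.ts <= now]
    return sorted(hits, key=lambda r: r.ts)


if __name__ == "__main__":
    now = datetime(2026, 6, 30, tzinfo=timezone.utc)
    log = [
        ModelCall(now - timedelta(days=10), "eu-west-1", "org/acme/etl-agent", "provider-a", "model-x"),
        ModelCall(now - timedelta(days=120), "eu-west-1", "org/acme/etl-agent", "provider-a", "model-x"),
        ModelCall(now - timedelta(days=5), "us-east-1", "org/acme/bi-agent", "provider-b", "model-y"),
        ModelCall(now - timedelta(days=89), "eu-west-1", "org/acme/bi-agent", "provider-b", "model-y"),
    ]
    for call in calls_in_window(log, "eu-west-1", now=now):
        print(call.ts.date(), call.identity, call.provider, call.model)
```

Passing `now` explicitly keeps the window reproducible, which matters when the same answer has to be regenerated later for an auditor.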
This is the three-pillar architecture we have been writing about across the rest of this blog: workbench, runtime, governance. Each pillar inside the customer’s own cloud. Each pillar sharing identity, audit, and policy. Each pillar designed to be operable by a mid-market security team, not a hyperscaler’s platform engineering org.
Looking back over five phases, the through-line is one statement, repeated:
Every layer that touches AI in an enterprise has to be inside the enterprise’s perimeter, or the enterprise cannot use AI at scale.
That statement is not the original premise. It is what the platform learned each time it tried to operate outside that perimeter. The cloud workbench worked for some customers and not for others; the gap was perimeter. The desktop apps worked for individuals and not for teams; the gap was perimeter. The multi-agent runtime worked for the agents but not for the apps they produced; the gap was perimeter. The runtime worked for deployment but not for governance; the gap was perimeter.
Five phases later, every layer that touches AI in the customer’s organization is in the customer’s perimeter. The model providers are still on the other end of a wire — and that is fine, because Zentinelle decides what data crosses that wire and records every crossing.
┌──────────────────────────────────────────────────────────────┐
│                                                              │
│  CALLIOPE PRIVATE AI                                         │
│  (inside your cloud, BYOC)                                   │
│                                                              │
│  ┌─────────────────┐                                         │
│  │ Workbench       │  IDE, Lab, Chat Studio, DB Loadr        │
│  │ (Pillar 1)      │  desktop apps + cloud-hosted            │
│  └────────┬────────┘                                         │
│           │                                                  │
│           ▼                                                  │
│  ┌─────────────────┐                                         │
│  │ Runtime         │  Astrolift                              │
│  │ (Pillar 2)      │  multi-cloud BYOC PaaS                  │
│  └────────┬────────┘  (astrolift.ai)                         │
│           │                                                  │
│           ▼                                                  │
│  ┌─────────────────┐                                         │
│  │ Governance      │  Zentinelle + SDK                       │
│  │ (Pillar 3)      │  policy, audit, observability           │
│  └─────────────────┘  (zentinelle.ai)                        │
│                                                              │
│  + your VPC, IAM, KMS, SIEM, identity provider               │
│  + subscription / forward-deployed engineering /             │
│    implementation services                                   │
│                                                              │
└──────────────────────────────────────────────────────────────┘
This is what we offer in mid-2026. It did not look like this two years ago. It probably will not look like exactly this two years from now — some new phase will reveal a gap we have not yet seen, and another layer will land. But the shape today is the result of customers in regulated industries telling us, over and over, what their perimeter actually demanded — and us moving the platform inside that perimeter until we ran out of layers to move.
If your organization is in the middle — too regulated for consumer AI, too constrained to build your own platform from scratch — the architecture you have been waiting for is shipped. Not as a slide. Not as a roadmap. As a stack you can deploy in your cloud, with support from people who have been deploying it.
The three pillars are the destination. The five phases are how we got there. Every story this blog tells — from coding agent swarms to vibe coding safely to the tool sprawl problem to the case for being private AI for the middle of the market — is a piece of the same picture, viewed from a different angle.
The platform evolves because the customers tell it what is missing. The next phase is whichever one of them surfaces something we have not seen yet.
This is the narrative piece. The next 13 posts in this series go deep on specific problems private AI solves for mid-market organizations: tool sprawl, bastion replacement, data residency, sovereign AI in Europe, secure administration, and AI inside near-airgapped environments — with industry-specific variants for healthcare, finance, defense, pharma, public sector, and critical infrastructure.
