Coding Agent Swarms in Calliope, Part 1: The Foundation

May 21, 2026 - 9 Min read

Build It Once, Then Multiply

There is a particular flavor of “AI productivity” that has not landed yet for most engineering teams. Not “the agent writes a function for me.” Not “the agent runs a single task overnight.” The thing we mean: a fleet of coding agents, all working in parallel, all reachable from your phone, all reporting into one place — running inside your own infrastructure.

That is a swarm. It is a different category of work than running one agent at a time. And the gap between “I tried Claude Code last week” and “my team runs a swarm against our backlog every night” is mostly a single architectural decision, repeated:

Decouple the place an agent runs from the place a human watches it.

Once you have that decoupling, scale becomes a knob. Spawning ten agents is no harder than spawning one. Watching ten agents is no harder than watching one — if the watching surface is shared. Doing it from your phone, in line at a coffee shop, with no app to install, is no harder than doing it from your laptop.

This is part 1 of a five-part series. We start with the topology of a single agent, the substrate it runs on, and the surface a human uses to see and steer it. The next four parts add agents, add problems, add models, and add the phone-first operator workflow on top.

The Topology, At Its Simplest

A single coding agent in Calliope is three things in three places, glued together by a known set of network paths.

                    ┌────────────────────────────────────┐
                    │           YOUR CLOUD               │
                    │                                    │
   ┌──────────┐     │   ┌────────────────────────────┐   │
   │  Phone   │     │   │       JupyterHub Hub       │   │
   │  /       │ ────┼──▶│   (auth, routing, spawn)   │   │
   │  Laptop  │     │   └─────────────┬──────────────┘   │
   │ Browser  │     │                 │                  │
   └──────────┘     │     ┌───────────┴───────────┐      │
                    │     ▼                       ▼      │
                    │   ┌──────────────┐   ┌──────────┐  │
                    │   │ Agent Spawn  │   │Scuttlebot│  │
                    │   │ (browser/    │◀─▶│  Spawn   │  │
                    │   │  desktop/    │   │ (IRC +   │  │
                    │   │  terminal)   │   │ Web UI)  │  │
                    │   └──────────────┘   └──────────┘  │
                    │                                    │
                    └────────────────────────────────────┘

The Hub is the front door. Every browser request — laptop, phone, tablet, whatever — lands at the hub. The hub authenticates you, then routes you to a spawn (an isolated, per-user container). The hub is what makes the entire system reachable from one URL with one login, regardless of which agent you are looking at or which device you are on.
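The authenticate-then-route step can be modeled in a few lines. This is a toy sketch, not JupyterHub's implementation: the `Hub` class, the token table, the user name, and the spawn names are all hypothetical. The only detail borrowed from JupyterHub is the `/user/<name>/<server>/` path shape it uses for named servers.

```python
from dataclasses import dataclass, field


@dataclass
class Hub:
    """Toy model of the hub: one login, then a route to a per-user spawn."""
    # token -> username; a real hub delegates this to its authenticator
    sessions: dict[str, str] = field(default_factory=dict)
    # username -> names of that user's running spawns
    spawns: dict[str, set[str]] = field(default_factory=dict)

    def route(self, token: str, spawn_name: str) -> str:
        """Return the path to proxy to, or raise if the request is not allowed."""
        user = self.sessions.get(token)
        if user is None:
            raise PermissionError("not logged in")
        if spawn_name not in self.spawns.get(user, set()):
            raise LookupError(f"{user} has no spawn named {spawn_name!r}")
        return f"/user/{user}/{spawn_name}/"


hub = Hub(
    sessions={"tok-123": "dana"},
    spawns={"dana": {"agent-1", "scuttlebot"}},
)
print(hub.route("tok-123", "agent-1"))     # same login, agent surface
print(hub.route("tok-123", "scuttlebot"))  # same login, coordination surface
```

The point of the sketch is that the device never appears in the routing decision: a phone and a laptop present the same session and get the same path.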

The Agent Spawn is where the actual coding work happens. It is a container with a working desktop or terminal inside, running an AI runtime (Claude, Codex, or Gemini), and accessible via your browser as if you were sitting in front of a remote machine. No SSH. No app install. You see the agent’s screen the way the agent sees its screen.

The Scuttlebot Spawn is the swarm’s nervous system. It is an IRC server (Ergo, packaged as a Calliope-native spawn) plus a web UI. Each agent in the swarm has a relay process that pipes its activity into a scuttlebot channel: what it is doing, what it asked the model, what tools it used, what errors it hit. You — the human — read those channels from any browser. The agents read them too, which is how they end up coordinating.
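To make the relay's job concrete, here is a minimal sketch of how an agent event could become IRC traffic. The event shape (`kind` plus free text) and the `format_event` helper are assumptions for illustration, not the actual relay code. The `PRIVMSG` syntax and the 512-byte line limit are standard IRC.

```python
MAX_IRC_LINE = 512  # bytes per IRC message, including the trailing CRLF


def format_event(channel: str, kind: str, text: str) -> list[bytes]:
    """Turn one agent event into one or more PRIVMSG lines for `channel`."""
    prefix = f"PRIVMSG {channel} :[{kind}] ".encode("utf-8")
    budget = MAX_IRC_LINE - len(prefix) - 2  # leave room for CRLF
    if budget < 4:
        raise ValueError("channel or kind is too long to leave room for text")
    # IRC messages are single-line, so fold all whitespace runs into spaces.
    payload = " ".join(text.split()).encode("utf-8")
    lines = []
    while payload:
        chunk = payload[:budget]
        # Back off if the cut would land inside a multi-byte UTF-8 character.
        while len(chunk) < len(payload) and (payload[len(chunk)] & 0xC0) == 0x80:
            chunk = chunk[:-1]
        lines.append(prefix + chunk + b"\r\n")
        payload = payload[len(chunk):]
    return lines


for line in format_event("#agent-3", "test", "24/30 pass\nretrying failed cases"):
    print(line)
```

A production relay would leave extra headroom, because the server prepends the sender's prefix when it forwards the message to other clients.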

At one agent, scuttlebot is overkill. We still set it up at this stage because the moment you add the second agent, everything gets harder if scuttlebot is not already there. Better to wire it once at the start and forget it.

The Two Surfaces You Use

You interact with this topology through exactly two surfaces:

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   ┌─────────────────────┐         ┌──────────────────────────┐  │
│   │   Agent Surface     │         │   Coordination Surface   │  │
│   │   (per agent)       │         │   (one for the swarm)    │  │
│   │                     │         │                          │  │
│   │  ▸ noVNC desktop    │         │  ▸ Scuttlebot web UI     │  │
│   │  ▸ Browser-based    │         │  ▸ Chat-style channels   │  │
│   │  ▸ Mouse + keyboard │         │  ▸ One per agent / task  │  │
│   │  ▸ See what it sees │         │  ▸ Searchable history    │  │
│   │                     │         │                          │  │
│   └─────────────────────┘         └──────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

The Agent Surface is the agent’s desktop, served to your browser via noVNC (a VNC client written in JavaScript that needs no plugin and no install). You can drive the agent’s screen with your mouse and keyboard, watch it think, and intervene if you want.

The Coordination Surface is the scuttlebot web UI — a chat-style view of every channel the swarm is using. One agent typically gets one channel. Status reports, decisions, errors, hand-offs, all flow through these channels. Three months from now you can search them to find out exactly what the swarm did and why.

The reason this is the right split: the Agent Surface is expensive to keep open (a full streamed desktop) but only relevant when you are actively steering the agent. The Coordination Surface is cheap to keep open (a small chat stream) but is what you watch ambiently. When you scale to ten agents, you do not open ten Agent Surfaces — you keep the Coordination Surface open and dive into individual Agent Surfaces only when something interesting happens.

This split is the entire reason swarms are tractable. Without it, operator effort grows linearly with fleet size: ten agents means ten desktops to watch. With it, the operator watches one surface no matter how many agents are running.

Why It Works From a Phone

The whole topology is browser-native. Three consequences:

No install. No app store, no enterprise MDM dance, no certificate provisioning. Open Safari or Chrome on the phone, go to the hub URL, log in, you are operating.
Mobile-responsive surfaces. Scuttlebot’s web UI is built for chat, and chat is the format phones handle best. noVNC supports touch input (pinch, scroll, tap). It is not as comfortable as a laptop, but for “check on the swarm, kick off another task, approve something” it is more than enough.
PWA-installable. Most modern browsers can “install” a web app to your home screen. Once you do this for the hub, opening the swarm console feels like opening a native app. No App Store. No corporate review. Your hub admin chose the URL; you trust it; the phone treats it like an app.
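For the home-screen install, browsers read a web app manifest linked from the page. Whether a given hub deployment serves one is not something this article specifies, so the sketch below only shows what such a manifest could contain. The member names (`name`, `short_name`, `start_url`, `display`, `icons`) come from the Web App Manifest specification; the URL, icon paths, and the `build_manifest` helper are hypothetical.

```python
import json


def build_manifest(hub_url: str, name: str, short_name: str) -> str:
    """Return a minimal web app manifest as a JSON string."""
    manifest = {
        "name": name,
        "short_name": short_name,  # label shown under the home-screen icon
        "start_url": hub_url,      # must be same-origin with the manifest
        "display": "standalone",   # open without browser chrome
        "icons": [
            # Hypothetical icon paths; a real deployment supplies its own.
            {"src": "/static/icon-192.png", "sizes": "192x192", "type": "image/png"},
            {"src": "/static/icon-512.png", "sizes": "512x512", "type": "image/png"},
        ],
    }
    return json.dumps(manifest, indent=2)


print(build_manifest("https://hub.example.com/hub/", "Swarm Console", "Swarm"))
```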

The capital cost of this is exactly one decision: we put our coding agents in our cloud, behind our hub. Every other property — phone access, no install, single sign-on, audit trails — follows from that decision.

What “Bootstrap to Scuttlebot” Means

When a fresh agent container starts, it runs a small bootstrap that connects its relay process to the scuttlebot it has been told about. The relay is a sidecar — claude-relay, codex-relay, or gemini-relay, depending on which model the agent runs — and its only job is to mirror the agent’s activity into an IRC channel.
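A minimal sketch of what such a bootstrap could look like, assuming the container is “told about” scuttlebot through environment variables. The variable names (`AGENT_RUNTIME`, `AGENT_NAME`, `SCUTTLEBOT_HOST`, `SCUTTLEBOT_PORT`, `SCUTTLEBOT_CHANNEL`) and the `RelayConfig` type are invented for illustration. The relay names are the ones listed above, and `NICK`, `USER`, and `JOIN` are the standard IRC commands a client uses to register and enter a channel.

```python
from dataclasses import dataclass

# Relay sidecar per runtime, as named in the article.
RELAYS = {"claude": "claude-relay", "codex": "codex-relay", "gemini": "gemini-relay"}


@dataclass(frozen=True)
class RelayConfig:
    relay: str
    host: str
    port: int
    nick: str
    channel: str


def load_config(env: dict[str, str]) -> RelayConfig:
    """Build the relay config from (hypothetical) environment variables."""
    runtime = env["AGENT_RUNTIME"]
    if runtime not in RELAYS:
        raise ValueError(f"unknown runtime: {runtime!r}")
    nick = env["AGENT_NAME"]
    return RelayConfig(
        relay=RELAYS[runtime],
        host=env["SCUTTLEBOT_HOST"],
        port=int(env.get("SCUTTLEBOT_PORT", "6697")),  # conventional IRC-over-TLS port
        nick=nick,
        channel=env.get("SCUTTLEBOT_CHANNEL", f"#{nick}"),  # default: one channel per agent
    )


def registration_lines(cfg: RelayConfig) -> list[str]:
    """IRC commands sent to register and join the agent's channel.

    A real client waits for the server's welcome reply before sending JOIN.
    """
    return [
        f"NICK {cfg.nick}",
        f"USER {cfg.nick} 0 * :{cfg.relay}",
        f"JOIN {cfg.channel}",
    ]


cfg = load_config({
    "AGENT_RUNTIME": "claude",
    "AGENT_NAME": "agent-1",
    "SCUTTLEBOT_HOST": "scuttlebot.internal",
})
print(cfg)
print(registration_lines(cfg))
```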

This bootstrap is what turns a running agent into a member of a swarm. Before bootstrap, the agent runs in isolation, talking to a model, doing its task, with the human’s only window being the noVNC desktop. After bootstrap, the agent’s activity is visible to scuttlebot, visible to other agents, and visible to you on every device that opens the scuttlebot web UI.

You do this once. You configure the bootstrap as part of the agent spawn template. From then on, every agent you spawn — one at a time, ten at a time, a hundred at a time — joins the swarm automatically. Day one of operating a fleet has a setup cost. Day two and beyond have none.
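The “configure once, spawn many” idea can be sketched as a template that gets stamped out per agent. Everything concrete here is an assumption: the template fields, the image tag, the environment variable names, and the `spawn_specs` helper are illustrative and not the real spawn template format, which lives in the docs.

```python
from copy import deepcopy

# Hypothetical spawn template: field names and the image tag are illustrative.
TEMPLATE = {
    "image": "calliope-agents/desktop:latest",
    "env": {
        "AGENT_RUNTIME": "claude",
        "SCUTTLEBOT_HOST": "scuttlebot.internal",
    },
}


def spawn_specs(template: dict, count: int, prefix: str = "agent") -> list[dict]:
    """Stamp out `count` spawn specs that differ only in name and channel."""
    specs = []
    for i in range(1, count + 1):
        spec = deepcopy(template)  # never mutate the shared template
        name = f"{prefix}-{i}"
        spec["name"] = name
        spec["env"]["AGENT_NAME"] = name
        spec["env"]["SCUTTLEBOT_CHANNEL"] = f"#{name}"
        specs.append(spec)
    return specs


fleet = spawn_specs(TEMPLATE, 10)
print([s["name"] for s in fleet])
```

The scuttlebot wiring is written once in the template; going from one agent to ten changes only the `count` argument.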

The Two-Pane Operator View

Once the foundation is wired, a typical operator session looks like this:

 Browser tab 1                       Browser tab 2 (or split)
 ┌──────────────────────────┐        ┌──────────────────────────┐
 │  Scuttlebot Web UI       │        │  noVNC: agent #3         │
 │                          │        │                          │
 │  #agent-1  ▼ idle        │        │  ┌─────────┐ ┌────────┐  │
 │  #agent-2  ▶ running     │        │  │ Editor  │ │  Term  │  │
 │  #agent-3  ▶ running     │        │  │ ...     │ │ $ ...  │  │
 │  #agent-4  ◼ blocked     │        │  │         │ │        │  │
 │                          │        │  └─────────┘ └────────┘  │
 │  > agent-3 ran tests     │        │                          │
 │  > agent-3 24/30 pass    │        │  (mouse/keyboard live)   │
 │  > agent-3 retrying...   │        │                          │
 │                          │        │                          │
 └──────────────────────────┘        └──────────────────────────┘
   ambient view                        deep dive when needed

You live in the left pane. The right pane is the click-through when you want to see what an agent is actually doing. On a laptop these are side by side. On a phone they are different tabs, and you swipe between them.

The left pane is the entire operator dashboard. It scales to N agents trivially — the more agents, the more channels, but the surface stays the same shape.
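One way to see why the left pane keeps its shape: the per-agent status is just a fold over channel messages. The sketch below assumes a hypothetical convention where relays post lines such as `status: running`; the real message format is not specified here, and `status_board` and `needs_attention` are illustrative helpers.

```python
# Hypothetical convention: relays post lines like "status: running".
STATES = ("idle", "running", "blocked")


def status_board(messages: list[tuple[str, str]]) -> dict[str, str]:
    """Fold (channel, text) messages into the latest known state per channel."""
    board: dict[str, str] = {}
    for channel, text in messages:
        if text.startswith("status: "):
            state = text.removeprefix("status: ").strip()
            if state in STATES:
                board[channel] = state  # later messages overwrite earlier ones
    return board


def needs_attention(board: dict[str, str]) -> list[str]:
    """Channels worth opening an Agent Surface for."""
    return sorted(ch for ch, state in board.items() if state == "blocked")


log = [
    ("#agent-1", "status: idle"),
    ("#agent-3", "status: running"),
    ("#agent-3", "ran tests: 24/30 pass"),
    ("#agent-4", "status: running"),
    ("#agent-4", "status: blocked"),
]
board = status_board(log)
print(board)
print(needs_attention(board))  # → ['#agent-4']
```

Adding agents adds keys to the board, but the operator still reads one summary and opens a desktop only for the channels that `needs_attention` returns.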

What This Foundation Gives You

When you have this in place:

One human, one login, one URL → access to N agents from any device.
A persistent record of everything every agent ever did, in chronological chat form.
A no-install experience that works from a phone in a coffee shop or a laptop in a regulated environment.
An identity model where every agent is scoped to your team / project, with auth flowing through the hub.
No data leaves your cloud unless you point an agent at an external model. Even then, only the prompts sent to that model cross the boundary; the agent’s workspace, logs, and channel history stay inside your perimeter.

What it does not yet give you — what the rest of this series builds toward — is multiple agents working together on the same problem, multiple agents working on different problems with one operator view, mixed-model swarms where Claude + Codex + Gemini collaborate, and the operator workflow tuned specifically for running the swarm from your phone.

Part 2 starts from this foundation and adds the first multiplier: a fleet of homogeneous agents pointed at a single goal.

Where to Go Next

The pieces you need to build this:

calliope-agents — the agent container images (browser, desktop, terminal tiers), each VNC-enabled and runtime-agnostic (Claude, Codex, or Gemini). The bootstrap and relay binaries live here.
calliope-scuttlebot — the swarm coordination hub: IRC server, web UI, JupyterHub-integrated spawn.
junohub — the JupyterHub deployment that hosts spawns and routes traffic. The hub-and-spawn model is what makes the topology coherent across devices.
docs.calliope.ai — current docs, including spawn templates, bootstrap configuration, and the recommended hub deployment per cloud.

Read the repos for the implementation. The rest of this series is about what to do with them once they are running.

Next in this series — Part 2: From One to a Fleet. Pointing N agents at a single problem, coordination patterns, and how to keep the operator surface boring as the fleet grows.