Coding Agent Swarms, Part 5: Running the Fleet From Your Phone

Jun 04, 2026 - 9 Min read

The Last Mile Is the Operator

The first four parts of this series built the substrate: foundation, fleet, multi-fleet pane, mixed-runtime council. All of it works on a laptop. All of it also works on a phone — and the phone is the place this topology earns its keep.

Most “AI coding agent” pitches assume you sit at your desk. You do not. You sit on a train, in a café, at a kid’s soccer game, in a waiting room, between meetings. The agents do not care. They keep grinding through the backlog while you are not at your desk. The question is whether you can see them, steer them, and approve their work — without opening a laptop.

This part is about the operator workflow that makes the answer “yes.”

What Phone-First Actually Means

Phone-first does not mean “the laptop UI works on a small screen.” Most laptop UIs do not. Phone-first means the topology is designed so that the operator’s most common actions are achievable from a phone, with no install, on a network that may not be great.

The breakdown:

   ┌──────────────────────────────────────────────────────┐
   │  Common operator actions, by frequency               │
   │                                                      │
   │  ──────────────────────────────────────────────────  │
   │                                                      │
   │  Glance at fleet status              ████████████    │
   │  Read a single fleet's room          ██████████      │
   │  Approve a pending PR                ████████        │
   │  Acknowledge an alert                ██████          │
   │  Read an individual agent's channel  █████           │
   │  Send a steering message to a fleet  ████            │
   │  Watch an agent's desktop live       ██              │
   │  Type code or commands               ▌               │
   │                                                      │
   └──────────────────────────────────────────────────────┘

The top six of those eight actions are pure reading, light tapping, and short-message-sending. All of them are good fits for a phone. The bottom two, watching a live desktop and typing code, are uncomfortable on a phone, and you should escape to a laptop when they come up.

Phone-first is the architecture that lets you stay on the phone for the first six, and only escalate to the laptop for the last two.

The Surfaces on a Phone

   ┌─────────────────────────┐
   │                         │
   │   PANE (PWA)            │
   │   ─────────────────     │
   │   Fleet 1  ▶ 60%        │
   │   Fleet 2  ◼ blocked    │
   │   Fleet 3  ▶ 35%        │
   │   Fleet 4  ✓ done       │
   │                         │
   │   ────────────────────  │
   │   ⚡ fleet2 blocked      │
   │   ⚡ fleet1 merged PR    │
   │   ⚡ council waiting     │
   │                         │
   │   [▶ open pane]         │
   │   [▶ open chat]         │
   │   [▶ open agent screen] │
   │                         │
   └─────────────────────────┘

   Three swipe-tabs, one phone:
   pane ▸ chat ▸ screen

Three browser tabs (or three PWA screens) form the phone operator surface:

Pane — fleet status, the alert stream, cost dashboard. The home screen. You start every session here.
Chat — the scuttlebot web UI. Channel list, message stream, ability to send messages into a room. Chat is what mobile is best at; everything important the swarm needs from you can be done by tapping a channel and typing a short message.
Screen — noVNC, when you really do need to see an agent’s desktop. Touch-driven, scrollable, zoomable. You use this last and least.

The three tabs share session state through the hub. One login on the phone covers all three. You do not authenticate three times.
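One way to picture the shared login is a single signed token that the hub issues once and that each surface checks on its own. The sketch below is an assumption about how that could work, not a description of the hub's actual session mechanism; `HUB_SECRET`, `issue_session`, and `verify_session` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical shared secret held by the hub; a real deployment would load it
# from a secret store rather than hard-coding it.
HUB_SECRET = b"example-only-secret"

# The three phone surfaces named in this article.
SURFACES = ("pane", "chat", "screen")


def _signature(body: str) -> str:
    return hmac.new(HUB_SECRET, body.encode(), hashlib.sha256).hexdigest()


def issue_session(user: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Sign a small session payload once, at login."""
    issued = time.time() if now is None else now
    payload = json.dumps({"user": user, "exp": issued + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    return f"{body}.{_signature(body)}"


def verify_session(token: str, surface: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user name if the token is valid for this surface, else None."""
    if surface not in SURFACES:
        return None
    body, _, sig = token.rpartition(".")
    # Constant-time comparison of the presented and expected signatures.
    if not hmac.compare_digest(sig.encode(), _signature(body).encode()):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body.encode()))
    current = time.time() if now is None else now
    return payload["user"] if current < payload["exp"] else None
```

Because every surface verifies the same token with the same check, one login is enough: `verify_session(token, "pane")`, `verify_session(token, "chat")`, and `verify_session(token, "screen")` all accept it until it expires.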

PWA: The No-Install App

Most modern mobile browsers support installing a web app to the home screen as a Progressive Web App. The hub URL, installed as a PWA, looks and feels like a native app:

App icon on the home screen.
Opens full-screen, no browser chrome.
Push notifications (most browsers).
Service worker for offline-tolerant cache.

The whole experience requires zero involvement from the app store, zero involvement from corporate device management, zero involvement from a developer-tools team. The user opens the URL once on their phone, taps “Add to Home Screen,” and is done.

   ┌───────────────────────────────────────────────────────┐
   │                                                       │
   │   Install Flow                                        │
   │   ──────────────────────────────────────              │
   │                                                       │
   │   Phone browser  →  hub.your-company.ai               │
   │                                                       │
   │   Browser menu   →  "Add to Home Screen"              │
   │                                                       │
   │   Home screen    →  Calliope icon                     │
   │                                                       │
   │   Tap            →  PWA opens full screen,            │
   │                     auto-signs in via session,        │
   │                     pane appears.                     │
   │                                                       │
   │   Total time: ~30 seconds. No app store.              │
   │                                                       │
   └───────────────────────────────────────────────────────┘

This is the property that makes phone access actually viable in regulated environments. App stores have approval cycles, MDM policies, security reviews, distribution constraints. A PWA on a URL the company already owns has none of those. The platform team approves the URL once; the user uses the phone they already have.
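What makes a URL installable is mostly a web app manifest served by the site. As a minimal sketch, assuming the hub generates that manifest server-side, the function below builds one. The member names (`name`, `short_name`, `start_url`, `display`, `icons`) are standard Web App Manifest members; the `/pane` start path and the icon paths are placeholders, not the hub's real routes.

```python
import json
from typing import Dict


def build_manifest(app_name: str, start_url: str, icon_by_size: Dict[str, str]) -> str:
    """Return a Web App Manifest as a JSON string.

    icon_by_size maps a size such as "192x192" to the icon's URL path.
    """
    manifest = {
        "name": app_name,
        "short_name": app_name,
        "start_url": start_url,
        # "standalone" opens the app in its own window without browser chrome.
        "display": "standalone",
        "icons": [
            {"src": src, "sizes": size, "type": "image/png"}
            for size, src in icon_by_size.items()
        ],
    }
    return json.dumps(manifest, indent=2)


# Placeholder values for illustration only.
print(build_manifest(
    "Calliope",
    "/pane",
    {"192x192": "/icons/calliope-192.png", "512x512": "/icons/calliope-512.png"},
))
```

The page links to the result with a `<link rel="manifest">` tag. Offline caching and push notifications additionally need a service worker, which this sketch leaves out.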

The Three Network Modes

A phone operator runs into three distinct network regimes during a typical day. The architecture has to absorb all three.

   ┌──────────────────────────────────────────────────────┐
   │                                                      │
   │  Mode A — Good Network (office Wi-Fi, strong LTE/5G) │
   │  ──────────────────────────────────────────────────  │
   │  Everything works as on a laptop.                    │
   │  Pane updates live. Chat is responsive.              │
   │  noVNC streams a desktop comfortably.                │
   │                                                      │
   │  Mode B — Patchy Network (coffee shop, transit)      │
   │  ──────────────────────────────────────────────────  │
   │  Pane updates lag 5–30s. Chat works fine             │
   │  (small messages). noVNC stutters; avoid.            │
   │  Approvals via chat or pane work cleanly.            │
   │                                                      │
   │  Mode C — Bad Network (poor signal, airplane)        │
   │  ──────────────────────────────────────────────────  │
   │  Pane shows last cached state. Chat queues outgoing  │
   │  messages until reconnect. Approvals queue.          │
   │  Do not attempt agent screen access.                 │
   │                                                      │
   └──────────────────────────────────────────────────────┘

The discipline is to know which mode you are in and pick the right surface accordingly. In Mode A, the laptop and the phone are equivalent. In Mode B, drop the screen tab and stay in pane + chat. In Mode C, accept that you are doing async work — read what you can, queue what you must, escalate to laptop when home.
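That discipline can be automated on the client. The sketch below classifies the connection into the three modes and returns the surfaces worth opening. The round-trip thresholds are illustrative assumptions rather than measured values, and `classify_network` and `surfaces_for` are hypothetical helper names.

```python
from enum import Enum
from typing import Optional, Tuple


class NetworkMode(Enum):
    A = "good"
    B = "patchy"
    C = "bad"


# Assumed thresholds for round-trip time to the hub; tune for your own network.
GOOD_RTT_MS = 200.0
PATCHY_RTT_MS = 3000.0


def classify_network(online: bool, rtt_ms: Optional[float]) -> NetworkMode:
    """Map a connectivity probe to Mode A, B, or C."""
    if not online or rtt_ms is None or rtt_ms > PATCHY_RTT_MS:
        return NetworkMode.C
    if rtt_ms <= GOOD_RTT_MS:
        return NetworkMode.A
    return NetworkMode.B


def surfaces_for(mode: NetworkMode) -> Tuple[str, ...]:
    """Return the operator surfaces that are worth using in each mode."""
    if mode is NetworkMode.A:
        return ("pane", "chat", "screen")
    if mode is NetworkMode.B:
        return ("pane", "chat")
    # Mode C: read the cached pane, queue outgoing chat and approvals.
    return ("pane:cached", "chat:queued")
```

A PWA could run the probe on every foreground event and grey out the screen tab whenever `surfaces_for` stops returning it.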

The architecture is designed so that none of these modes is “broken.” You always have some useful operator surface. The fleet keeps working regardless of which mode you are in.
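The Mode C behavior, where outgoing messages and approvals queue until reconnect, comes down to an ordered outbox. Here is a minimal in-memory sketch, assuming a transport function that returns `True` on successful delivery. The `Outbox` class and the `send` callback are hypothetical, and a real client would persist the queue so it survives the app being closed.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class Outbox:
    """Queue outgoing (channel, text) messages and deliver them in order."""

    def __init__(self, send: Callable[[str, str], bool]) -> None:
        self._send = send  # returns True once the hub has accepted the message
        self._pending: Deque[Tuple[str, str]] = deque()

    def submit(self, channel: str, text: str) -> None:
        """Queue a message, then try to deliver everything that is pending."""
        self._pending.append((channel, text))
        self.flush()

    def flush(self) -> int:
        """Send pending messages oldest first; stop at the first failure."""
        delivered = 0
        while self._pending:
            channel, text = self._pending[0]
            if not self._send(channel, text):
                break
            self._pending.popleft()
            delivered += 1
        return delivered

    @property
    def pending(self) -> int:
        return len(self._pending)
```

Call `flush()` from the reconnect handler. Because a message leaves the queue only after the transport confirms it, a dropped connection mid-send results in a retry rather than a lost "pause".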

AGTerm vs noVNC: Picking the Right Thin Client

Two thin-client options exist for the agent surface, and the choice matters more on a phone than on a laptop.

   ┌────────────────────────────────────────────────────────┐
   │                                                        │
   │  noVNC (browser VNC)                                   │
   │  ──────────────────────                                │
   │  • Streams a full graphical desktop                    │
   │  • High bandwidth (~Mbps under activity)               │
   │  • Touch-supported but mouse-native UX                 │
   │  • Best when: you actually need to see GUI apps        │
   │    (browser-tier agents driving Chromium)              │
   │                                                        │
   │  AGTerm (web terminal)                                 │
   │  ──────────────────────                                │
   │  • Streams only terminal output                        │
   │  • Low bandwidth (~Kbps)                               │
   │  • Touch-friendly UI built-in                          │
   │  • Best when: you only need a shell or AI CLI          │
   │    (terminal-tier agents with Claude/Codex/Gemini CLI) │
   │                                                        │
   └────────────────────────────────────────────────────────┘

A phone-heavy operator stack defaults to AGTerm for the agent surface. It works on Mode B networks, it is comfortable on a small screen, it survives a brief connection blip without disconnecting. You drop into noVNC only when an agent is doing GUI work — a browser-tier agent driving Chromium, for instance — and you actually need to see the pixels.

If you can run your fleet on terminal-tier agents most of the time, the phone experience becomes substantially better. That is itself a design decision worth making early.
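The choice between the two thin clients reduces to a small decision function over the agent's tier and the current network mode. This is a sketch of the rule of thumb described above, not code from either project; the tier names and return values are illustrative.

```python
def pick_thin_client(agent_tier: str, network_mode: str) -> str:
    """Choose how to view an agent from a phone.

    agent_tier:   "terminal" (shell or AI CLI) or "browser" (GUI work).
    network_mode: "A" (good), "B" (patchy), or "C" (bad).
    Returns "agterm", "novnc", or "defer" (wait for a laptop or a better network).
    """
    if agent_tier not in ("terminal", "browser"):
        raise ValueError(f"unknown agent tier: {agent_tier!r}")
    if network_mode not in ("A", "B", "C"):
        raise ValueError(f"unknown network mode: {network_mode!r}")

    if network_mode == "C":
        return "defer"  # no live agent access on a bad network
    if agent_tier == "terminal":
        return "agterm"  # low bandwidth, fine in Mode A and Mode B
    # Browser-tier agents need pixels, which only stream well in Mode A.
    return "novnc" if network_mode == "A" else "defer"
```

With terminal-tier agents, only Mode C returns `"defer"`, which is the practical argument for defaulting the fleet to terminal-tier.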

Approvals From a Phone

The single highest-leverage phone action is approving things — a council’s diff, a runtime promotion, a policy override. The pattern that works:

   ┌──────────────────────────────────────────────────────┐
   │                                                      │
   │  Push notification → tap → magic-link approval card  │
   │                                                      │
   │  ┌──────────────────────────────────────┐            │
   │  │  PR #427 — fix auth-layer rotation   │            │
   │  │  Council: refactor-auth-layer        │            │
   │  │  Reviewer (Gemini): approved         │            │
   │  │  Policy simulator: 0 changes         │            │
   │  │  Cost so far: $3.20                  │            │
   │  │                                      │            │
   │  │     [✓ approve]    [✗ reject]        │            │
   │  └──────────────────────────────────────┘            │
   │                                                      │
   │  No portal login. No SSO redirect.                   │
   │  Biometric confirm if configured.                    │
   │                                                      │
   └──────────────────────────────────────────────────────┘

The phone receives a push notification when an approval is waiting. Tap the notification. A magic-link card opens. You see exactly what is being approved — the PR, the council that produced it, the reviewer’s verdict, the policy simulator’s output, the cost. Tap approve. Biometric confirms. Done.

This is the workflow that makes “the operator is on a phone” actually scale. If approvals required a laptop, the whole topology would be bottlenecked on the operator being at their desk. With phone approvals, the bottleneck shifts back to the agents producing approvable work — which is exactly where you want the bottleneck to be.
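A magic link that skips the portal login still has to be safe to tap. A common pattern is a signed, expiring, single-use token bound to one approval. The sketch below illustrates that pattern under stated assumptions. It is not zentinelle's implementation, and `SECRET`, `make_link_token`, and `ApprovalGate` are hypothetical names. The biometric confirmation happens on the device and is out of scope here.

```python
import hashlib
import hmac
import time
from typing import Optional, Set

# Hypothetical signing key; a real service would load it from a secret store.
SECRET = b"example-only-secret"


def _sign(approval_id: str, expires_at: int) -> str:
    msg = f"{approval_id}:{expires_at}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_link_token(approval_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build the token embedded in the push notification's link.

    approval_id must not contain ":" because it is used as the field separator.
    """
    issued = time.time() if now is None else now
    expires_at = int(issued) + ttl_seconds
    return f"{approval_id}:{expires_at}:{_sign(approval_id, expires_at)}"


class ApprovalGate:
    """Accept each approval token at most once, and only before it expires."""

    def __init__(self) -> None:
        self._used: Set[str] = set()

    def redeem(self, token: str, decision: str, now: Optional[float] = None) -> bool:
        parts = token.split(":")
        if len(parts) != 3 or decision not in ("approve", "reject"):
            return False
        approval_id, expires_raw, sig = parts
        if not expires_raw.isdecimal():
            return False
        expires_at = int(expires_raw)
        expected = _sign(approval_id, expires_at)
        # Constant-time comparison so the check does not leak signature bytes.
        if not hmac.compare_digest(sig.encode(), expected.encode()):
            return False
        current = time.time() if now is None else now
        if current >= expires_at or approval_id in self._used:
            return False
        self._used.add(approval_id)
        return True
```

For example, `make_link_token("pr-427", 600)` yields a token that is valid for ten minutes. The first `redeem` with `"approve"` or `"reject"` succeeds, and any replay of the same link is refused.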

A Day in the Life

What this looks like in practice:

   07:30  Subway        Open pane. Glance: 4 fleets running clean.
                        Dismiss one stale alert. 45 seconds total.

   09:15  Office        Laptop session: kick off a new fleet, set
                        a council against a hard refactor.

   11:00  Coffee meet   Phone buzz: council needs approval. Magic-link.
                        Read summary. Approve. Back to meeting.

   12:30  Lunch         Pane: cost trend ▲. Drill into chat: one agent
                        retrying tool call with token-heavy prompts.
                        Send "use small model for retry" to fleet room.

   14:00  Hallway       Push: fleet2 blocked. Open chat. Read last
                        50 messages. Issue is unclear. Send "pause."
                        Mental note: look at this on laptop after 3pm.

   15:30  Office        Laptop: pull up fleet2's blocked agent screen
                        via noVNC. Spot the issue (missing env var).
                        Fix. Resume fleet. Back to other work.

   22:00  Home          Pane: 3 fleets done, 1 still running through
                        the night. Cost on track. Phone to bed.

Five of those seven interactions happened on the phone. Two on the laptop. Total operator time: maybe 25 minutes across the whole day. The fleet ran for 14 hours straight, produced N PRs, and the operator did not have to be at a desk to keep it going.

That is the point. Not that the phone replaces the laptop. That the operator’s presence at a particular machine stops being a bottleneck on the throughput of the swarm.

Where the Topology Earns Its Keep

The five-part topology — foundation → fleet → multi-fleet → council → phone-first — is what makes the difference between “AI coding agents are a productivity tool” and “AI coding agents are infrastructure.” Productivity tools live on your laptop and you use them when you sit down. Infrastructure runs on its own and you check on it from wherever you happen to be.

In 2026, the teams that get the most out of coding agents are the ones that build infrastructure, not the ones that use productivity tools. The investment is mostly architectural, and you make it once. The dividends are continuous after that.

Where to Go Next

calliope-agterm — the mobile-friendly web terminal. PWA-installable, touch-supported, multi-session.
calliope-agents — the agent images, with terminal-tier specifically tuned for thin-client (phone) operation.
calliope-scuttlebot — the scuttlebot web UI is mobile-responsive by design.
zentinelle — the approval workflow and push notification integration that powers phone-driven approvals.
docs.calliope.ai — PWA installation guides, mobile-mode patterns, recommended approval flows.

This concludes the agent-swarm series. Earlier parts: Foundation, Fleet Against One Problem, Many Problems, One Pane, and The Council. Build it once, and the rest is multipliers.