

Part 2 ended with a working fleet — N agents pointed at one problem, coordinating through one shared channel, producing one merged result. That is a useful unit. It is not yet the unit most teams actually need.
The real shape of engineering work is many problems, in flight at the same time, on different timelines. A migration in one repo. A bug-bash sprint in another. A documentation cleanup in a third. A security upgrade in a fourth. Each of these is a fleet. Each fleet has its own goal, its own duration, its own agents, its own coordination room. And the human watching them — you — does not want to context-switch between four browser tabs.
What you want is one pane of glass. One surface where every fleet is visible, every status is current, every escalation surfaces. You drill in when something interesting happens. You ignore the rest. The fleets themselves do not need to know about each other; only you do.
This part is about that pane.
┌──────────────────────────────────────────────────┐
│                    YOUR CLOUD                    │
│                                                  │
│                   ┌─────────┐                    │
│                   │   Hub   │                    │
│                   └────┬────┘                    │
│                        │                         │
│        ┌───────────────┼───────────────┐         │
│        ▼               ▼               ▼         │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  │
│  │  Fleet 1   │  │  Fleet 2   │  │  Fleet 3   │  │
│  │  migration │  │  bug-bash  │  │  docs      │  │
│  │            │  │            │  │            │  │
│  │  A₁ A₂ A₃  │  │  A₄ A₅     │  │  A₆ A₇ A₈  │  │
│  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘  │
│        │               │               │         │
│        └───────────────┼───────────────┘         │
│                        ▼                         │
│               ┌──────────────────┐               │
│               │    Scuttlebot    │               │
│               │                  │               │
│               │  #fleet1-room    │               │
│               │  #fleet1-agent1  │               │
│               │  #fleet1-agent2  │               │
│               │  #fleet1-agent3  │               │
│               │  #fleet2-room    │               │
│               │  #fleet2-agent4  │               │
│               │  #fleet2-agent5  │               │
│               │  #fleet3-room    │               │
│               │  #fleet3-agent6  │               │
│               │  #fleet3-agent7  │               │
│               │  #fleet3-agent8  │               │
│               └──────────────────┘               │
│                                                  │
└──────────────────────────────────────────────────┘
Three fleets, one scuttlebot. The naming convention is the entire trick: every channel is prefixed by its fleet name. The scuttlebot UI groups by prefix. You see fleet 1’s room, fleet 2’s room, fleet 3’s room — all in a left-side channel list — and the messages stream into a single timeline if you want to watch everything at once, or into a single channel if you want to focus.
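To make the prefix convention concrete, here is a minimal Python sketch of the grouping a channel list could apply. The function name `group_channels_by_fleet` is hypothetical, and the sketch assumes fleet names contain no hyphen, so everything before the first `-` is the fleet. It is not scuttlebot's actual implementation.

```python
from collections import defaultdict


def group_channels_by_fleet(channels):
    """Group names like '#fleet1-room' under the prefix before the first '-'.

    Assumption: fleet names themselves contain no hyphen.
    """
    groups = defaultdict(list)
    for name in channels:
        fleet, _, rest = name.lstrip("#").partition("-")
        groups[fleet].append(rest)
    return dict(groups)


channels = [
    "#fleet1-room",
    "#fleet1-agent1",
    "#fleet2-room",
    "#fleet2-agent4",
    "#fleet3-room",
]
grouped = group_channels_by_fleet(channels)
for fleet, members in grouped.items():
    print(fleet, members)
```

Because the grouping depends only on the channel name, adding a fourth fleet needs no configuration change: its channels show up under a new prefix as soon as they exist.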
The fleets do not coordinate with each other. They share the same nervous system (scuttlebot), the same identity model (the hub), and the same observability layer (covered under Zone 3 below), but the work each fleet does is independent.
A working “single pane of glass” view is, in practice, three zones stacked into one operator surface. Whether you implement that as one custom UI or as three separate browser surfaces side by side does not matter — what matters is that all three are visible.
┌──────────────────────────────────────────────────────────────────┐
│                                                                  │
│   ┌────────────────────┐   ┌─────────────────────────────────┐   │
│   │ Zone 1 — Fleet     │   │ Zone 2 — Live Stream            │   │
│   │ Status             │   │                                 │   │
│   │                    │   │ fleet3-agent6: claimed task 12  │   │
│   │ Fleet 1: ▶ 60%     │   │ fleet1-agent2: tests passing    │   │
│   │ Fleet 2: ◼ blocked │   │ fleet2-agent4: BLOCKED ▲        │   │
│   │ Fleet 3: ▶ 35%     │   │ fleet1-agent3: merged PR #42    │   │
│   │ Fleet 4: ✓ done    │   │ fleet3-agent7: claimed task 13  │   │
│   │                    │   │ fleet1-agent1: tests passing    │   │
│   └────────────────────┘   └─────────────────────────────────┘   │
│                                                                  │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │ Zone 3 — Cost & Anomaly Watch                            │   │
│   │                                                          │   │
│   │ Total spend (24h): $87.30 ▲ 4% vs 7-day avg              │   │
│   │ Per-fleet: F1 $42 F2 $18 F3 $27                          │   │
│   │ Anomalies: fleet2-agent4 latency p99 +180% ◀ drill in    │   │
│   │                                                          │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
Zone 1 — Fleet Status. A summary line per fleet. Progress percentage, state (running, blocked, done, paused), agent count. Computed from the structured task-claimed / task-done / task-blocked messages the agents post into their fleet rooms. This is the ambient view — at-a-glance, you know whether anything needs your attention.
Zone 2 — Live Stream. A flat, time-ordered view of every meaningful event across every fleet. Not every model call (too noisy) — just the events you would care about: claims, completions, blocks, merges. Color-coded by fleet. Clickable into the specific channel for context.
Zone 3 — Cost & Anomaly Watch. What your fleets are costing you, in motion, and what is statistically weird. Cost per fleet, rolling baseline comparisons, anomaly detection on latency, error rate, and content scanner hits. This zone is where the observability layer (Zentinelle in our stack) earns its keep — it is what turns the scuttlebot chat stream into operational signal.
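As a rough sketch of how Zone 1 could be derived from those structured messages, the Python below folds task-claimed, task-done, and task-blocked events into a state and a progress figure per fleet. The event dict shape (`fleet`, `type`, `task`), the `FleetStatus` class, and the externally supplied task totals are all assumptions made for illustration. The paused state is left out because none of the three message types signals it.

```python
from dataclasses import dataclass, field


@dataclass
class FleetStatus:
    """Running tally for one fleet (hypothetical helper, not a scuttlebot type)."""
    total_tasks: int
    done: set = field(default_factory=set)
    blocked: set = field(default_factory=set)

    def apply(self, kind, task_id):
        if kind == "task-done":
            self.done.add(task_id)
            self.blocked.discard(task_id)
        elif kind == "task-blocked":
            self.blocked.add(task_id)
        elif kind == "task-claimed":
            # Assumption: a fresh claim on a task clears an earlier block.
            self.blocked.discard(task_id)

    @property
    def percent(self):
        if not self.total_tasks:
            return 0
        return round(100 * len(self.done) / self.total_tasks)

    @property
    def state(self):
        if self.total_tasks and len(self.done) >= self.total_tasks:
            return "done"
        return "blocked" if self.blocked else "running"


def summarize(events, totals):
    """Fold a time-ordered event list into {fleet: (state, percent)}."""
    fleets = {name: FleetStatus(total) for name, total in totals.items()}
    for ev in events:
        fleets[ev["fleet"]].apply(ev["type"], ev["task"])
    return {name: (s.state, s.percent) for name, s in fleets.items()}


events = [
    {"fleet": "fleet1", "type": "task-claimed", "task": 1},
    {"fleet": "fleet1", "type": "task-done", "task": 1},
    {"fleet": "fleet2", "type": "task-claimed", "task": 7},
    {"fleet": "fleet2", "type": "task-blocked", "task": 7},
]
summary = summarize(events, totals={"fleet1": 2, "fleet2": 4})
print(summary)
```

Keeping the fold a pure function of the event list means the same summary can be rebuilt from channel history after a restart, with no separate state store to keep in sync.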
Most of the time, the pane is ambient and you ignore it. Then something blinks:
┌──────────────────┐
│ Zone 2 stream    │
│                  │
│ ▲ fleet2-agent4: │
│   BLOCKED        │
└────────┬─────────┘
         │
         ▼ click
┌──────────────────┐
│ Channel View     │
│ #fleet2-agent4   │
│                  │
│ Last 50 messages │
│ Recent errors    │
│ Last tool call   │
└────────┬─────────┘
         │
if you need to see the agent screen:
         │
         ▼ click
┌──────────────────┐
│ noVNC: agent #4  │
│ Live desktop     │
│ Keyboard/mouse   │
└──────────────────┘
Three levels deep, two clicks apart. Each click is a zoom: ambient → channel → screen. Most blockers resolve at the channel level: you see the error, you tell the agent what to do, you go back to the pane. Some need the screen: you intervene directly, fix something, restart the agent, and let it go.
The trick is that you almost never sit at the screen level. You sit at the pane level. The screen is for exceptions.
A pane covering 4 fleets and 30 agents is operationally cheaper than four browser tabs covering one fleet each — but only if the pane is built right. The failure mode is when “single pane of glass” becomes “single pane of noise.”
Three discipline rules that keep the pane useful:
Structured messages. Every status-relevant message from an agent follows a known schema: claim, progress, block, done. Free-form chatter goes into the per-agent channel, not the room. This makes Zone 2 filterable.
Quiet by default. The pane defaults to blocked and anomaly events only. Normal progress is summarized in Zone 1; you only see individual messages in Zone 2 if they require attention or you have opened a specific fleet.
Per-fleet ownership. Even though one operator watches the pane, each fleet has a nominal owner. When a blocker fires, the owner gets pinged. The watcher’s job is to triage and route, not to fix everything personally.
These three rules are what turn the pane from “another inbox” into “a control surface.”
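The first two rules can be sketched together in a few lines of Python: a fixed schema for status messages, and a default filter that only lets attention-worthy ones into Zone 2. The `StatusMessage` fields and the `visible_in_stream` helper are hypothetical names chosen for this sketch. Anomaly events come from the observability layer rather than from agents, so they are outside what it covers.

```python
from dataclasses import dataclass

# The four kinds named in the schema rule above.
KINDS = {"claim", "progress", "block", "done"}
# Kinds that reach Zone 2 even when no fleet is opened.
ATTENTION_KINDS = {"block"}


@dataclass(frozen=True)
class StatusMessage:
    fleet: str
    agent: str
    kind: str
    detail: str = ""

    def __post_init__(self):
        # Reject free-form chatter; it belongs in the per-agent channel.
        if self.kind not in KINDS:
            raise ValueError(f"unknown status kind: {self.kind!r}")


def visible_in_stream(msg, opened_fleets=frozenset()):
    """Quiet by default: blocks always show, the rest only for opened fleets."""
    return msg.kind in ATTENTION_KINDS or msg.fleet in opened_fleets


msgs = [
    StatusMessage("fleet1", "agent2", "progress", "tests passing"),
    StatusMessage("fleet2", "agent4", "block", "missing credentials"),
    StatusMessage("fleet3", "agent6", "claim", "task 12"),
]
default_view = [m.agent for m in msgs if visible_in_stream(m)]
fleet3_opened = [m.agent for m in msgs if visible_in_stream(m, {"fleet3"})]
print(default_view, fleet3_opened)
```

Validating the kind at construction time is a deliberate choice here: a message that does not fit the schema fails loudly at the sender instead of silently polluting the stream.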
Practical decision rule for whether two pieces of work belong in one fleet or two:
shared codebase?
├── yes
│ ├── shared context window?
│ │ ├── yes → one fleet, leader-follower
│ │ └── no → one fleet, work-stealing
│ └── (continue)
└── no → two fleets, each in its own room
If two pieces of work touch the same codebase and would benefit from sharing context (a refactor and the tests for that refactor, for example), they should be one fleet with a leader agent (Part 2's Pattern C).
If they touch the same codebase but are independent (fix 30 lint warnings in repo A, refactor a different module in repo A), they are one fleet with a work-stealing queue (Part 2's Pattern B).
If they touch different codebases entirely — fleet 1 in repo A, fleet 2 in repo B — they are two fleets, in two rooms, on the same pane.
The pane absorbs all three cases without changing shape.
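The decision rule is small enough to write down as a function. This is only a restatement of the tree above in Python; the function name `placement` and the two boolean inputs are illustrative, and answering those two questions for real work remains a human judgment call.

```python
def placement(shared_codebase: bool, shared_context: bool) -> str:
    """Return the fleet layout suggested by the decision tree."""
    if not shared_codebase:
        return "two fleets, each in its own room"
    if shared_context:
        return "one fleet, leader-follower"
    return "one fleet, work-stealing"


# A refactor plus the tests for that refactor: same repo, shared context.
print(placement(shared_codebase=True, shared_context=True))
# Lint fixes and an unrelated module refactor in the same repo.
print(placement(shared_codebase=True, shared_context=False))
# Work in repo A and work in repo B.
print(placement(shared_codebase=False, shared_context=False))
```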
Three operational properties only become possible at this layer:
Always-on capacity. A fleet is running, on something useful, almost all the time. Idle agents — and idle developers waiting on agents — go away when you have multiple backlogs queued.
Triage as a workflow. New work arrives → assigned to a fleet → fleet picks it up → status reports on the pane → done. The path from “we should do this” to “this is being done” is short and visible.
Cost as a first-class metric. Per-fleet cost is comparable. You learn which fleets cost what; you learn which kinds of work are expensive; you learn which models give you the best per-dollar throughput on which tasks. None of this is visible when you run one agent at a time.
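To illustrate what comparable per-fleet cost might look like in practice, the sketch below computes two figures for each fleet: percent change against a rolling baseline, and dollars per completed task. The function names, the dict layout, and every number in the sample data are invented for illustration. They are not measurements, and this is not how Zentinelle computes its metrics.

```python
from statistics import mean
from typing import Optional, Sequence


def change_vs_baseline(today: float, previous_days: Sequence[float]) -> Optional[float]:
    """Percent change of today's spend against the mean of previous days.

    Returns None when there is no usable baseline (no history, or zero spend).
    """
    if not previous_days:
        return None
    baseline = mean(previous_days)
    if baseline == 0:
        return None
    return 100 * (today - baseline) / baseline


def cost_per_task(spend: float, tasks_done: int) -> Optional[float]:
    """Dollars per completed task; None until the fleet has finished something."""
    return spend / tasks_done if tasks_done else None


# Illustrative numbers only, not data from a real deployment.
fleets = {
    "fleet1": {"spend": 42.0, "tasks_done": 14, "history": [40.0] * 7},
    "fleet2": {"spend": 18.0, "tasks_done": 0, "history": [20.0] * 7},
}
report = {
    name: (
        change_vs_baseline(f["spend"], f["history"]),
        cost_per_task(f["spend"], f["tasks_done"]),
    )
    for name, f in fleets.items()
}
for name, (delta, per_task) in report.items():
    print(name, delta, per_task)
```

A fleet that is spending money while its cost per task stays undefined, like the blocked fleet in the sample data, is exactly the kind of row worth drilling into.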
There are three reasonable ways to build the single pane of glass:
The scuttlebot web UI itself. With good channel naming and a custom overview page, scuttlebot can be the pane. Cheapest. Works well up to ~5 fleets.
A small dashboard app pulling from scuttlebot + the observability layer. A custom panel reads the IRC history (Zone 2), the structured task-* events (Zone 1), and the metrics (Zone 3). Best for teams that routinely run more than 5 fleets.
The governance/observability portal itself. Zentinelle’s real-time event stream plus dashboards already does most of this — it sees every model call, every policy decision, every cost. Wire scuttlebot events into the same stream and you get a unified pane for free.
In our reference setup, option 3 is the default — the pane lives in the same surface your CISO uses for compliance, with a different filter applied. One surface, two audiences.
We are still running homogeneous fleets — every agent runs the same model, the same way, on the same image. Part 4 adds heterogeneity: a fleet where Claude handles planning, Codex handles implementation, and Gemini handles review, all in the same room, coordinated through scuttlebot. The “council” pattern.
That is where the swarm stops being “many of the same thing” and becomes a team.
calliope-agents — multi-fleet spawn templates and the relay configuration that tags every event with fleet:<name> so the pane can filter cleanly.
calliope-scuttlebot — channel naming conventions, history persistence, and the web UI’s grouping behavior.
zentinelle — the observability layer that handles Zone 3 (cost, anomaly, alerts) and integrates with scuttlebot for unified event streams.
docs.calliope.ai — pane patterns, multi-fleet management guides, and the recommended dashboard setup.
Next in this series — Part 4: The Council. Mixed-runtime swarms where different agents handle different phases of the same task, and the coordination patterns that keep them from arguing.
