Coding Agent Swarms, Part 4: The Council — Mixed-Runtime Swarms

When One Model Isn’t the Right Model

By now in this series we have built a foundation (part 1), scaled it to a single-problem fleet (part 2), and run multiple fleets through one pane of glass (part 3). Every agent in every fleet so far has been the same: same runtime, same model family, same image. That assumption is the last thing left to break.

In 2026 there is no single best model for every step of a coding task. Claude consistently wins at long-context reasoning and code edits across many files. GPT-style runtimes are excellent at structured planning and decomposition. Gemini handles certain numerical and multimodal tasks differently. Open-source models — Llama, Qwen, DeepSeek — are good enough for a lot of mechanical work at a fraction of the cost. The right answer for a hard coding task is increasingly: use different models for different phases.

A swarm where different agents run different models is a council. This part is about what that topology looks like, when to reach for it, and how to keep three different models from arguing in circles.

The Topology, Heterogeneous

A council swarm replaces “N identical agents” with “M agents with M roles, each running the model best suited to its role.”

                  ┌──────────────────────────────────────┐
                  │             #council-room            │
                  └─────────────┬────────────────────────┘
                                │
              ┌─────────────────┼─────────────────┐
              ▼                 ▼                 ▼
        ┌──────────┐      ┌──────────┐      ┌──────────┐
        │ Planner  │      │ Builder  │      │ Reviewer │
        │          │      │          │      │          │
        │ Claude   │      │ Codex    │      │ Gemini   │
        │ (long    │      │ (fast    │      │ (fresh   │
        │  context)│      │  edits)  │      │  eyes)   │
        └────┬─────┘      └────┬─────┘      └────┬─────┘
             │                 │                 │
             └─────────────────┴─────────────────┘
                             ▼
                       ┌───────────┐
                       │  result   │
                       │  artifact │
                       └───────────┘

Three agents, three roles, three runtimes. The room — #council-room in scuttlebot terms — is where the council coordinates. Each agent posts its outputs there: the planner posts the plan, the builder posts the diffs, the reviewer posts the verdict. Each agent reads the others’ outputs as input for its next step.

This is a workflow, not a free-for-all. The order matters: planner first, builder second, reviewer last. The room enforces order by convention, not by code: the builder waits for the planner’s “plan ready” message, and the reviewer waits for the builder’s “diff ready” message. For each step in the plan, the cycle repeats until the reviewer posts “approved” or the planner posts “abandon.”
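A minimal sketch of that convention, seen from one agent's side: before acting, the agent checks whether the latest message in the room is one it waits for. The message strings come from the prose; the `WAITS_FOR` table and the `may_act` helper are hypothetical names for illustration, not scuttlebot's API.

```python
# Message names come from the prose above; the trigger table itself is an
# illustrative assumption, not scuttlebot's actual API.
WAITS_FOR = {
    "planner": set(),                              # planner opens the cycle
    "builder": {"plan ready", "request changes"},  # start work, or rework
    "reviewer": {"diff ready"},
}


def may_act(role: str, room_log: list[str]) -> bool:
    """Return True when the latest room message is one this role waits for."""
    if not room_log:
        return role == "planner"  # empty room: only the planner may speak
    return room_log[-1] in WAITS_FOR.get(role, set())
```

Because the gate lives in each agent rather than in the room, a misbehaving agent can still talk out of turn. That is the cost of enforcing order by convention.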

The Three Canonical Roles

The roles that consistently earn their keep in a coding council, and the runtime characteristics that suit each:

┌──────────────────────────────────────────────────────────────┐
│                                                              │
│  Planner                                                     │
│  ───────                                                     │
│  • Reads the goal, reads the codebase                        │
│  • Produces a decomposition: ordered list of changes         │
│  • Re-plans when blockers surface                            │
│  • Needs: long context window, strong reasoning              │
│                                                              │
│  Builder                                                     │
│  ───────                                                     │
│  • Takes one step from the plan, executes it                 │
│  • Writes code, runs tests, iterates locally                 │
│  • Reports status back to the room                           │
│  • Needs: fast iteration, good code-edit quality             │
│                                                              │
│  Reviewer                                                    │
│  ────────                                                    │
│  • Reads the builder's diff with fresh context               │
│  • Looks for bugs, missing tests, regressions, style         │
│  • Approves, requests changes, or escalates to planner       │
│  • Needs: different model family from builder (orthogonality)│
│                                                              │
└──────────────────────────────────────────────────────────────┘

The runtime assignment is a matter of empirical fit and team taste. The constraint that holds across teams: the reviewer should run a different model family from the builder, because two agents running the same model will share blind spots. Cross-family review catches things same-family review will not.
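The cross-family constraint is easy to check mechanically before a council starts. The sketch below assumes each seat is tagged with a model-family label; `Seat`, `check_orthogonality`, and the family strings are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seat:
    role: str     # "planner", "builder", or "reviewer"
    runtime: str  # e.g. "Claude", "Codex", "Gemini"
    family: str   # model-family label chosen by the team (assumption)


def check_orthogonality(seats: list[Seat]) -> list[str]:
    """Return a list of problems; an empty list means the council passes."""
    by_role = {seat.role: seat for seat in seats}
    problems = [
        f"missing role: {role}"
        for role in ("planner", "builder", "reviewer")
        if role not in by_role
    ]
    builder = by_role.get("builder")
    reviewer = by_role.get("reviewer")
    if builder and reviewer and builder.family == reviewer.family:
        problems.append("reviewer shares a model family with the builder")
    return problems
```

Run against the assignment in the diagram (Claude plans, Codex builds, Gemini reviews), the check returns an empty list. Swap the reviewer onto the builder's family and it reports one problem.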

The Coordination Cycle

A council coding task moves through a known cycle. The cycle is what keeps three models from talking past each other.

   ┌─────────┐    plan     ┌─────────┐   diff     ┌──────────┐
   │ Planner │ ──ready──▶  │ Builder │ ──ready──▶ │ Reviewer │
   └─────────┘             └─────────┘            └────┬─────┘
        ▲                       ▲                      │
        │                       │                      │
        │ abandon /             │ rework /             │ approve /
        │ replan ◀──────────────┴──────────────────────┘ request
        │                                                changes
        │
   ┌────┴────────────────┐
   │  task complete or   │
   │  escalated to human │
   └─────────────────────┘

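The cycle has four core transition messages, plus two escalation messages (rework limit reached from the builder, abandon from the planner). All are posted to the room as structured events. The four core messages: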

  • plan ready — planner has produced a decomposition. Builder may begin.
  • diff ready — builder has implemented one step and is requesting review.
  • approved — reviewer accepts. Move to the next step in the plan.
  • request changes — reviewer rejects. Builder revises, posts a new diff ready. After N rework cycles on the same step (the council’s tunable patience), the builder escalates back to the planner with rework limit reached, and the planner re-plans.
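The rework cap in the last bullet can be sketched as a small per-step counter. This is one reading of “after N rework cycles”: the builder gets N reworks, and the next rejection escalates. `StepState`, `on_review`, and the returned action strings are hypothetical names for illustration.

```python
from dataclasses import dataclass

MAX_REWORK = 3  # the council's tunable patience (N); 3 is only an example


@dataclass
class StepState:
    rework_count: int = 0  # reworks already granted on the current step


def on_review(state: StepState, verdict: str, max_rework: int = MAX_REWORK) -> str:
    """Map a reviewer verdict to the next action for the current plan step."""
    if verdict == "approved":
        return "next step"
    if verdict != "request changes":
        raise ValueError(f"unknown verdict: {verdict!r}")
    if state.rework_count >= max_rework:
        # N reworks already spent: the builder escalates and the planner re-plans.
        return "rework limit reached"
    state.rework_count += 1
    return "rework"
```

With N=3, three consecutive rejections each grant a rework and the fourth escalates. A fresh `StepState` per plan step keeps the counter from leaking across steps.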

The structured events are what let the pane (part 3) show council status without the operator reading the chat in real time. A glance at the room’s last event tells you whose turn it is.

When the Council Beats a Solo Agent

A council is more expensive than a single agent — three runtimes, three context windows, more round-trips. The trade is worth it when:

  1. The task spans many files and needs a plan first. A solo agent can plan and build, but it tends to entangle the two; a council forces the plan to exist independently of the implementation, which catches whole classes of mistake.

  2. The codebase is unfamiliar to the model. Two model families reading the same code produce different mental models; the reviewer catches the builder’s confusions about what existing code does.

  3. The cost of being wrong is high. Security-relevant changes, schema migrations, anything touching authentication or billing. Cross-family review is a cheap insurance policy compared to the cost of merging a bad change.

When not to reach for a council:

  • The task is mechanical and obvious. Renaming a variable, updating an import, fixing a lint warning. Council overhead is wasted; a single agent (or even a deterministic script) is fine.

  • The task is small but exploratory. A research spike, a prototype, a “see if this is possible.” The planner adds overhead the spike does not benefit from.

  • You do not have governance in place to track three models. Mixed-runtime swarms hit three different providers with three different cost profiles; without observability (Zentinelle or your equivalent), the bill is a surprise.
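As a rough picture of what that governance needs to track, here is a standalone ledger that sums spend per provider and flags a fleet that exceeds its budget. It is not Zentinelle's API; `CostLedger` and its methods are hypothetical, and amounts are kept in integer cents to avoid floating-point drift.

```python
from collections import defaultdict


class CostLedger:
    """Per-fleet spend, bucketed by provider, in integer cents."""

    def __init__(self, budget_cents: int) -> None:
        self.budget_cents = budget_cents
        self.by_provider: dict[str, int] = defaultdict(int)

    def record(self, provider: str, cents: int) -> None:
        """Add one charge to the named provider's bucket."""
        self.by_provider[provider] += cents

    def total(self) -> int:
        return sum(self.by_provider.values())

    def over_budget(self) -> bool:
        return self.total() > self.budget_cents
```

The point of bucketing by provider is that three bills arrive separately. A single fleet-level total hides which runtime is driving the cost.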

The Failure Modes — and the Fixes

Three failure modes recur in council swarms. Naming them up front is most of the cure.

   Failure                          Why                  Fix
   ────────────────────────────────────────────────────────────────
   1. Ping-pong rework              Reviewer is too      Cap rework
      Builder → Reviewer →          strict, builder      cycles per
      Builder → Reviewer ...        is too literal       step (N=3),
                                                         then escalate

   2. Planner over-decomposes       Plan has 80 steps    Planner cap:
      Plan never converges          for a small task     max steps
                                                         proportional
                                                         to task size

   3. Reviewer rubber-stamps        Cross-family         Inject occasional
      Approves everything           review is failing    "negative tests" —
      including bugs                                     deliberately bad
                                                         diffs to verify
                                                         reviewer catches them

All three fixes are configuration (patience limits and validation checks), not architectural changes. The council topology stays the same. You tune the council’s behavior by tightening the contract on each role.
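Two of those fixes fit in a few lines each. The sketch below measures how often the reviewer rejects deliberately bad diffs, and derives a planner step cap from task size. Both helpers and their defaults are hypothetical, and using the number of files in scope as “task size” is an assumption.

```python
def canary_catch_rate(verdicts: list[tuple[bool, str]]) -> float:
    """Fraction of deliberately bad diffs the reviewer did not approve.

    Each entry is (is_canary, verdict); non-canary diffs are ignored.
    """
    canary_verdicts = [verdict for is_canary, verdict in verdicts if is_canary]
    if not canary_verdicts:
        raise ValueError("no canary diffs in the sample")
    caught = sum(1 for verdict in canary_verdicts if verdict != "approved")
    return caught / len(canary_verdicts)


def plan_step_cap(files_in_scope: int, steps_per_file: int = 2, floor: int = 3) -> int:
    """Maximum plan length, proportional to task size with a small floor."""
    return max(floor, files_in_scope * steps_per_file)
```

A catch rate that drifts toward zero is the signal that the reviewer is rubber-stamping. What threshold triggers a change of reviewer model or prompt is a team decision.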

How the Pane Shows a Council

In the multi-fleet pane from part 3, a council fleet looks slightly different from a homogeneous one — but only in the channel structure. The room (#council-room) shows the cycle messages; the per-agent channels show each agent’s individual reasoning and tool calls.

   ┌─────────────────────────────────────────────────────────┐
   │  Pane View — Council Fleet                              │
   │                                                         │
   │  Fleet: refactor-auth-layer    Status: ▶ step 3 of 5    │
   │                                                         │
   │  Planner (Claude)   ✓ plan ready    cost $0.42          │
   │  Builder (Codex)    ▶ diff ready    cost $0.18          │
   │  Reviewer (Gemini)  ⏳ pending      cost $0.00          │
   │                                                         │
   │  Last cycle: 12m ago — diff approved                    │
   │  Current step: "rotate session secrets"                 │
   │                                                         │
   └─────────────────────────────────────────────────────────┘

The operator reads this card and knows: the council is mid-task, on step 3 of 5, waiting on a review, and the last cycle completed cleanly 12 minutes ago. Drilling into the room shows the actual conversation. Drilling further into a specific agent shows that agent’s noVNC desktop. Three clicks deep, same as part 3.
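The card's headline can be derived from nothing more than the room's last structured event. A minimal sketch, assuming the event names used earlier in this post; `NEXT_ACTOR` and `summarize` are hypothetical names, and treating “approved” as handing the turn back to the builder is an assumption.

```python
# Who acts after each event. The event names come from the cycle described
# earlier; the mapping itself is an illustrative assumption.
NEXT_ACTOR = {
    "plan ready": "builder",
    "diff ready": "reviewer",
    "request changes": "builder",
    "approved": "builder",  # assumption: the builder picks up the next step
    "rework limit reached": "planner",
}


def summarize(fleet: str, step: int, total_steps: int, last_event: str) -> str:
    """One-line fleet status derived from the room's most recent event."""
    actor = NEXT_ACTOR.get(last_event, "operator")  # unknown event: ask a human
    return f"{fleet}: step {step} of {total_steps}, waiting on {actor}"
```

For the card above, `summarize("refactor-auth-layer", 3, 5, "diff ready")` yields `refactor-auth-layer: step 3 of 5, waiting on reviewer`.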

Hierarchies of Councils

The natural next step beyond a three-agent council is a hierarchy: one council for planning, multiple councils for implementation, a final council for integration.

                ┌─────────────────────┐
                │  Architecture       │
                │  Council            │
                │  (high-level plan)  │
                └──────────┬──────────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         ┌─────────┐  ┌─────────┐  ┌─────────┐
         │ Sub-    │  │ Sub-    │  │ Sub-    │
         │ Council │  │ Council │  │ Council │
         │ A       │  │ B       │  │ C       │
         └────┬────┘  └────┬────┘  └────┬────┘
              │            │            │
              └────────────┴────────────┘
                           ▼
                ┌─────────────────────┐
                │  Integration        │
                │  Council            │
                └─────────────────────┘

Three levels of councils against a hard problem is a fleet of nine to twelve agents, but the operator surface stays the same: the pane shows the top-level council’s status, with drill-down into sub-councils and individual agents. You read the top of the tree; you only descend when something goes wrong.
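Reading only the top of the tree implies a roll-up: each council reports the worst status found anywhere beneath it, and the pane offers a path down to the trouble. The sketch below models the drill-down as a plain tree; the `Council` class, the status labels, and their severity ordering are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical status labels, ordered from least to most urgent.
SEVERITY = {"ok": 0, "running": 1, "blocked": 2}


@dataclass
class Council:
    name: str
    status: str = "ok"
    children: list["Council"] = field(default_factory=list)


def rollup(node: Council) -> str:
    """Worst status found anywhere in this council's subtree."""
    statuses = [node.status] + [rollup(child) for child in node.children]
    return max(statuses, key=SEVERITY.__getitem__)


def trouble_path(node: Council) -> list[str]:
    """Names from this council down to the first blocked one, or [] if none."""
    if node.status == "blocked":
        return [node.name]
    for child in node.children:
        path = trouble_path(child)
        if path:
            return [node.name] + path
    return []
```

The operator sees `rollup(root)` on the top-level card and follows `trouble_path(root)` only when the roll-up is not healthy.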

This is where coding swarms start to look like a software engineering organization: planning, implementation, review, integration — except the team members are agents, the meetings are scuttlebot rooms, and the artifacts are merge-ready diffs.

What Makes the Council Worth It

The single property that makes a council valuable, when it is valuable, is orthogonality. Three models, with three different training distributions, three different failure modes, three different blind spots. To the extent their errors are uncorrelated, the probability that all three miss the same bug is much lower than the probability that any one of them misses it.

That property does not appear in a homogeneous swarm, no matter how many agents you add. Ten copies of the same model are still one set of blind spots, applied ten times.
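A back-of-the-envelope version of that argument, under the idealized assumption that each model misses a given bug independently of the others. Real models share training data and so share some blind spots, which makes the real gain smaller than this bound. The function name and the example miss rates are hypothetical.

```python
import math


def joint_miss_probability(miss_rates: list[float]) -> float:
    """Probability that every model misses the bug, assuming independent misses."""
    return math.prod(miss_rates)
```

With three models that each miss a given bug 20% of the time, the independent case comes out near 0.008. Ten copies of one model are closer to the fully correlated case, where the joint miss rate stays at 0.2 however many copies you add.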

The council is the smallest swarm topology where heterogeneity matters more than parallelism.

What Part 5 Adds

We have built the full topology: foundation, fleet, multi-fleet, council. Part 5 is about the operator. Specifically: running all of this from your phone, away from your laptop, on bad coffee-shop Wi-Fi, while the swarm grinds through your backlog. The mobile-first operator workflow.

Where to Go Next

  • calliope-agents — agent images for Claude, Codex, and Gemini runtimes. The relay binaries for each are what let a single scuttlebot host the council.

  • calliope-scuttlebot — room model, structured event conventions, message routing.

  • zentinelle — cost tracking and policy enforcement across mixed-runtime fleets. Council swarms make cost-per-fleet metrics essential, not optional.

  • docs.calliope.ai — council patterns, role templates, hierarchy guides.


Next in this series — Part 5: Phone-First. Running the whole stack from your pocket, the PWA workflow, AGTerm vs noVNC for thin clients, and the patterns that survive a bad network.
