The Agent Is the Attack Surface: What We Found Hardening Our Own CLI

Jun 15, 2026 - 6 Min read

Text generators have no attack surface. Agents do.

A chatbot returns a string. The worst it can do is be wrong. An agent is a different animal. It runs shell commands, reads and writes files, calls tools, executes code, and reaches out to servers it was told about at runtime. The moment you give a language model that reach, every one of those capabilities becomes a surface that something can push on: an attacker, a prompt injection buried in a fetched page, or the model’s own bad day.

Most teams adopt agents without auditing that surface. The demo works, the agent ships a useful thing, and nobody asks what happens when the model is convinced to run the wrong command. We build agent tooling for a living and sell the discipline of running AI safely inside a perimeter, so we held ourselves to the standard we ask customers to meet. We put our own agent CLI, about forty-six thousand lines of TypeScript, through a full security review and shipped the fixes as a release.

It surfaced more than thirty findings. A few are the kind every team running agents will eventually meet, no matter whose tooling they use. Here are the ones worth generalizing.

1. Model output reaching a shell is an injection surface

The most serious finding was a path where a tool’s arguments, values the model controls, were interpolated directly into a shell string and executed. The intended command was harmless. A crafted argument turned it into arbitrary command execution. No exotic technique required, just a pipe and a second command where the code assumed there would only be a filename.

This is the defining vulnerability class of agentic systems, and it is worth stating plainly: anywhere model output flows into a shell, you have a command-injection surface. The fix is the same one that has held in web security for twenty years. Never build a command by string concatenation; pass arguments as an argument vector, with no shell in between, so nothing can reinterpret them as syntax. The interesting part is that the unsafe path sat right next to a safe one in the same file. One code path escaped its inputs and the one beside it didn’t. Audits exist to find exactly that asymmetry.
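A minimal sketch of the safe shape, in TypeScript on Node.js. The tool handler `countLines`, the helper `buildArgv`, and the choice of `wc` are illustrative assumptions, not the CLI's actual code. `execFileSync` is Node's standard API and does not spawn a shell unless asked to.

```typescript
import { execFileSync } from "node:child_process";

// Unsafe shape, shown only for contrast (never run this with model input):
//   execSync(`wc -l ${filename}`)
// With filename = "notes.txt | sh ./payload.sh", the shell sees a pipe and a second command.

// Hypothetical helper: builds the argument vector for `wc -l <file>`.
function buildArgv(filename: string): string[] {
  // "--" ends option parsing, so a filename that starts with "-" is not read as a flag.
  return ["-l", "--", filename];
}

// Hypothetical tool handler: the model controls `filename`.
function countLines(filename: string): string {
  // No shell is involved: each array element reaches wc as exactly one argument.
  return execFileSync("wc", buildArgv(filename), { encoding: "utf8" });
}
```

With this shape a hostile value fails as a nonexistent filename instead of running as a second command.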

2. A sandbox that fails open is not a sandbox

We found a “sandbox” for shell execution that ran with the network enabled and read access to the entire home directory, including ~/.ssh, ~/.aws, and .env files. On platforms without a native sandbox backend, it silently fell through to running commands with no isolation at all.

A mitigation that leaks the things it is supposed to protect is worse than no mitigation, because it tells you you’re covered. We tightened the real boundary. Network off by default, secret directories denied, and an explicit mode for callers who require hard enforcement. We also drew a clear line between best-effort and guaranteed. An agent runtime has to be honest about which one it is offering, because the security team’s threat model depends on the answer.
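The difference between best-effort and guaranteed can be sketched as a small policy function. Every name below (`planSandbox`, `SandboxPlan`, the two modes) is hypothetical, and the sketch only decides what to do; it does not implement isolation itself.

```typescript
// All names here are hypothetical; this is a policy sketch, not a sandbox implementation.
type Backend = "native" | "none";
type Mode = "best-effort" | "enforce";

interface SandboxPlan {
  isolated: boolean;       // true only when a real backend will confine the process
  networkAllowed: boolean;
  deniedPaths: string[];   // locations the sandboxed process may not read
}

const SECRET_PATHS = ["~/.ssh", "~/.aws", ".env"];

function planSandbox(backend: Backend, mode: Mode): SandboxPlan {
  if (backend === "none") {
    if (mode === "enforce") {
      // Fail closed: a caller that asked for a guarantee never gets a silent downgrade.
      throw new Error("no sandbox backend on this platform; refusing to run unsandboxed");
    }
    // Best-effort: the command may run, but the plan says plainly that nothing confines it.
    return { isolated: false, networkAllowed: true, deniedPaths: [] };
  }
  // Network off by default, secret locations denied.
  return { isolated: true, networkAllowed: false, deniedPaths: [...SECRET_PATHS] };
}
```

The point of returning `isolated: false` rather than hiding it is that the caller, and the user, can see which kind of protection they actually got.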

3. Denylists are advisory; the boundary is the control

The CLI shipped a blocklist of dangerous shell commands. It was trivially bypassable with a different separator, a leading variable assignment, or a quote in the middle of sudo. This is not a bug you fix by adding more patterns. A denylist over a shell is unwinnable, because the shell has more ways to express a command than any list can enumerate. The lesson is not “write a better blocklist.” It is that the blocklist is a usability hint, and the actual control has to live at the sandbox boundary, where the question is what a process can touch rather than whether a string looks dangerous.
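To see why pattern matching loses, consider a deliberately naive blocklist. This is an illustrative stand-in, not the CLI's actual list; `naiveIsBlocked` and its patterns are assumptions made for the example.

```typescript
// Hypothetical blocklist: split on ";" and reject segments that start with a known-bad prefix.
const BLOCKED_PREFIXES = ["sudo", "rm -rf"];

function naiveIsBlocked(command: string): boolean {
  return command
    .split(";")
    .some((segment) => BLOCKED_PREFIXES.some((prefix) => segment.trim().startsWith(prefix)));
}

// Each of these still runs sudo in a POSIX shell, and each slips past the check above.
const bypasses = [
  "ls && sudo id",   // a different separator
  "X=1 sudo id",     // a leading variable assignment
  's"u"do id',       // a quote in the middle of the word; the shell removes it
];
```

Patching these three cases only invites the next three, which is why the check belongs at the process boundary instead.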

4. Agents that register their own connections can be steered into your network

Our CLI speaks the Model Context Protocol, which lets an agent connect to external tool servers by URL. The fetch path had no egress controls. An agent that could be convinced to register a server pointing at 169.254.169.254, the cloud metadata endpoint, or at an internal service on a private range could be turned into a server-side request forgery primitive. This is the agent-era version of a classic SSRF, and it generalizes. Any capability that lets the model choose a URL to call is an egress decision that belongs to your network policy, not to the model. We added a guard that blocks link-local and private ranges by default while still allowing the local servers people legitimately run.
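A guard of this kind might look roughly like the following. The function names and the three-way verdict are assumptions for illustration. The sketch handles IPv4 literals only; a real guard also has to check the address a hostname resolves to, and IPv6, before connecting.

```typescript
type Verdict = "allow" | "deny" | "resolve-first";

// Parses a dotted-quad IPv4 literal; returns null for anything else.
function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

function isPrivateOrLinkLocal(ip: number[]): boolean {
  const [a, b] = ip;
  return (
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 169 && b === 254)             // 169.254.0.0/16, includes 169.254.169.254
  );
}

// Hypothetical check run before the agent may register or call a server URL.
function classifyServerUrl(rawUrl: string): Verdict {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch (e) {
    return "deny"; // unparseable input is never fetched
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return "deny";
  if (url.hostname === "localhost") return "allow"; // local servers people legitimately run
  const ip = parseIPv4(url.hostname);
  if (ip === null) return "resolve-first"; // a name or IPv6 literal: check the resolved address
  if (ip[0] === 127) return "allow";       // IPv4 loopback
  return isPrivateOrLinkLocal(ip) ? "deny" : "allow";
}
```

The `resolve-first` verdict exists because a public-looking hostname can resolve to a private address, so a string check on the URL alone is not a complete egress control.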

5. Trust that grants itself is not trust

There was a trust system designed to stop a malicious project from injecting instructions into the agent, and it auto-trusted every new directory the first time it was opened. The protection existed and defeated itself in the same breath. Clone a hostile repository, open it, and its instructions loaded as if you had vouched for them. A trust boundary that grants itself is decoration. The fix was to make the human do the granting.
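In code, the shape of that fix is small. The sketch below uses hypothetical names (`TrustStore`, `instructionsFor`) and an in-memory set; it only illustrates that opening a directory and trusting it are separate operations.

```typescript
// Hypothetical trust store: nothing is trusted until a person says so.
class TrustStore {
  private readonly trusted = new Set<string>();

  isTrusted(dir: string): boolean {
    return this.trusted.has(dir);
  }

  // Meant to be called only from an interactive confirmation, never from the open-directory path.
  grant(dir: string, confirmedByUser: boolean): void {
    if (!confirmedByUser) {
      throw new Error(`trust for ${dir} requires explicit confirmation`);
    }
    this.trusted.add(dir);
  }
}

// Returns the project's instruction text only for trusted directories; otherwise null.
function instructionsFor(store: TrustStore, dir: string, projectInstructions: string): string | null {
  return store.isTrusted(dir) ? projectInstructions : null;
}
```

A freshly cloned repository gets `null` here until the user confirms, so its instructions never load by default.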

6. The supply chain is part of the agent

Two findings had nothing to do with the agent’s runtime and everything to do with whether it could be trusted to keep working. The model identifiers were hardcoded and had quietly gone stale. One had already been retired by the provider and was returning errors, and others were days from the same fate. We replaced the hardcoded list with live discovery from each provider, so the agent learns what models exist instead of asserting it. A routine dependency audit also turned up eleven known vulnerabilities, two of them critical, which we cleared as part of the release.
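One way to picture live discovery: keep an ordered list of preferences, ask the provider what it currently offers, and choose from the intersection. The `ModelProvider` interface and the model identifiers in the usage note are placeholders, since each real provider has its own listing API.

```typescript
// Hypothetical provider interface: each provider exposes some way to list its current models.
interface ModelProvider {
  name: string;
  listModels(): Promise<string[]>;
}

// Picks the first preferred model that the provider still offers, or null if none survive.
function pickModel(preferences: string[], available: string[]): string | null {
  const live = new Set(available);
  for (const id of preferences) {
    if (live.has(id)) return id;
  }
  return null;
}

// Resolves a model at startup and fails loudly instead of calling a retired identifier.
async function resolveModel(provider: ModelProvider, preferences: string[]): Promise<string> {
  const available = await provider.listModels();
  const chosen = pickModel(preferences, available);
  if (chosen === null) {
    throw new Error(`${provider.name}: none of the preferred models are currently offered`);
  }
  return chosen;
}
```

For example, `pickModel(["model-retired", "model-current"], ["model-current", "model-next"])` skips the retired entry and returns `"model-current"`.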

Agents are long-lived processes wired to a half-dozen external services. The supply chain, meaning model endpoints, package dependencies, and the provenance of what you install, is not adjacent to the agent’s security. It is the agent’s security.

7. A fix that never ships is not a fix

The finding with the sharpest governance lesson was not in the code at all. It was in the release pipeline. Three consecutive releases had failed to publish to the package registry, silently, because the failure was buried in a build step nobody was watching. The practical effect: every user was running a version several releases behind, unable to receive any of the fixes above even after we wrote them.

This is the gap that governance exists to close. “Fixed in the main branch” and “shipped to the people running it” are different states, and the distance between them is where security debt accumulates. An audit that produces patches nobody can install has produced a report, not a remediation. We fixed the pipeline first, and then everything else could actually reach a user.

The discipline is the product

None of these is exotic, and that is the point. Command injection, fail-open isolation, unbounded egress, self-granting trust, a stale supply chain, and a broken release path are the standard surface of any system that lets a model act in the world. If you run agents in the enterprise, this is the class of risk you inherit, whether or not anyone has written it down.

We found ours by looking, on purpose, against a checklist, before the issues had a chance to matter. Then we shipped the fixes, and the pipeline that lets fixes ship. The work was tedious and unglamorous, which is exactly what security work is. The agents and the tooling are the visible product. The discipline of auditing them, hardening them, and getting the fixes into users’ hands is the real one.