June 25, 2026 | 5 min read | Brian McManus

Your AI gateway can't see the agents that matter

A gateway governs exactly one thing: traffic that agrees to route through it. The local Claude Code session, the hand-added MCP server, and the tool call that never leaves the laptop all bypass it, so governance has to live where the call originates.

Tags: beacon, endpoint, governance, mcp

There's a comfortable story going around: put a gateway in front of your AI usage, route every model call and every MCP request through it, and you have a control plane. It's comfortable because it looks like the API gateway pattern everyone already knows. And for the traffic that actually routes through it, it works.

That qualifier is the entire problem. A gateway can govern exactly one thing: traffic that agrees to go through it.

The well-behaved case is the easy case

Route a hosted agent's model calls through a gateway you configured, and you get useful things: rate limits, model allow-lists, spend tracking, a log. Keep all of that. Nothing in this post argues for ripping out a gateway.

But look at where AI agents actually run in a modern engineering org, and count what never touches that path:

The local agent. An engineer runs Claude Code in a terminal, configured against the vendor's endpoint or an API key from their own dotfiles. It never dials your gateway. It reads the same repo, the same .env, the same production credentials in ~/.aws.
The hand-added MCP server. A developer pastes five lines of JSON into an editor config to wire Cursor to a filesystem or database server. No ticket, no package manager, no deployment. It is infrastructure now, and no system of record knows it exists.
The call that never leaves the laptop. Most MCP servers run as local child processes over stdio. When an agent invokes a tool, that invocation is a pipe between two processes on the same machine. There is no network hop to intercept, at your gateway or anywhere else.

You cannot gateway your way to visibility of traffic that, by definition, does not pass through the gateway.
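
To make the stdio case concrete, here is a minimal Python sketch of a tool call that travels over a pipe between a parent process and a child it spawned. The child is a toy stand-in for an MCP server, not a real one: it only echoes the method name back as a JSON-RPC style reply. The function name `call_tool_over_stdio` and the request contents are hypothetical, chosen for illustration.

```python
import json
import subprocess
import sys

# Toy stand-in for a local MCP server (an assumption for this sketch):
# it reads one JSON request per line on stdin and answers on stdout.
CHILD_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    response = {"jsonrpc": "2.0", "id": request["id"],
                "result": {"echoed_method": request["method"]}}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
'''


def call_tool_over_stdio(method: str, params: dict) -> dict:
    """Spawn the toy server as a child process and exchange one message over pipes."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        request = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
        # The request goes down the child's stdin pipe; no socket is opened.
        child.stdin.write(json.dumps(request) + "\n")
        child.stdin.flush()
        reply = json.loads(child.stdout.readline())
        # Closing stdin ends the child's read loop so it exits cleanly.
        child.stdin.close()
    return reply


if __name__ == "__main__":
    print(call_tool_over_stdio("tools/call", {"name": "read_file"}))
```

Nothing in this exchange opens a network connection, so there is no hop for a proxy or gateway to sit on. The only vantage point is the machine where the two processes run.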

This is a governance gap, not a detection gap

The tempting move here is to dramatize: imagine the injected prompt, the exfiltrated secret, the incident report. We'd rather make the boring, structural point, because it's the one that holds up in a security review.

Before anything malicious happens at all, the un-gatewayed agent layer fails the three questions every other part of your stack can answer:

Inventory. Which agents, versions, and MCP servers are running across the fleet? For laptops, today, the honest answer at most companies is a shrug.
Policy. What is each agent allowed to invoke? A hand-added MCP server has whatever scope the developer gave it at 6pm on a Thursday. There is no place where policy could even attach.
Audit. What did agents actually do last week? For calls that never left the endpoint, there is no record. Not a poor record: none.

Your compliance program answers these questions for every laptop, every package, and every SaaS login. The agent layer is the exception, and it's the newest, fastest-growing layer you have.

Govern where the call originates

If the calls happen on the endpoint, the inventory and the policy have to live on the endpoint. That's the fix, and it's architectural rather than clever.

Beacon is a signed, user-space agent with no kernel driver and no traffic interception. It starts read-only. On day one it does exactly two things: inventories the AI layer of each endpoint, and observes MCP activity where it originates. A typical first-day finding looks like this:

finding   = mcp-server.unpinned
host      = dev-laptop-112
agent     = cursor 1.3.2
source    = ~/.cursor/mcp.json  (hand-edited; no managed config)
server    = "filesystem" — npx -y @modelcontextprotocol/server-filesystem ~/
version   = unpinned (resolves to latest on every launch)
flags     = [unsanctioned, unpinned, broad-scope]

No Hollywood detection, just a true fact your gateway is structurally unable to know: an unpinned MCP server with home-directory scope, added by hand, re-resolving to whatever the registry serves next launch.
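
The hygiene flags in that finding can be derived from the config file alone. The sketch below is not Beacon's implementation. It is a simplified illustration that assumes the editor config uses an `mcpServers` object with `command` and `args` fields, that the npx package is the first non-flag argument, and that a short hypothetical list of paths counts as broad scope. The names `flag_server`, `inventory`, and `BROAD_PATHS` are invented for this example.

```python
import json
import re

# Example config contents mirroring the finding in the text. The "mcpServers"
# layout is assumed here for illustration.
EXAMPLE_CONFIG = """
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "~/"]
    }
  }
}
"""

# Hypothetical: arguments treated as "broad scope" in this sketch.
BROAD_PATHS = {"~", "~/", "/", "$HOME"}

# A package spec counts as pinned if it ends in @<version starting with a digit>,
# e.g. pkg@1.2.3 or @scope/pkg@1.2.3. A bare @scope/pkg does not match.
PINNED = re.compile(r".+@\d[\w.\-+]*$")


def flag_server(name: str, spec: dict, sanctioned: frozenset) -> list[str]:
    """Return hygiene flags for one configured server."""
    flags = []
    args = spec.get("args", [])
    if name not in sanctioned:
        flags.append("unsanctioned")
    if spec.get("command") == "npx":
        # Simplification: take the first argument that is not an option flag.
        package = next((a for a in args if not a.startswith("-")), None)
        if package is not None and not PINNED.match(package):
            flags.append("unpinned")
    if any(a in BROAD_PATHS for a in args):
        flags.append("broad-scope")
    return flags


def inventory(config_text: str, sanctioned: frozenset = frozenset()) -> dict:
    """Map each configured server name to its list of flags."""
    servers = json.loads(config_text).get("mcpServers", {})
    return {name: flag_server(name, spec, sanctioned) for name, spec in servers.items()}


if __name__ == "__main__":
    print(inventory(EXAMPLE_CONFIG))
```

Run against the example config with an empty sanctioned list, the sketch reports the `filesystem` server as unsanctioned, unpinned, and broad-scope, the same three flags as the finding. A real inventory would also need to discover the config files and handle other launchers than npx.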

From there, governance graduates at your pace. Policy is evaluated locally, at the point where the call originates, so it applies to local agents and stdio servers exactly as it applies to routed traffic. It starts in monitor mode: you see every decision policy would have made, over real usage, before anything is enforced. When the data says you're ready, you turn enforcement on per policy and per cohort, and every decision lands in a tamper-evident audit chain your auditors can verify independently.
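
As a rough illustration of those two ideas, the Python sketch below pairs a local policy check that supports a monitor mode with a hash-chained decision log. It is not Beacon's policy engine or audit format. The policy shape (`mode`, `allowed_tools`), the record layout, and the function names are all assumptions, and a hash chain is only one common way to make a log tamper-evident.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def evaluate(policy: dict, call: dict) -> dict:
    """Decide a tool call locally. In monitor mode the verdict is recorded but never blocks."""
    allowed = call["tool"] in policy["allowed_tools"]
    enforced = policy["mode"] == "enforce"
    return {
        "tool": call["tool"],
        "verdict": "allow" if allowed else "deny",
        "blocked": enforced and not allowed,
    }


def _digest(prev_hash: str, decision: dict) -> str:
    # Each hash covers the previous hash plus a canonical form of the decision.
    body = json.dumps(decision, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain: list, decision: dict) -> None:
    """Append a decision, linking it to the hash of the record before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "decision": decision,
                  "hash": _digest(prev_hash, decision)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["decision"]):
            return False
        prev_hash = record["hash"]
    return True


if __name__ == "__main__":
    policy = {"mode": "monitor", "allowed_tools": {"read_file"}}
    chain = []
    for tool in ("read_file", "delete_file"):
        append_record(chain, evaluate(policy, {"tool": tool}))
    print(verify(chain))
```

In monitor mode the `delete_file` call is logged with a deny verdict but `blocked` stays false, which is the "see what policy would have done" behavior. Switching `mode` to `enforce` flips `blocked` to true for the same call. An independent verifier would additionally need a trusted copy of the latest hash, for example a signed checkpoint, which this sketch leaves out.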

None of this makes a gateway wrong. It makes a gateway partial. The same is true of the rest of your stack: your EDR sees the process but not the tool call, your vulnerability scanner sees the package but not the MCP config, your IdP sees the token but not what the agent invoked with it. Each does its job, and each stops one layer short of where agents actually operate. The endpoint is the one place every agent converges, whatever vendor it came from, whether or not it ever touches a network path you control.

The practical starting point is not a migration or an architecture debate. Deploy Beacon read-only on a pilot cohort, and thirty minutes later you have an Agent Exposure Report: every agent, every version, every MCP connection, every hygiene flag, in one artifact your platform team can act on and your CISO can take upstairs. If the report shows your gateway already sees everything, you've spent thirty minutes confirming a control works.

We have yet to see that report come back empty. Get yours.
