Chapter 16: Remote Control and Cloud Execution
The Agent Reaches Beyond Localhost
Every chapter so far has assumed that Claude Code runs on the same machine where the code lives. The terminal is local. The filesystem is local. The model responses stream back to a process that owns both the keyboard and the working directory.
That assumption breaks the moment you want to control Claude Code from a browser, run it inside a cloud container, or expose it as a service on your LAN. The agent needs a way to receive instructions from a web browser, a mobile app, or an automated pipeline — forward permission prompts to someone who is not sitting at the terminal, and tunnel its API traffic through infrastructure that might inject credentials or terminate TLS on the agent’s behalf.
Claude Code solves this with four systems, each addressing a different topology:
These systems share a common design philosophy: reads and writes are asymmetric, reconnection is automatic, and failures degrade gracefully.
Bridge v1: Poll, Dispatch, Spawn
The v1 bridge is the environment-based remote control system. When a developer runs claude remote-control, the CLI registers with the Environments API, polls for work, and spawns a child process per session.
Before registration, a gauntlet of pre-flight checks runs: runtime feature gate, OAuth token validation, organization policy check, dead token detection (a cross-process backoff after three consecutive failures with the same expired token), and proactive token refresh that eliminates roughly 9% of registrations that would otherwise fail on the first attempt.
Once registered, the bridge enters a long-poll loop. Work items arrive as sessions (with a secret field containing session tokens, API base URL, MCP configs, and environment variables) or healthchecks. The bridge throttles “no work” log messages to every 100 empty polls.
Each session spawns a child Claude Code process communicating via NDJSON on stdin/stdout. Permission requests flow through the bridge transport to the web interface where the user approves or denies. The round-trip must complete within roughly 10-14 seconds.
Bridge v2: Direct Sessions and SSE
The v2 bridge eliminates the entire Environments API layer — no registration, no polling, no acknowledgment, no heartbeat, no deregistration. The motivation: v1 required the server to know the machine’s capabilities before dispatching work. V2 collapses the lifecycle to three steps:
- Create session:
POST /v1/code/sessionswith OAuth credentials. - Connect bridge:
POST /v1/code/sessions/{id}/bridge. Returns aworker_jwt,api_base_url, andworker_epoch. Each/bridgecall bumps the epoch — it IS the registration. - Open transport: SSE for reads,
CCRClientfor writes.
The transport abstraction (ReplBridgeTransport) unifies v1 and v2 behind a common interface, so message handling does not need to know which generation it is talking to.
When the SSE connection drops due to a 401, the transport rebuilds with fresh credentials from a new /bridge call while preserving the sequence number cursor — no messages are lost. The write path uses per-instance getAuthToken closures instead of process-wide environment variables, preventing JWT leakage across concurrent sessions.
The FlushGate
A subtle ordering problem: the bridge needs to send conversation history while accepting live writes from the web interface. If a live write arrives during the history flush, messages could be delivered out of order. The FlushGate queues live writes during the flush POST and drains them in order when it completes.
Token Refresh and Epoch Management
The v2 bridge proactively refreshes worker JWTs before expiry. A new epoch tells the server this is the same worker with fresh credentials. Epoch mismatches (409 responses) are handled aggressively: both connections close and an exception unwinds the caller, preventing split-brain scenarios.
Message Routing and Echo Deduplication
Both bridge generations share handleIngressMessage() as the central router:
- Parse JSON, normalize control message keys.
- Route
control_responseto permission handler,control_requestto request handler. - Check UUID against
recentPostedUUIDs(echo dedup) andrecentInboundUUIDs(re-delivery dedup). - Forward validated user messages.
BoundedUUIDSet: O(1) Lookup, O(capacity) Memory
The bridge has an echo problem — messages may echo back on the read stream or be delivered twice during transport switches. BoundedUUIDSet is a FIFO-bounded set backed by a circular buffer:
class BoundedUUIDSet {
private buffer: string[]
private set: Set<string>
private head = 0
add(uuid: string): void {
if (this.set.size >= this.capacity) {
this.set.delete(this.buffer[this.head])
}
this.buffer[this.head] = uuid
this.set.add(uuid)
this.head = (this.head + 1) % this.capacity
}
has(uuid: string): boolean { return this.set.has(uuid) }
}
Two instances run in parallel, each with capacity 2000. O(1) lookup via the Set, O(capacity) memory via circular buffer eviction, no timers or TTLs. Unknown control request subtypes get an error response, not silence — preventing the server from waiting for a response that never comes.
The Asymmetric Design: Persistent Reads, HTTP POST Writes
The CCR protocol uses asymmetric transport: reads flow through a persistent connection (WebSocket or SSE), writes go through HTTP POST. This reflects a fundamental asymmetry in the communication pattern.
Reads are high-frequency, low-latency, server-initiated — hundreds of small messages per second during token streaming. A persistent connection is the only sensible choice. Writes are low-frequency, client-initiated, and require acknowledgment — messages per minute, not per second. HTTP POST provides reliable delivery, idempotency via UUIDs, and natural integration with load balancers.
Trying to unify them on a single WebSocket creates coupling: if the WebSocket drops during a write, you need retry logic and must distinguish “not sent” from “sent but acknowledgment lost.” Separate channels let each be optimized independently.
Remote Session Management
The SessionsWebSocket manages the client side of a CCR WebSocket connection. Its reconnection strategy discriminates between failure types:
| Failure | Strategy |
|---|---|
| 4003 (unauthorized) | Stop immediately, no retries |
| 4001 (session not found) | Max 3 retries, linear backoff (transient during compaction) |
| Other transient | Exponential backoff, max 5 attempts |
The isSessionsMessage() type guard accepts any object with a string type field — deliberately permissive. A hardcoded allowlist would silently drop new message types before the client is updated.
Direct Connect: The Local Server
Direct Connect is the simplest topology: Claude Code runs as a server and clients connect via WebSocket. No cloud intermediary, no OAuth tokens.
Sessions have five states: starting, running, detached, stopping, stopped. Metadata persists to ~/.claude/server-sessions.json for resume across server restarts. The cc:// URL scheme provides clean addressing for local connections.
Upstream Proxy: Credential Injection in Containers
The upstream proxy runs inside CCR containers and solves a specific problem: injecting organization credentials into outbound HTTPS traffic from a container where the agent might execute untrusted commands.
The setup sequence is carefully ordered:
- Read the session token from
/run/ccr/session_token. - Set
prctl(PR_SET_DUMPABLE, 0)via Bun FFI — blocking same-UID ptrace of the process heap. Without this, a prompt-injectedgdb -p $PPIDcould scrape the token from memory. - Download the upstream proxy CA certificate and concatenate with system CA bundle.
- Start a local CONNECT-to-WebSocket relay on an ephemeral port.
- Unlink the token file — the token now exists only on the heap.
- Export environment variables for all subprocesses.
Every step fails open: errors disable the proxy rather than killing the session. The correct tradeoff — a failed proxy means some integrations will not work, but core functionality remains available.
Protobuf Hand-Encoding
Bytes through the tunnel are wrapped in UpstreamProxyChunk protobuf messages. The schema is trivial — message UpstreamProxyChunk { bytes data = 1; } — and Claude Code encodes it by hand in ten lines rather than pulling in a protobuf runtime:
export function encodeChunk(data: Uint8Array): Uint8Array {
const varint: number[] = []
let n = data.length
while (n > 0x7f) { varint.push((n & 0x7f) | 0x80); n >>>= 7 }
varint.push(n)
const out = new Uint8Array(1 + varint.length + data.length)
out[0] = 0x0a // field 1, wire type 2
out.set(varint, 1)
out.set(data, 1 + varint.length)
return out
}
Ten lines replace a full protobuf runtime. A single-field message does not justify a dependency — the maintenance burden of the bit manipulation is far lower than the supply chain risk.
Apply This: Designing Remote Agent Execution
Separate read and write channels. When reads are high-frequency streams and writes are low-frequency RPCs, unifying them creates unnecessary coupling. Let each channel fail and recover independently.
Bound your deduplication memory. The BoundedUUIDSet pattern provides fixed-memory deduplication. Any at-least-once delivery system needs a bounded dedup buffer, not an unbounded Set.
Make reconnection strategy proportional to the failure signal. Permanent failures should not retry. Transient failures should retry with backoff. Ambiguous failures should retry with a low cap.
Keep secrets heap-only in adversarial environments. Reading the token from a file, disabling ptrace, and unlinking the file eliminates both filesystem and memory-inspection attack vectors.
Fail open for auxiliary systems. The upstream proxy fails open because it provides enhanced functionality (credential injection), not core functionality (model inference).
The remote execution systems encode a deeper principle: the agent’s core loop (Chapter 5) should be agnostic about where instructions come from and where results go. The bridge, Direct Connect, and upstream proxy are transport layers. The message handling, tool execution, and permission flows above them are identical regardless of whether the user is sitting at the terminal or on the other side of a WebSocket.
The next chapter examines the other operational concern: performance — how Claude Code makes every millisecond and token count across startup, rendering, search, and API costs.