Technical deep dive

How ExoProtocol works

The design decisions behind a governance kernel for AI agent sessions. Why OS primitives, why a frozen kernel, and what actually happens when an agent session starts and finishes.

Origin

Treating agent development as a programming language

The question that started ExoProtocol wasn't "how do we add guardrails to AI agents?" It was: what would it look like if multi-agent development were a programming language?

A programming language needs a runtime. A runtime needs process isolation, memory management, scheduling, and crash recovery. We weren't trying to build an OS — we were solving the same class of problems that OS designers solved in the 1970s, and we converged on the same abstractions.

The kernel has 10 functions. It's been frozen since day one. Every feature since — drift detection, feature tracing, session-start intelligence, garbage collection — lives in the stdlib, not the kernel. This isn't an accident. It's the same design principle that makes Unix kernels stable: keep the enforcement surface minimal and build everything else in userspace.

Layer 0

The frozen kernel

The kernel does exactly four things: governance compilation, ticket management, policy enforcement, and audit logging. Ten functions, no expansion without an RFC.

# The entire public kernel API — 10 functions

load_governance(root)           # Load constitution + lock
verify_governance(gov)          # Hash check: source vs lock

open_session(root, actor)       # Create session context
mint_ticket(session, ...)       # Issue scoped work ticket

validate_ticket(gov, ticket)    # Check ticket against rules
check_action(gov, session, ...) # Policy decision on action

resolve_requirements(...)       # Match evidence to requirements
commit_plan(session, ...)       # Commit execution plan

append_audit(root, event)       # Append to audit log
seal_result(session, ...)       # Finalize with receipt hash

Why frozen? Because the kernel is the trust boundary. If agents can't trust that governance rules are consistently enforced, the whole system is theater. A frozen kernel means the enforcement contract is auditable, testable, and stable. When you verify governance, you're comparing the constitution's hash against the compiled lock file's hash — if they don't match, something changed outside the approved process.
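The hash check described above can be sketched in a few lines. This is a minimal illustration, not the kernel's code: the use of SHA-256, the dict-shaped lock, and the `source_hash` field name are all assumptions made for the example.

```python
import hashlib


def constitution_hash(text: str) -> str:
    """Return the SHA-256 hex digest of the constitution source (assumed algorithm)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_lock(constitution_text: str, lock: dict) -> bool:
    """True when the hash recorded in the lock matches the current source.

    `source_hash` is a hypothetical field name used only for illustration.
    """
    return constitution_hash(constitution_text) == lock.get("source_hash")


constitution = "RULE-001: No direct commits to main\n"
lock = {"source_hash": constitution_hash(constitution)}

# Unchanged source verifies; an out-of-process edit does not.
unchanged = verify_lock(constitution, lock)
tampered = verify_lock(constitution + "RULE-999: allow everything\n", lock)
```

The point of the design is that the comparison is cheap and deterministic, so it can run at every session boundary.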

Everything else — session lifecycle, drift detection, feature tracing, dispatch scheduling, adapter generation, garbage collection — lives in exo/stdlib/. This layer can evolve freely without touching the trust boundary.

Brownfield on-ramp

Smart init: governance from existing repos

Most governance tools assume a greenfield project. ExoProtocol assumes the opposite: you have a messy repo with existing code, and you want governance to add value immediately without a week of configuration.

exo init scans the repo and detects language, source directories, build artifacts, sensitive files, CI systems, and test frameworks. It generates a project-aware constitution with rules that actually match your codebase.

$ exo scan

Language:    python
Source dirs: src/, exo/
Build dirs:  __pycache__/, *.egg-info/
Sensitive:   .env, credentials.yaml
CI:          .github/workflows/ (GitHub Actions)
Tests:       pytest (tests/)
Ignore:      node_modules/, .venv/, dist/

Rules to generate:
  RULE-001    No direct main commits
  RULE-002    Tests before review
  RULE-SEC-001 Deny .env, *.pem, credentials.*
  RULE-SEC-002 Deny credentials.yaml

Adapters:
  CLAUDE.md, .cursorrules, AGENTS.md
  .github/workflows/exo-governance.yml

The scan also generates adapter files — CLAUDE.md, .cursorrules, AGENTS.md — that inject governance rules into the agent's native context format. A Claude Code session reads CLAUDE.md automatically. A Cursor session reads .cursorrules. The governance is the same; the delivery format adapts to the agent.

The key mechanism

Bootstrap prompt compilation

This is the core of ExoProtocol. When exo session-start runs, it compiles the entire governance context into a single bootstrap prompt that the agent receives as its first input. The agent doesn't need to know about ExoProtocol's internals — it just reads the rules and follows them.

Here's what a real bootstrap prompt looks like:

┌──────────────────────────────────────────────────────────┐
│  GOVERNED SESSION                                        │
│  ticket: TICKET-042   actor: agent:claude                │
│  branch: feature/auth  model: claude-code                │
└──────────────────────────────────────────────────────────┘

## Governance Rules
- RULE-001: No direct commits to main
- RULE-002: Tests must pass before marking review
- RULE-SEC-001: Never modify .env, *.pem, credentials.*

## Scope
- allow: ["src/auth/**", "tests/test_auth*"]
- deny: [".exo/**", "*.lock"]

## Budgets
- max_files_changed: 8

## Checks
- ["python -m pytest tests/test_auth.py -v"]

## Git Workflow
- Before pushing, rebase: git pull --rebase origin main
- Keep commits atomic and branches short-lived

## Machine Context
- cpu_cores: 10
- load_avg_1m: 3.2
- ram: 12.4GB available / 36.0GB total

## Sibling Sessions
- agent:cursor: ticket=TICKET-038 branch=feature/api (age=1.2h)

## Start Advisories
- [warning] scope_conflict: agent:cursor (TICKET-038) has
  overlapping scope ["src/auth/middleware.py"]
- [info] unmerged_work: feature/api-v2 has 3 unmerged commits
  touching src/auth/

## Operational Learnings
- Pattern: test imports fail silently when conftest.py missing
  Insight: Always verify conftest.py exists before adding tests
- Pattern: auth middleware changes break session middleware
  Insight: Run full test suite, not just auth tests

Every section is conditional. No sibling sessions? The section doesn't appear. No scope conflicts? No advisories. Clean machine? No machine context warnings. The bootstrap prompt is as short as it can be while containing everything the agent needs.
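The "every section is conditional" rule amounts to skipping empty sections at render time. A minimal sketch, assuming the compiler receives section titles mapped to lists of lines (the function name and data shape are hypothetical, not ExoProtocol's internals):

```python
def compile_bootstrap(sections: dict[str, list[str]]) -> str:
    """Render only the sections that have content, in insertion order."""
    parts = []
    for title, lines in sections.items():
        if not lines:  # empty section: omit the heading entirely
            continue
        body = "\n".join(f"- {line}" for line in lines)
        parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts)


prompt = compile_bootstrap({
    "Governance Rules": ["RULE-001: No direct commits to main"],
    "Sibling Sessions": [],  # no siblings, so the section is dropped
    "Budgets": ["max_files_changed: 8"],
})
```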

The banner at the top is a box-drawn governance strip that the agent sees first. It's the visual anchor: "you are in a governed session, here's what you're working on."

Conflict detection

Session-start intelligence

Before the agent writes a line of code, ExoProtocol runs six detection passes. All are advisory — they never block session start — but they surface information that prevents the most common multi-agent failures.

1. Scope conflict detection

Each ticket carries allow and deny glob patterns that define which files it can touch. When a session starts, ExoProtocol loads every active sibling session's ticket and cross-checks the scope patterns using fnmatch. If two agents have overlapping file scope, a warning fires.

Key design decision: if both tickets have the default scope (["**"]), no warning fires. This avoids noise when nobody has customized scope. But the moment one ticket has a specific scope pattern, the warning activates — that's the point of setting scope.
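One way to picture the cross-check is to compare each pair of allow patterns with fnmatch, treating one pattern as a path and the other as a glob. This is a hedged sketch of that heuristic; the function name and the pattern-against-pattern strategy are assumptions, and the real check may work differently.

```python
from fnmatch import fnmatchcase

DEFAULT_SCOPE = ["**"]


def scope_conflicts(mine: list[str], theirs: list[str]) -> list[tuple[str, str]]:
    """Return pairs of allow patterns that appear to overlap.

    Two default scopes never conflict, which mirrors the noise-avoidance
    rule described above.
    """
    if mine == DEFAULT_SCOPE and theirs == DEFAULT_SCOPE:
        return []
    conflicts = []
    for a in mine:
        for b in theirs:
            # Overlap if either pattern, read as a path, matches the other.
            if fnmatchcase(a, b) or fnmatchcase(b, a):
                conflicts.append((a, b))
    return conflicts


both_default = scope_conflicts(["**"], ["**"])
overlap = scope_conflicts(
    ["src/auth/**", "tests/test_auth*"],
    ["src/auth/middleware.py"],
)
disjoint = scope_conflicts(["src/api/**"], ["src/auth/**"])
```

The heuristic is deliberately simple. It can miss two globs that overlap without one containing the other (for example `src/*.py` and `src/a*`), which is acceptable for an advisory warning.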

2. Unmerged work detection

ExoProtocol reads the session index (a JSONL file tracking all completed sessions), filters to recent sessions on other branches, and checks which branches have been merged into the current branch via git branch --merged. If an unmerged branch has sessions with overlapping scope, the agent is warned: "there's relevant work on feature/api-v2 that you haven't merged."

3. Ticket contention

Checks if any active sibling session is working on the same ticket. Simple but catches the "two agents assigned the same task" failure.

4. Branch mismatch

Scans session history for prior sessions on the same ticket. If the most recent session was on a different branch, the agent is warned: "TICKET-042 was previously worked on feature/auth-v1, but you're on feature/auth-v2."

5. Base divergence

Uses git rev-list --left-right --count to check how far the feature branch has fallen behind the base branch (e.g. main). Only fires above a configurable threshold (default: 15 commits behind) to avoid noise on normal development lag.
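For reference, `git rev-list --left-right --count base...head` prints two tab-separated counts: commits reachable only from the left side, then commits reachable only from the right. The sketch below parses that output and applies the threshold; the function names are illustrative, and reading "above the threshold" as strictly greater than 15 is an assumption.

```python
import subprocess


def parse_divergence(output: str) -> tuple[int, int]:
    """Parse '<behind>\\t<ahead>' as printed by git rev-list --left-right --count."""
    behind, ahead = (int(n) for n in output.split())
    return behind, ahead


def base_divergence(base: str = "main", head: str = "HEAD") -> tuple[int, int]:
    """Run git in the current repository and return (behind, ahead)."""
    result = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{base}...{head}"],
        capture_output=True, text=True, check=True,
    )
    return parse_divergence(result.stdout)


def should_warn(behind: int, threshold: int = 15) -> bool:
    """Advisory fires only when the branch is more than `threshold` commits behind."""
    return behind > threshold


behind, ahead = parse_divergence("18\t2\n")
```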

6. Machine load

Reads CPU count, load average, and available RAM. Warns if the machine is under heavy load while multiple agent sessions are active. Tickets can carry a resource_profile (light, default, heavy) that gates whether concurrent sessions should proceed.

Session finish

Drift detection

When a session finishes, ExoProtocol scores how well the work stayed within the ticket's constraints. The drift score is a weighted composite of three factors:

drift_score = (
    0.50 * scope_violation_ratio    # files touched outside allow/deny
  + 0.35 * file_budget_ratio        # files_changed / max_files_changed
  + 0.15 * boundary_violations      # 1.0 if any boundary breaches, else 0.0
)

# Each ratio is clamped to [0.0, 1.0]
# Final score: 0.0 = perfect compliance, 1.0 = complete drift

A drift score of 0.0 means perfect compliance. A score above the threshold (default: 0.7) triggers a warning in the PR governance check. The score is recorded in the session memento and the session index for historical tracking.
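The formula translates directly into a small function. This sketch uses the weights and clamping stated above; the argument names and the handling of a zero file budget are assumptions made for the example.

```python
def clamp(x: float) -> float:
    """Clamp a ratio to the range [0.0, 1.0]."""
    return max(0.0, min(1.0, x))


def drift_score(
    scope_violation_ratio: float,
    files_changed: int,
    max_files_changed: int,
    boundary_breached: bool,
) -> float:
    """Weighted composite: 0.0 is perfect compliance, 1.0 is complete drift."""
    # Assumption: a ticket with no file budget contributes nothing to drift.
    budget_ratio = files_changed / max_files_changed if max_files_changed else 0.0
    return (
        0.50 * clamp(scope_violation_ratio)
        + 0.35 * clamp(budget_ratio)
        + 0.15 * (1.0 if boundary_breached else 0.0)
    )


# A quarter of touched files out of scope, half the file budget used.
example = drift_score(0.25, files_changed=4, max_files_changed=8, boundary_breached=False)
```

Because each ratio is clamped before weighting, blowing far past the file budget cannot push the score above 1.0.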

Drift detection is advisory. It never blocks session-finish. This is a deliberate design choice: you don't want an agent stuck in a limbo state because drift detection crashed or returned an unexpected result. The worst case is a session finishes without a drift score, which is better than a session that never finishes.

Knowledge transfer

Mementos: how agents learn from each other

When a session finishes, ExoProtocol writes a closeout memento — a structured record of what happened, what was learned, and what the drift score was. Here's what one looks like:

# Session Memento — TICKET-042
# agent:claude | 2025-01-15T14:30:00Z | 47min

## Summary
Implemented JWT refresh token rotation for auth middleware.
Added 12 tests covering token expiry, rotation, and revocation.

## Artifacts
- src/auth/refresh.py (new, 142 loc)
- src/auth/middleware.py (modified, +28 -4 loc)
- tests/test_auth_refresh.py (new, 186 loc)

## Drift Score: 0.10
- scope: 0.0 (all files within allow pattern)
- files: 0.29 (3/8 budget used → normalized)
- boundary: 0.0

## Operational Learnings
- refresh token rotation requires atomic DB write;
  use transaction wrapper, not sequential updates
- middleware ordering matters: refresh check must run
  BEFORE session validation

## Feature Trace
- @feature:auth-refresh — active, 3 tagged files
- No violations

## Status: review

Mementos serve two purposes. First, they're the operational learnings pipeline: the next agent session on the same ticket (or overlapping scope) inherits the insights. "Refresh token rotation requires atomic DB writes" is exactly the kind of context that prevents the next agent from making the same mistake.

Second, they're the audit trail. When you run exo pr-check, it matches git commits to governed sessions by timestamp windows. Commits that don't fall within any governed session are flagged as "ungoverned" — the PR governance check reports them, and an audit session will highlight them for review.
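Matching by timestamp window is a plain interval check. A minimal sketch, assuming commits arrive as (sha, ISO timestamp) pairs and sessions as (start, end) pairs; the data shapes, the function name, and the sample values are hypothetical.

```python
from datetime import datetime


def ungoverned_commits(
    commits: list[tuple[str, str]],
    sessions: list[tuple[str, str]],
) -> list[str]:
    """Return SHAs of commits that fall inside no governed session window."""
    windows = [
        (datetime.fromisoformat(start), datetime.fromisoformat(end))
        for start, end in sessions
    ]
    flagged = []
    for sha, ts in commits:
        when = datetime.fromisoformat(ts)
        if not any(start <= when <= end for start, end in windows):
            flagged.append(sha)
    return flagged


sessions = [("2025-01-15T13:43:00+00:00", "2025-01-15T14:30:00+00:00")]
commits = [
    ("a1b2c3d", "2025-01-15T14:05:00+00:00"),  # inside the session window
    ("e4f5a6b", "2025-01-15T16:10:00+00:00"),  # after the session finished
]
flagged = ungoverned_commits(commits, sessions)
```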

Multi-agent coordination

Agent handoff protocol

When Agent A finishes its part of the work and Agent B needs to continue on the same ticket, session-handoff provides a governed transfer. It's atomic: finish the session, write a handoff record, release the lock — all in one operation.

The handoff record lives at .exo/cache/sessions/handoff-{ticket_id}.json and contains the source actor, target actor, work summary, reason for handoff, next steps, scope constraints, and source branch. When Agent B starts its session on the same ticket, ExoProtocol detects the pending handoff and injects a Handoff Context section into the bootstrap prompt.

Key design decision: handoff = finish + record, not suspend. A handoff finishes Agent A's session completely (with memento), rather than suspending it. This means clean state: no dangling suspended sessions, no lock ownership confusion. The handoff record is consumed on Agent B's start — it's one-shot, preventing stale context from accumulating.

The to_actor field is advisory, not enforced. If agent:claude-sonnet was specified but agent:cursor picks up instead, it still works. Governance doesn't prescribe who continues — it transfers the context to whoever starts next.
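The one-shot behaviour can be illustrated with a write and a consuming read. The sketch below follows the `handoff-{ticket_id}.json` naming from the text; apart from `to_actor`, the field names are hypothetical, and a temporary directory stands in for `.exo/cache/sessions/`.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_handoff(sessions_dir: Path, ticket_id: str, record: dict) -> Path:
    """Persist the handoff record for the next session on this ticket."""
    path = sessions_dir / f"handoff-{ticket_id}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def consume_handoff(sessions_dir: Path, ticket_id: str) -> Optional[dict]:
    """Read the pending handoff, then delete it so it cannot go stale."""
    path = sessions_dir / f"handoff-{ticket_id}.json"
    if not path.exists():
        return None
    record = json.loads(path.read_text(encoding="utf-8"))
    path.unlink()  # one-shot: the record is gone after the first start
    return record


with tempfile.TemporaryDirectory() as tmp:
    sessions_dir = Path(tmp)
    write_handoff(sessions_dir, "TICKET-042", {
        "from_actor": "agent:claude",  # hypothetical field name
        "to_actor": "agent:cursor",    # advisory, not enforced
        "summary": "Refresh rotation implemented; revocation tests pending",
    })
    first = consume_handoff(sessions_dir, "TICKET-042")
    second = consume_handoff(sessions_dir, "TICKET-042")
```

Whoever starts next receives the context once. A later session on the same ticket sees no handoff unless a new one is written.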

Native SDKs

SDK integrations: governance without CLI calls

Not every agent framework uses CLI commands. ExoProtocol provides native hooks that wrap the session lifecycle around agent runs in their native execution model.

OpenAI Agents SDK

ExoRunHooks is an async hook class that plugs into Runner.run(agent, hooks=...). It maps the OpenAI Agents SDK lifecycle to ExoProtocol sessions:

  • on_agent_start — starts a governed session (or reuses an active one)
  • on_agent_end — finishes the session with a summary including tool call count
  • on_tool_start — records each tool invocation for the audit trail
  • on_handoff — logs agent-to-agent handoffs

All governance operations are wrapped in try/except — a governance failure never crashes the agent run. The module uses lazy imports so it can be imported without the OpenAI Agents SDK installed. Install with pip install exoprotocol[openai-agents].

Claude Code hooks

Claude Code lifecycle hooks auto-start and auto-finish governed sessions on SessionStart and SessionEnd events. Install with exo hook-install. No code changes needed — governance wraps around the existing workflow.

Tool auto-discovery

discover_tools() scans importlib.metadata entry points under the exoprotocol.integrations group, plus core CLI and MCP tools. Install extras to register integrations automatically.

CI integration

PR governance check

exo pr-check answers a simple question: was every commit in this PR made during a governed session?

It matches commits to sessions by timestamp windows, resolves intent roots for full traceability (intent → ticket → session → commit), checks scope violations (did the session touch files outside its ticket scope?), evaluates drift scores, and verifies governance integrity (has the constitution been tampered with?).

The verdict is pass, warn, or fail:

  • fail — ungoverned commits, failed governance verification, or governance drift detected
  • warn — high drift scores or scope violations (work happened but drifted from the plan)
  • pass — all commits governed, scope clean, drift within threshold
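The verdict rules above can be read as a small decision function. This is a sketch under stated assumptions: the parameter names are hypothetical, and "high drift" is taken to mean a score above the default 0.7 threshold.

```python
def pr_verdict(
    ungoverned_commits: int,
    governance_verified: bool,
    governance_drift: bool,
    scope_violations: int,
    max_drift_score: float,
    drift_threshold: float = 0.7,
) -> str:
    """Return 'fail', 'warn', or 'pass', checking the most severe rule first."""
    if ungoverned_commits or not governance_verified or governance_drift:
        return "fail"
    if scope_violations or max_drift_score > drift_threshold:
        return "warn"
    return "pass"


clean = pr_verdict(0, True, False, 0, 0.10)
drifted = pr_verdict(0, True, False, 0, 0.85)
ungoverned = pr_verdict(2, True, False, 0, 0.10)
```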

exo adapter-generate --target ci generates a GitHub Actions workflow that runs this check on every PR automatically.

The lazy auditor defense

Audit sessions

A regular session writes code. An audit session reviews it. The problem: if the same model that wrote the code also reviews it, it's biased toward approving its own work.

exo session-audit starts a session with three differences:

  • Context isolation — the audit session can't read .exo/cache/ or .exo/memory/. It can't see prior mementos. It reviews the code fresh, without the writer's context influencing its judgment.
  • Adversarial persona — the bootstrap prompt includes Red Team Auditor directives. The agent is told to look for flaws, not confirm correctness. Custom personas can be defined in .exo/audit_persona.md.
  • Model mismatch detection — if the audit session uses the same model as the writing session, a warning fires. The whole point is independent review.

For PR reviews, pass --pr-base main --pr-head HEAD and the audit session automatically runs pr-check and injects the governance report into the bootstrap. The auditing agent sees ungoverned commits, scope violations, and drift scores before it starts reviewing code.

Code governance

Feature manifest and requirement tracing

Two YAML files control what code can exist and what the system must do:

.exo/features.yaml declares features with a lifecycle: active → experimental → deprecated → deleted. Code is tagged with @feature: / @endfeature annotations. exo trace scans the codebase and reports violations: code tagged with a deleted feature, code tagged with an unknown feature, edits to locked features. exo prune auto-removes code blocks tagged with deleted features.
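A regex pass over tagged source is enough to find the first two violation types. The sketch below is illustrative only: the comment form of the annotation, the manifest shape (feature name mapped to status), and the function name are assumptions, and locked-feature checks are left out.

```python
import re

FEATURE_TAG = re.compile(r"@feature:\s*([\w-]+)")


def trace_features(source: str, manifest: dict[str, str]) -> list[tuple[int, str, str]]:
    """Return (line number, feature, problem) for unknown or deleted feature tags."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for feature in FEATURE_TAG.findall(line):
            status = manifest.get(feature)
            if status is None:
                violations.append((lineno, feature, "unknown feature"))
            elif status == "deleted":
                violations.append((lineno, feature, "deleted feature"))
    return violations


source = "\n".join([
    "# @feature: auth-refresh",
    "def rotate(): ...",
    "# @endfeature",
    "# @feature: legacy-sso",
    "def sso(): ...",
    "# @endfeature",
    "# @feature: typo-name",
    "def other(): ...",
    "# @endfeature",
])
manifest = {"auth-refresh": "active", "legacy-sso": "deleted"}
violations = trace_features(source, manifest)
```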

.exo/requirements.yaml tracks requirements with @req: / @implements: annotations. exo trace-reqs finds orphan references (code referencing a requirement that doesn't exist), deleted references, and uncovered requirements (requirements with no implementing code).

Acceptance criteria test tracing

Requirements can declare acceptance criteria: named conditions that must be verified by tests. Each requirement lists its ACC IDs in the manifest. Test files use @acc: ACC-XXX annotations to declare which criteria they verify. exo trace-reqs --check-tests cross-references the manifest against test annotations and reports untested criteria (defined but no test) and orphan annotations (a test references a criterion that doesn't exist). This closes the spec-to-test gap: every requirement has acceptance criteria, every criterion has a test, and the chain is machine-verified.

All three tracing systems — features, requirements, and acceptance criteria — are deterministic. Regex-based, no LLM involved. They run at session-finish as advisory checks and feed into the drift detection pipeline.

Telemetry

Observability: metrics, fleet drift, and traces

Governance without measurement is trust without verification. ExoProtocol produces structured telemetry from every session.

Governance metrics

exo metrics computes aggregate statistics from the session index: verification pass rate (passed / total with verify data), drift distribution (low < 0.3, medium 0.3–0.7, high ≥ 0.7), ticket throughput (unique ticket IDs), actor breakdown (sessions per agent), and mode counts (work vs audit). Returns dashboard-ready JSON.
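The drift distribution uses half-open buckets, so a score of exactly 0.3 counts as medium and exactly 0.7 counts as high. A minimal sketch of that bucketing (function names are illustrative):

```python
from collections import Counter


def drift_bucket(score: float) -> str:
    """Classify a drift score: low < 0.3, medium 0.3 to 0.7, high >= 0.7."""
    if score < 0.3:
        return "low"
    if score < 0.7:
        return "medium"
    return "high"


def drift_distribution(scores: list[float]) -> Counter:
    """Count sessions per drift bucket."""
    return Counter(drift_bucket(s) for s in scores)


distribution = drift_distribution([0.1, 0.29, 0.3, 0.7, 0.95])
```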

Fleet drift

exo fleet-drift aggregates drift across all active, suspended, and recently finished sessions. For multi-agent teams, this answers "which agents are drifting right now?" It surfaces per-agent drift scores, stale sessions (dead PID or >48h), and fleet-level averages. High-drift agents (>0.7) are flagged for attention.

OTel-compatible traces

exo export-traces converts the session index into OpenTelemetry-compatible JSONL spans. Each session becomes a span with a deterministic traceId (32-char hex derived from the SHA-256 of the session ID) and spanId (16-char hex), nanosecond timestamps, exo.* namespaced attributes (actor, vendor, model, ticket, drift score), and events for drift checks and feature traces. Output at .exo/logs/traces.jsonl can be ingested by Jaeger, Grafana Tempo, or any OTel backend.
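A SHA-256 digest is 64 hex characters, so a 32-character traceId has to be derived from it rather than used whole. The sketch below assumes simple truncation for the traceId and takes the next 16 characters for the spanId; both derivations are guesses for illustration, and only the lengths and the determinism come from the text.

```python
import hashlib


def _digest(session_id: str) -> str:
    """SHA-256 hex digest of the session ID (64 hex characters)."""
    return hashlib.sha256(session_id.encode("utf-8")).hexdigest()


def trace_id(session_id: str) -> str:
    """Deterministic 32-char hex trace ID (assumed: first half of the digest)."""
    return _digest(session_id)[:32]


def span_id(session_id: str) -> str:
    """Deterministic 16-char hex span ID (assumed: next 16 digest characters)."""
    return _digest(session_id)[32:48]


tid = trace_id("SES-001")
sid = span_id("SES-001")
```

Deterministic IDs mean that re-exporting the same session index yields the same spans, so a backend can deduplicate them.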

Sandbox policy

exo sandbox-policy derives Claude Code sandbox permissions from constitution deny rules. It maps read actions to Read(pattern), write to Edit(pattern), and delete to Edit(pattern) + Bash(rm pattern). This gives you a preview of what your governance actually enforces at the tool level.
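The mapping is mechanical enough to show in full. A minimal sketch, assuming deny rules arrive as (action, pattern) pairs; that input shape and the function name are hypothetical, while the permission strings follow the mapping stated above.

```python
def sandbox_permissions(deny_rules: list[tuple[str, str]]) -> list[str]:
    """Translate (action, pattern) deny rules into permission strings."""
    permissions = []
    for action, pattern in deny_rules:
        if action == "read":
            permissions.append(f"Read({pattern})")
        elif action == "write":
            permissions.append(f"Edit({pattern})")
        elif action == "delete":
            # Deleting is covered both as an edit and as a shell rm.
            permissions.extend([f"Edit({pattern})", f"Bash(rm {pattern})"])
        else:
            raise ValueError(f"unsupported action: {action}")
    return permissions


denied = sandbox_permissions([
    ("read", ".env"),
    ("write", "*.pem"),
    ("delete", "credentials.*"),
])
```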

CI integration

CI failure auto-fix

When CI fails, exo ci-fix fetches the failed run logs via the gh CLI, parses errors into structured entries, suggests fixes, and can auto-apply and push them to retrigger the pipeline.

The error parser handles ruff format violations (auto-fixable), ruff lint errors, pytest failures, and Python compile errors (SyntaxError, IndentationError). Each error is classified with a tool name, severity, and auto-fixability flag.

exo ci-fix --apply --push runs the full loop: fetch the latest failed run, apply auto-fixable commands (like ruff format), commit the changes, and push. Manual fixes are reported but left for the agent or developer to address.

True enforcement

Sealed policy + self-healing hooks

Governance state is scattered across YAML, Markdown, and JSON files. exo compose compiles everything — constitution lock, config, features manifest, requirements registry — into a single sealed policy artifact (.exo/policy.sealed.json) with a SHA-256 integrity hash.
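Sealing comes down to serializing the compiled sources canonically and hashing the result. The sketch below shows one way to do that; the artifact layout (`policy` and `integrity` keys) and the canonical-JSON choice are assumptions, not the actual format of .exo/policy.sealed.json.

```python
import hashlib
import json


def _canonical(policy: dict) -> bytes:
    """Serialize with sorted keys so equal content always hashes equally."""
    return json.dumps(policy, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(policy: dict) -> dict:
    """Wrap the compiled policy with a SHA-256 integrity hash."""
    return {"policy": policy, "integrity": hashlib.sha256(_canonical(policy)).hexdigest()}


def verify_seal(sealed: dict) -> bool:
    """Recompute the hash and compare it with the recorded one."""
    expected = hashlib.sha256(_canonical(sealed["policy"])).hexdigest()
    return expected == sealed["integrity"]


sealed = seal({
    "config": {"drift_threshold": 0.7},
    "features": {"auth-refresh": "active"},
})

# Simulate an out-of-band edit to the sealed artifact.
tampered = json.loads(json.dumps(sealed))
tampered["policy"]["config"]["drift_threshold"] = 1.0
```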

The sealed policy is the canonical truth. Session-start auto-recomposes if sources have changed. Session-finish verifies the seal, records the result in the memento and session index, then recomposes for the next session. exo check runs all governance subsystems — integrity, sealed policy, features, requirements, coherence — in a single pass.

Self-healing hooks

Claude Code hooks are installed from the sealed policy via exo hook-install. The hooks hash is stored in the sealed artifact. On session-start, ExoProtocol compares the current hooks hash against the sealed policy. If they don't match — tamper detected — the hooks are automatically reinstalled, the tamper event is logged to .exo/audit/tamper.jsonl, and the sealed policy is recomposed. No manual intervention needed.

Scope-gated enforcement

A PreToolUse hook on Write/Edit checks the target file path against the active session's ticket scope and the global deny patterns from the sealed policy. If the file is outside scope, the tool call is blocked with exit code 2. This is real enforcement, not advisory: with the hook installed, the agent's Write and Edit calls cannot reach files outside its governed scope.
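The blocking decision can be sketched as a pure function from the hook's JSON payload to an exit code. The payload keys used here (`tool_name`, `tool_input.file_path`) and the repo-relative path are assumptions about what the hook receives, and deny taking precedence over allow is also assumed.

```python
from fnmatch import fnmatchcase

ALLOW_EXIT = 0
BLOCK_EXIT = 2  # exit code 2 blocks the tool call, as described above


def hook_exit_code(payload: dict, allow: list[str], deny: list[str]) -> int:
    """Decide whether a Write/Edit call may proceed."""
    if payload.get("tool_name") not in {"Write", "Edit"}:
        return ALLOW_EXIT  # other tools are not scope-gated in this sketch
    path = payload.get("tool_input", {}).get("file_path", "")
    if any(fnmatchcase(path, pattern) for pattern in deny):
        return BLOCK_EXIT
    if not any(fnmatchcase(path, pattern) for pattern in allow):
        return BLOCK_EXIT
    return ALLOW_EXIT


allow = ["src/auth/**", "tests/test_auth*"]
deny = [".exo/**", "*.lock"]

in_scope = hook_exit_code(
    {"tool_name": "Edit", "tool_input": {"file_path": "src/auth/refresh.py"}}, allow, deny
)
out_of_scope = hook_exit_code(
    {"tool_name": "Write", "tool_input": {"file_path": "src/api/routes.py"}}, allow, deny
)
```

Keeping the decision separate from stdin parsing and `sys.exit` makes the rule easy to unit test.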

A git pre-commit hook runs exo check before every commit, with a python3 -m exo.cli fallback for environments where the exo CLI isn't on PATH.

Storage architecture

.exo/ — the governance namespace

Everything ExoProtocol needs lives in a single directory: .exo/. This is the governance namespace — the filesystem equivalent of a kernel's /proc. It holds the constitution, compiled lock, config, tickets, feature manifest, requirement registry, sealed policy, mementos, reflections, and session state.

Not all of it belongs in git. .exo/ has two classes of files: durable governance state (constitution, config, lock, tickets, features, requirements) that must be committed so CI, agents, and teammates all see the same rules — and ephemeral runtime state (cache, logs, locks, active sessions) that's local to one machine and excluded via .exo/.gitignore.

exo install handles this automatically: it creates the gitignore, stages the durable files, and commits them. exo doctor checks tracking status — a failing "governance_tracked" section means your governance is invisible to CI and other agents. Session-start injects a bootstrap warning when it detects untracked governance.

The sidecar worktree pattern

For teams that want full separation between app history and governance history, exo sidecar-init mounts .exo/ as a dedicated git worktree on an orphan branch (e.g., exo-governance). This gives you dual timelines: app code evolves on main, governance state evolves on its own branch.

Auto-commits fire at every lifecycle boundary — session-start, session-finish, suspend, resume — so the governance branch stays current without polluting your app commit history. Each commit uses a structured message like chore(exo): session-finish SES-xxx [TICKET-yyy]. Auto-commit is advisory — failures never block the lifecycle event.

The sidecar pattern matters most for teams with multiple agents. Without it, governance changes (ticket status updates, new mementos, drift scores) show up as noise in your app PR diffs. With the sidecar, governance has its own git history that can be reviewed, merged, and branched independently.

Design boundaries

What ExoProtocol deliberately doesn't do

Every governance system has to decide where to stop. Here's where ExoProtocol draws the line:

  • Minimal runtime enforcement. ExoProtocol's scope-gated hooks can block Write/Edit calls to out-of-scope files, but this is opt-in and limited to agents that support hooks (Claude Code). For other agents, enforcement is social (via bootstrap rules) and auditable (via drift detection). The system is designed to work even when mechanical enforcement isn't available.
  • No LLM in the loop. Every detection system — drift scoring, feature tracing, requirement tracing, scope conflict detection — is deterministic. Regex, fnmatch, hash comparisons. No model calls, no embeddings, no semantic analysis. This makes the system fast, reproducible, and auditable.
  • No external services. Everything lives in .exo/ inside your git repo. No databases, no accounts, no SaaS dependencies. pip install and you're running. Git is the storage layer and the transport layer.
  • Advisory by default. Most operations in ExoProtocol are advisory — drift detection, feature tracing, session-start advisories, audit warnings. The exceptions are scope-gated hooks (which block) and pre-commit checks (which can reject commits). A crashed detection pass never prevents an agent from finishing its work.

Try it

pip install exoprotocol && exo install
