Technical deep dive
How ExoProtocol works
The design decisions behind a governance kernel for AI agent sessions. Why OS primitives, why a frozen kernel, and what actually happens when an agent session starts and finishes.
Origin
Treating agent development as a programming language
The question that started ExoProtocol wasn't "how do we add guardrails to AI agents?" It was: what would it look like if multi-agent development were a programming language?
A programming language needs a runtime. A runtime needs process isolation, memory management, scheduling, and crash recovery. We weren't trying to build an OS — we were solving the same class of problems that OS designers solved in the 1970s, and we converged on the same abstractions.
The kernel has 10 functions. It's been frozen since day one. Every feature since — drift detection, feature tracing, session-start intelligence, garbage collection — lives in the stdlib, not the kernel. This isn't an accident. It's the same design principle that makes Unix kernels stable: keep the enforcement surface minimal and build everything else in userspace.
Layer 0
The frozen kernel
The kernel does exactly four things: governance compilation, ticket management, policy enforcement, and audit logging. Ten functions, no expansion without an RFC.
# The entire public kernel API — 10 functions
load_governance(root) # Load constitution + lock
verify_governance(gov) # Hash check: source vs lock
open_session(root, actor) # Create session context
mint_ticket(session, ...) # Issue scoped work ticket
validate_ticket(gov, ticket) # Check ticket against rules
check_action(gov, session, ...) # Policy decision on action
resolve_requirements(...) # Match evidence to requirements
commit_plan(session, ...) # Commit execution plan
append_audit(root, event) # Append to audit log
seal_result(session, ...) # Finalize with receipt hash

Why frozen? Because the kernel is the trust boundary. If agents can't trust that governance rules are consistently enforced, the whole system is theater. A frozen kernel means the enforcement contract is auditable, testable, and stable. When you verify governance, you're comparing the constitution's hash against the compiled lock file's hash. If they don't match, something changed outside the approved process.
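The hash comparison can be sketched in a few lines. This is a minimal illustration, not the kernel's implementation: the SHA-256 digest, the JSON lock layout, the `source_hash` field, and the `sketch_verify` name are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def sketch_verify(constitution_path: Path, lock_path: Path) -> bool:
    """Return True when the constitution still matches the hash recorded in the lock."""
    # Hash the raw bytes of the constitution source.
    actual = hashlib.sha256(constitution_path.read_bytes()).hexdigest()
    # Assumed lock layout: {"source_hash": "<hex digest>"}.
    recorded = json.loads(lock_path.read_text())["source_hash"]
    return actual == recorded
```

Any edit to the constitution that bypasses recompilation changes the digest, so the comparison fails until the lock is regenerated through the approved process.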
Everything else — session lifecycle, drift detection, feature
tracing, dispatch scheduling, adapter generation, garbage collection —
lives in exo/stdlib/. This layer can evolve freely
without touching the trust boundary.
Brownfield on-ramp
Smart init: governance from existing repos
Most governance tools assume a greenfield project. ExoProtocol assumes the opposite: you have a messy repo with existing code, and you want governance to add value immediately without a week of configuration.
exo init scans the repo and detects language, source
directories, build artifacts, sensitive files, CI systems, and test
frameworks. It generates a project-aware constitution with rules that
actually match your codebase.
$ exo scan
Language: python
Source dirs: src/, exo/
Build dirs: __pycache__/, *.egg-info/
Sensitive: .env, credentials.yaml
CI: .github/workflows/ (GitHub Actions)
Tests: pytest (tests/)
Ignore: node_modules/, .venv/, dist/
Rules to generate:
RULE-001 No direct main commits
RULE-002 Tests before review
RULE-SEC-001 Deny .env, *.pem, credentials.*
RULE-SEC-002 Deny credentials.yaml
Adapters:
CLAUDE.md, .cursorrules, AGENTS.md
.github/workflows/exo-governance.yml
The scan also generates adapter files — CLAUDE.md,
.cursorrules, AGENTS.md — that inject
governance rules into the agent's native context format. A Claude Code
session reads CLAUDE.md automatically. A Cursor session reads
.cursorrules. The governance is the same; the delivery format adapts
to the agent.
The key mechanism
Bootstrap prompt compilation
This is the core of ExoProtocol. When exo session-start
runs, it compiles the entire governance context into a single
bootstrap prompt that the agent receives as its first input. The agent
doesn't need to know about ExoProtocol's internals — it just reads
the rules and follows them.
Here's what a real bootstrap prompt looks like:
┌──────────────────────────────────────────────────────────┐
│ GOVERNED SESSION │
│ ticket: TICKET-042 actor: agent:claude │
│ branch: feature/auth model: claude-code │
└──────────────────────────────────────────────────────────┘
## Governance Rules
- RULE-001: No direct commits to main
- RULE-002: Tests must pass before marking review
- RULE-SEC-001: Never modify .env, *.pem, credentials.*
## Scope
- allow: ["src/auth/**", "tests/test_auth*"]
- deny: [".exo/**", "*.lock"]
## Budgets
- max_files_changed: 8
- max_loc_changed: 300
## Checks
- ["python -m pytest tests/test_auth.py -v"]
## Git Workflow
- Before pushing, rebase: git pull --rebase origin main
- Keep commits atomic and branches short-lived
## Machine Context
- cpu_cores: 10
- load_avg_1m: 3.2
- ram: 12.4GB available / 36.0GB total
## Sibling Sessions
- agent:cursor: ticket=TICKET-038 branch=feature/api (age=1.2h)
## Start Advisories
- [warning] scope_conflict: agent:cursor (TICKET-038) has
overlapping scope ["src/auth/middleware.py"]
- [info] unmerged_work: feature/api-v2 has 3 unmerged commits
touching src/auth/
## Operational Learnings
- Pattern: test imports fail silently when conftest.py missing
Insight: Always verify conftest.py exists before adding tests
- Pattern: auth middleware changes break session middleware
Insight: Run full test suite, not just auth tests

Every section is conditional. No sibling sessions? The section doesn't appear. No scope conflicts? No advisories. Clean machine? No machine context warnings. The bootstrap prompt is as short as it can be while containing everything the agent needs.
The banner at the top is a box-drawn governance strip that the agent sees first. It's the visual anchor: "you are in a governed session, here's what you're working on."
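The "every section is conditional" rule is easy to picture as code. The sketch below assumes a hypothetical `ctx` dictionary and a plain one-line banner; the real compiler's inputs and box-drawn banner are not shown in this article.

```python
def compile_bootstrap(ctx: dict) -> str:
    """Render only the sections that have content (hypothetical ctx layout)."""
    # Section title -> key in ctx; order mirrors the sample prompt.
    sections = [
        ("Governance Rules", "rules"),
        ("Sibling Sessions", "siblings"),
        ("Start Advisories", "advisories"),
        ("Operational Learnings", "learnings"),
    ]
    lines = [f"GOVERNED SESSION ticket: {ctx['ticket']} actor: {ctx['actor']}"]
    for title, key in sections:
        items = ctx.get(key) or []
        if not items:
            continue  # empty section: omit it entirely
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


prompt = compile_bootstrap({
    "ticket": "TICKET-042",
    "actor": "agent:claude",
    "rules": ["RULE-001: No direct commits to main"],
    "siblings": [],  # no siblings, so no "Sibling Sessions" heading
})
print(prompt)
```

Skipping empty sections keeps the prompt short, which matters because the agent pays for every line of context it reads.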
Conflict detection
Session-start intelligence
Before the agent writes a line of code, ExoProtocol runs six detection passes. All are advisory — they never block session start — but they surface information that prevents the most common multi-agent failures.
1. Scope conflict detection
Each ticket has an allow/deny glob pattern
defining which files it can touch. When a session starts, ExoProtocol
loads every active sibling session's ticket and cross-checks the scope
patterns using fnmatch. If two agents have overlapping
file scope, a warning fires.
Key design decision: if both tickets have the default scope
(["**"]), no warning fires. This avoids noise when nobody
has customized scope. But the moment one ticket has a specific scope
pattern, the warning activates — that's the point of setting scope.
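One way to cross-check two scopes with fnmatch is to treat each pattern from one ticket as a candidate path against the other ticket's patterns. That matching strategy is an assumption for illustration, as are the function and constant names; only the use of fnmatch and the default-scope rule come from the description above.

```python
from fnmatch import fnmatchcase

DEFAULT_SCOPE = ["**"]


def scope_overlap(mine: list[str], theirs: list[str]) -> list[str]:
    """Return the patterns from either side that fall inside the other side's scope."""
    # Both tickets left scope at the default: stay quiet to avoid noise.
    if mine == DEFAULT_SCOPE and theirs == DEFAULT_SCOPE:
        return []
    hits = []
    for a in mine:
        for b in theirs:
            # Treat each pattern as a candidate path against the other pattern.
            if fnmatchcase(b, a):
                narrower = b
            elif fnmatchcase(a, b):
                narrower = a
            else:
                continue
            if narrower not in hits:
                hits.append(narrower)
    return hits
```

With the scopes from the sample prompt, `scope_overlap(["src/auth/**", "tests/test_auth*"], ["src/auth/middleware.py"])` reports `src/auth/middleware.py`, which is the overlap shown in the start advisory.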
2. Unmerged work detection
ExoProtocol reads the session index (a JSONL file tracking all
completed sessions), filters to recent sessions on other branches, and
checks which branches have been merged into the current branch via
git branch --merged. If an unmerged branch has sessions
with overlapping scope, the agent is warned: "there's relevant work
on feature/api-v2 that you haven't merged."
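A rough sketch of this pass, assuming each JSONL record carries a `branch` field (the real index schema is not documented here). It omits the recency and scope-overlap filters to stay short; the function names are illustrative.

```python
import json
import subprocess


def parse_branch_list(out: str) -> set[str]:
    """Parse `git branch` output: each line is '  name', or '* name' for the current branch."""
    return {line.lstrip("*+ ").strip() for line in out.splitlines() if line.strip()}


def merged_branches(repo: str = ".") -> set[str]:
    """Branches already merged into the current HEAD, via `git branch --merged`."""
    out = subprocess.run(
        ["git", "branch", "--merged"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return parse_branch_list(out)


def unmerged_sessions(index_lines, current_branch, merged):
    """Yield session records whose branch is neither the current one nor merged."""
    for raw in index_lines:
        if not raw.strip():
            continue
        rec = json.loads(raw)  # one JSON object per line (JSONL)
        if rec["branch"] != current_branch and rec["branch"] not in merged:
            yield rec
```

Splitting the parsing from the subprocess call keeps the filtering logic testable without a git repository.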
3. Ticket contention
Checks if any active sibling session is working on the same ticket. Simple but catches the "two agents assigned the same task" failure.
4. Branch mismatch
Scans session history for prior sessions on the same ticket. If the
most recent session was on a different branch, the agent is warned:
"TICKET-042 was previously worked on feature/auth-v1,
but you're on feature/auth-v2."
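Ticket contention and branch mismatch are both simple lookups over session records. The sketch below assumes records are dictionaries with `actor`, `ticket`, `branch`, and `finished_at` keys; those field names are hypothetical.

```python
def contention(active_siblings: list[dict], ticket: str) -> list[str]:
    """Actors of active sibling sessions already working on this ticket."""
    return [s["actor"] for s in active_siblings if s["ticket"] == ticket]


def branch_mismatch(history: list[dict], ticket: str, current_branch: str):
    """Return the previous branch if the latest session on this ticket used another one."""
    prior = [s for s in history if s["ticket"] == ticket]
    if not prior:
        return None
    # Assumes timestamps share one ISO-8601 format, so string order is time order.
    latest = max(prior, key=lambda s: s["finished_at"])
    return latest["branch"] if latest["branch"] != current_branch else None
```

Both functions return data rather than raising, which fits the advisory model: the caller decides how to word the warning.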
5. Base divergence
Uses git rev-list --left-right --count to check how far
the feature branch has fallen behind the base branch (e.g. main).
Only fires above a configurable threshold (default: 15 commits behind)
to avoid noise on normal development lag.
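For reference, `git rev-list --left-right --count base...HEAD` prints two tab-separated counts: commits only on the base (left) and commits only on HEAD (right). A minimal sketch of the threshold check follows; the function names and advisory wording are made up for the example.

```python
import subprocess


def parse_left_right(output: str) -> tuple[int, int]:
    """Parse the two whitespace-separated counts printed by git rev-list --count."""
    left, right = output.split()
    return int(left), int(right)


def base_divergence(base: str = "main", threshold: int = 15, repo: str = "."):
    """Return an advisory string when HEAD is more than `threshold` commits behind `base`."""
    out = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{base}...HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    behind, ahead = parse_left_right(out)  # left side = commits only on base
    if behind > threshold:
        return f"[info] base_divergence: {behind} commits behind {base} ({ahead} ahead)"
    return None
```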
6. Machine load
Reads CPU count, load average, and available RAM. Warns if the machine
is under heavy load while multiple agent sessions are active. Tickets
can carry a resource_profile (light,
default, heavy) that gates whether
concurrent sessions should proceed.
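A sketch of the load check using only the standard library. The per-profile limits are invented numbers, not ExoProtocol defaults, and available RAM is left out because the Python standard library has no portable call for it.

```python
import os


def load_snapshot() -> dict:
    """CPU count and 1-minute load average (os.getloadavg is Unix-only)."""
    cores = os.cpu_count() or 1
    try:
        load_1m = os.getloadavg()[0]
    except (AttributeError, OSError):
        load_1m = 0.0  # unavailable: report no load rather than fail
    return {"cpu_cores": cores, "load_avg_1m": load_1m}


# Hypothetical ceilings on load per core for each resource_profile.
PROFILE_LIMITS = {"light": 1.5, "default": 1.0, "heavy": 0.5}


def load_advisory(snapshot: dict, active_sessions: int, profile: str = "default"):
    """Warn when several sessions are active and load per core exceeds the profile limit."""
    per_core = snapshot["load_avg_1m"] / snapshot["cpu_cores"]
    if active_sessions > 1 and per_core > PROFILE_LIMITS[profile]:
        return f"[warning] machine_load: {per_core:.2f} load per core, {active_sessions} sessions"
    return None
```

A heavy ticket gets the lowest ceiling in this sketch, so it is the first to be warned off a busy machine.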
Session finish
Drift detection
When a session finishes, ExoProtocol scores how well the work stayed within the ticket's constraints. The drift score is a weighted composite of four factors:
drift_score = (
0.40 * scope_violation_ratio # share of touched files outside the ticket's scope
+ 0.30 * file_budget_ratio # files_changed / max_files_changed
+ 0.20 * loc_budget_ratio # loc_changed / max_loc_changed
+ 0.10 * boundary_violation_count # normalized boundary breaches
)
# Each ratio is clamped to [0.0, 1.0]
# Final score: 0.0 = perfect compliance, 1.0 = complete drift

A drift score of 0.0 means perfect compliance. A score above the threshold (default: 0.7) triggers a warning in the PR governance check. The score is recorded in the session memento and the session index for historical tracking.
Drift detection is advisory. It never blocks session-finish. This is a deliberate design choice: you don't want an agent stuck in a limbo state because drift detection crashed or returned an unexpected result. The worst case is a session finishes without a drift score, which is better than a session that never finishes.
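The weighted formula and the "never block finish" rule fit together as follows. This sketch uses the weights and clamping stated above; the function signatures and the returned dictionary shape are assumptions.

```python
def clamp(x: float) -> float:
    """Clamp a ratio to [0.0, 1.0]."""
    return max(0.0, min(1.0, x))


def drift_score(scope_ratio, files_changed, max_files, loc_changed, max_loc, boundary_ratio):
    """Weighted composite using the weights from the formula above."""
    return round(
        0.40 * clamp(scope_ratio)
        + 0.30 * clamp(files_changed / max_files)
        + 0.20 * clamp(loc_changed / max_loc)
        + 0.10 * clamp(boundary_ratio),
        4,
    )


def finish_session(score_fn, **inputs):
    """Advisory wrapper: a failing scorer yields no score instead of blocking finish."""
    try:
        score = score_fn(**inputs)
    except Exception:
        score = None  # session still finishes, just without a drift score
    return {"status": "finished", "drift_score": score}
```

For example, a session that stays in scope and uses half of each budget (4 of 8 files, 150 of 300 lines) scores 0.30 * 0.5 + 0.20 * 0.5 = 0.25, well under the 0.7 threshold. A misconfigured budget of zero files raises inside the scorer, and the wrapper still returns a finished session.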
Knowledge transfer
Mementos: how agents learn from each other
When a session finishes, ExoProtocol writes a closeout memento — a structured record of what happened, what was learned, and what the drift score was. Here's what one looks like:
# Session Memento — TICKET-042
# agent:claude | 2025-01-15T14:30:00Z | 47min
## Summary
Implemented JWT refresh token rotation for auth middleware.
Added 12 tests covering token expiry, rotation, and revocation.
## Artifacts
- src/auth/refresh.py (new, 142 loc)
- src/auth/middleware.py (modified, +28 -4 loc)
- tests/test_auth_refresh.py (new, 186 loc)
## Drift Score: 0.12
- scope: 0.0 (all files within allow pattern)
- files: 0.25 (3/8 budget used → 3/12 normalized)
- loc: 0.18 (356/300 budget — slightly over)
- boundary: 0.0
## Operational Learnings
- refresh token rotation requires atomic DB write;
use transaction wrapper, not sequential updates
- middleware ordering matters: refresh check must run
BEFORE session validation
## Feature Trace
- @feature:auth-refresh — active, 3 tagged files
- No violations
## Status: review

Mementos serve two purposes. First, they're the operational learnings pipeline: the next agent session on the same ticket (or overlapping scope) inherits the insights. "Refresh token rotation requires atomic DB writes" is exactly the kind of context that prevents the next agent from making the same mistake.
Second, they're the audit trail. When you run
exo pr-check, it matches git commits to governed sessions
by timestamp windows. Commits that don't fall within any governed
session are flagged as "ungoverned" — the PR governance check reports
them, and an audit session will highlight them for review.
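Matching commits to sessions by timestamp window reduces to an interval check. In this sketch the `started_at` and `finished_at` field names and the `(sha, timestamp)` commit tuples are assumptions; timestamps use explicit `+00:00` offsets so `datetime.fromisoformat` accepts them on older Python versions.

```python
from datetime import datetime


def ungoverned_commits(commits, sessions):
    """Return SHAs of commits whose timestamp falls inside no session window."""
    windows = [
        (datetime.fromisoformat(s["started_at"]), datetime.fromisoformat(s["finished_at"]))
        for s in sessions
    ]
    flagged = []
    for sha, ts in commits:
        when = datetime.fromisoformat(ts)
        # A commit is governed if any session was open when it was made.
        if not any(start <= when <= end for start, end in windows):
            flagged.append(sha)
    return flagged
```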
CI integration
PR governance check
exo pr-check answers a simple question: was
every commit in this PR made during a governed session?
It matches commits to sessions by timestamp windows, checks scope violations (did the session touch files outside its ticket scope?), evaluates drift scores, and verifies governance integrity (has the constitution been tampered with?).
The verdict is pass, warn, or
fail:
- fail — ungoverned commits, failed governance verification, or governance drift detected
- warn — high drift scores or scope violations (work happened but drifted from the plan)
- pass — all commits governed, scope clean, drift within threshold
exo adapter-generate --target ci generates a GitHub
Actions workflow that runs this check on every PR automatically.
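The verdict rules above amount to a severity ladder. A minimal sketch, with hypothetical parameter names and the 0.7 default drift threshold mentioned earlier:

```python
def pr_verdict(ungoverned: int, governance_ok: bool, scope_violations: int,
               max_drift: float, threshold: float = 0.7) -> str:
    """Map check results to pass / warn / fail, most severe condition first."""
    if ungoverned > 0 or not governance_ok:
        return "fail"
    if scope_violations > 0 or max_drift > threshold:
        return "warn"
    return "pass"
```

Checking the fail conditions first means a PR with both an ungoverned commit and high drift is reported as fail, not warn.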
The lazy auditor defense
Audit sessions
A regular session writes code. An audit session reviews it. The problem: if the same model that wrote the code also reviews it, it's biased toward approving its own work.
exo session-audit starts a session with three
differences:
- Context isolation: the audit session can't read .exo/cache/ or .exo/memory/. It can't see prior mementos. It reviews the code fresh, without the writer's context influencing its judgment.
- Adversarial persona: the bootstrap prompt includes Red Team Auditor directives. The agent is told to look for flaws, not confirm correctness. Custom personas can be defined in .exo/audit_persona.md.
- Model mismatch detection: if the audit session uses the same model as the writing session, a warning fires. The whole point is independent review.
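Two of these differences are mechanical enough to sketch. The helper names and warning text below are illustrative only; how ExoProtocol applies context isolation in practice is not described here.

```python
from pathlib import PurePosixPath

# Directories hidden from audit sessions, per the list above.
ISOLATED = (".exo/cache", ".exo/memory")


def readable_in_audit(path: str) -> bool:
    """False for an isolated directory itself or anything beneath it."""
    p = PurePosixPath(path)
    return not any(
        p == PurePosixPath(d) or PurePosixPath(d) in p.parents for d in ISOLATED
    )


def model_mismatch_warning(writer_model: str, auditor_model: str):
    """Warn when the auditor is the same model that wrote the code."""
    if writer_model == auditor_model:
        return f"[warning] audit uses the same model as the writer ({writer_model})"
    return None
```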
For PR reviews, pass --pr-base main --pr-head HEAD and
the audit session automatically runs pr-check and
injects the governance report into the bootstrap. The auditing agent
sees ungoverned commits, scope violations, and drift scores before it
starts reviewing code.
Code governance
Feature manifest and requirement tracing
Two YAML files control what code can exist and what the system must do:
.exo/features.yaml declares features with a lifecycle:
active → experimental → deprecated →
deleted. Code is tagged with @feature: /
@endfeature annotations. exo trace scans
the codebase and reports violations: code tagged with a deleted
feature, code tagged with an unknown feature, edits to locked
features. exo prune auto-removes code blocks tagged with
deleted features.
.exo/requirements.yaml tracks requirements with
@req: / @implements: annotations.
exo trace-reqs finds orphan references (code referencing
a requirement that doesn't exist), deleted references, and uncovered
requirements (requirements with no implementing code).
Both tracing systems are deterministic — regex-based, no LLM involved. They run at session-finish as advisory checks and feed into the drift detection pipeline.
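A regex-based trace can be sketched in a few lines. The comment syntax (`# @feature:name`), the manifest shape (feature name mapped to lifecycle status), and the violation messages are assumptions; locked-feature checks are omitted.

```python
import re

# Matches "@feature:name"; "@endfeature" does not match because it lacks the colon form.
FEATURE_TAG = re.compile(r"@feature:\s*([\w-]+)")


def trace_features(source: str, manifest: dict[str, str]) -> list[str]:
    """Report tags that name an unknown or deleted feature."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = FEATURE_TAG.search(line)
        if not match:
            continue
        name = match.group(1)
        status = manifest.get(name)
        if status is None:
            violations.append(f"line {lineno}: unknown feature '{name}'")
        elif status == "deleted":
            violations.append(f"line {lineno}: deleted feature '{name}'")
    return violations
```

Because the scan is a pure function of the source text and the manifest, running it twice on the same inputs always gives the same report.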
Design boundaries
What ExoProtocol deliberately doesn't do
Every governance system has to decide where to stop. Here's where ExoProtocol draws the line:
- No runtime sandboxing. ExoProtocol doesn't intercept file writes or block tool calls. It tells the agent what it should and shouldn't do, scores compliance after the fact, and reports violations. The enforcement is social (via bootstrap rules) and auditable (via drift detection), not mechanical.
- No LLM in the loop. Every detection system — drift scoring, feature tracing, requirement tracing, scope conflict detection — is deterministic. Regex, fnmatch, hash comparisons. No model calls, no embeddings, no semantic analysis. This makes the system fast, reproducible, and auditable.
- No external services. Everything lives in .exo/ inside your git repo. No databases, no accounts, no SaaS dependencies. pip install and you're running. Git is the storage layer and the transport layer.
- Advisory, not blocking. Almost nothing in ExoProtocol blocks an operation. Drift detection, feature tracing, session-start advisories, and audit warnings are all advisory. A crashed detection pass never prevents an agent from finishing its work. The philosophy: surface information, don't impose bottlenecks.