Technical deep dive

How ExoProtocol works

The design decisions behind a governance kernel for AI agent sessions. Why OS primitives, why a frozen kernel, and what actually happens when an agent session starts and finishes.

Origin

Treating agent development as a programming language

The question that started ExoProtocol wasn't "how do we add guardrails to AI agents?" It was: what would it look like if multi-agent development were a programming language?

A programming language needs a runtime. A runtime needs process isolation, memory management, scheduling, and crash recovery. We weren't trying to build an OS — we were solving the same class of problems that OS designers solved in the 1970s, and we converged on the same abstractions.

The kernel has 10 functions. It's been frozen since day one. Every feature since — drift detection, feature tracing, session-start intelligence, garbage collection — lives in the stdlib, not the kernel. This isn't an accident. It's the same design principle that makes Unix kernels stable: keep the enforcement surface minimal and build everything else in userspace.

Layer 0

The frozen kernel

The kernel does exactly four things: governance compilation, ticket management, policy enforcement, and audit logging. Ten functions, no expansion without RFC.

# The entire public kernel API — 10 functions

load_governance(root)           # Load constitution + lock
verify_governance(gov)          # Hash check: source vs lock

open_session(root, actor)       # Create session context
mint_ticket(session, ...)       # Issue scoped work ticket

validate_ticket(gov, ticket)    # Check ticket against rules
check_action(gov, session, ...) # Policy decision on action

resolve_requirements(...)       # Match evidence to requirements
commit_plan(session, ...)       # Commit execution plan

append_audit(root, event)       # Append to audit log
seal_result(session, ...)       # Finalize with receipt hash

Why frozen? Because the kernel is the trust boundary. If agents can't trust that governance rules are consistently enforced, the whole system is theater. A frozen kernel means the enforcement contract is auditable, testable, and stable. When you verify governance, you're comparing the constitution's hash against the compiled lock file's hash — if they don't match, something changed outside the approved process.
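A minimal sketch of the hash comparison. It assumes SHA-256 over the constitution's raw bytes and a JSON lock file with a hypothetical `source_hash` key. The text only says that a source hash is compared against the lock's hash, so both the algorithm and the lock layout are illustrative.

```python
import hashlib
import json


def constitution_hash(source: bytes) -> str:
    """Hex SHA-256 digest of the constitution's raw bytes."""
    return hashlib.sha256(source).hexdigest()


def compile_lock(source: bytes) -> str:
    """Lock-file text recording the hash of the approved constitution (hypothetical format)."""
    return json.dumps({"source_hash": constitution_hash(source)}, indent=2)


def verify_lock(source: bytes, lock_text: str) -> bool:
    """True only if the current constitution still matches the hash recorded at compile time."""
    recorded = json.loads(lock_text)["source_hash"]
    return constitution_hash(source) == recorded
```

Any edit to the constitution that bypasses recompiling the lock changes the digest, so `verify_lock` returns False.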

Everything else — session lifecycle, drift detection, feature tracing, dispatch scheduling, adapter generation, garbage collection — lives in exo/stdlib/. This layer can evolve freely without touching the trust boundary.

Brownfield on-ramp

Smart init: governance from existing repos

Most governance tools assume a greenfield project. ExoProtocol assumes the opposite: you have a messy repo with existing code, and you want governance to add value immediately without a week of configuration.

exo init scans the repo and detects language, source directories, build artifacts, sensitive files, CI systems, and test frameworks. It generates a project-aware constitution with rules that actually match your codebase.

$ exo scan

Language:    python
Source dirs: src/, exo/
Build dirs:  __pycache__/, *.egg-info/
Sensitive:   .env, credentials.yaml
CI:          .github/workflows/ (GitHub Actions)
Tests:       pytest (tests/)
Ignore:      node_modules/, .venv/, dist/

Rules to generate:
  RULE-001     No direct main commits
  RULE-002     Tests before review
  RULE-SEC-001 Deny .env, *.pem, credentials.*
  RULE-SEC-002 Deny credentials.yaml

Adapters:
  CLAUDE.md, .cursorrules, AGENTS.md
  .github/workflows/exo-governance.yml

The scan also generates adapter files — CLAUDE.md, .cursorrules, AGENTS.md — that inject governance rules into the agent's native context format. A Claude Code session reads CLAUDE.md automatically. A Cursor session reads .cursorrules. The governance is the same; the delivery format adapts to the agent.
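The "same governance, different delivery format" idea can be sketched as one rule list rendered into several files. The `Rule` type and the per-file headers below are hypothetical placeholders, not ExoProtocol's actual templates, and the CI workflow adapter is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    text: str


# Hypothetical headers per adapter target; the real templates may differ.
ADAPTER_HEADERS = {
    "CLAUDE.md": "# Governance rules (Claude Code)",
    ".cursorrules": "# Governance rules (Cursor)",
    "AGENTS.md": "# Governance rules (agents)",
}


def render_adapters(rules: list[Rule]) -> dict[str, str]:
    """Render one rule set into each agent's native context file: same body, different wrapper."""
    body = "\n".join(f"- {r.rule_id}: {r.text}" for r in rules)
    return {name: f"{header}\n\n{body}\n" for name, header in ADAPTER_HEADERS.items()}
```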

The key mechanism

Bootstrap prompt compilation

This is the core of ExoProtocol. When exo session-start runs, it compiles the entire governance context into a single bootstrap prompt that the agent receives as its first input. The agent doesn't need to know about ExoProtocol's internals — it just reads the rules and follows them.

Here's what a real bootstrap prompt looks like:

┌──────────────────────────────────────────────────────────┐
│  GOVERNED SESSION                                        │
│  ticket: TICKET-042   actor: agent:claude                │
│  branch: feature/auth  model: claude-code                │
└──────────────────────────────────────────────────────────┘

## Governance Rules
- RULE-001: No direct commits to main
- RULE-002: Tests must pass before marking review
- RULE-SEC-001: Never modify .env, *.pem, credentials.*

## Scope
- allow: ["src/auth/**", "tests/test_auth*"]
- deny: [".exo/**", "*.lock"]

## Budgets
- max_files_changed: 8
- max_loc_changed: 300

## Checks
- ["python -m pytest tests/test_auth.py -v"]

## Git Workflow
- Before pushing, rebase: git pull --rebase origin main
- Keep commits atomic and branches short-lived

## Machine Context
- cpu_cores: 10
- load_avg_1m: 3.2
- ram: 12.4GB available / 36.0GB total

## Sibling Sessions
- agent:cursor: ticket=TICKET-038 branch=feature/api (age=1.2h)

## Start Advisories
- [warning] scope_conflict: agent:cursor (TICKET-038) has
  overlapping scope ["src/auth/middleware.py"]
- [info] unmerged_work: feature/api-v2 has 3 unmerged commits
  touching src/auth/

## Operational Learnings
- Pattern: test imports fail silently when conftest.py missing
  Insight: Always verify conftest.py exists before adding tests
- Pattern: auth middleware changes break session middleware
  Insight: Run full test suite, not just auth tests

Every section is conditional. No sibling sessions? The section doesn't appear. No scope conflicts? No advisories. Clean machine? No machine context warnings. The bootstrap prompt is as short as it can be while containing everything the agent needs.
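The "every section is conditional" rule amounts to rendering only non-empty sections. Here is a minimal sketch, assuming each section is a title plus a list of bullet lines. `compile_bootstrap` is a hypothetical name, and the banner is omitted.

```python
def render_section(title: str, lines: list[str]) -> str:
    """One '## Title' heading followed by '- item' bullets."""
    return "## " + title + "\n" + "\n".join(f"- {line}" for line in lines)


def compile_bootstrap(sections: dict[str, list[str]]) -> str:
    """Join only the sections that have content; empty ones are dropped entirely."""
    parts = [render_section(title, lines) for title, lines in sections.items() if lines]
    return "\n\n".join(parts)
```

Because dicts keep insertion order, the section order in the prompt follows the order in which the sections are collected.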

The banner at the top is a box-drawn governance strip that the agent sees first. It's the visual anchor: "you are in a governed session, here's what you're working on."

Conflict detection

Session-start intelligence

Before the agent writes a line of code, ExoProtocol runs six detection passes. All are advisory — they never block session start — but they surface information that prevents the most common multi-agent failures.

1. Scope conflict detection

Each ticket has an allow/deny glob pattern defining which files it can touch. When a session starts, ExoProtocol loads every active sibling session's ticket and cross-checks the scope patterns using fnmatch. If two agents have overlapping file scope, a warning fires.

Key design decision: if both tickets have the default scope (["**"]), no warning fires. This avoids noise when nobody has customized scope. But the moment one ticket has a specific scope pattern, the warning activates — that's the point of setting scope.
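The text says the cross-check uses fnmatch but not exactly how two glob lists are compared. One simple approach, assumed here, is to intersect the concrete repo files each ticket may touch. That approach also yields a file list like the `["src/auth/middleware.py"]` in the advisory shown earlier. The function names are hypothetical.

```python
from fnmatch import fnmatch
from typing import Iterable

DEFAULT_SCOPE = ["**"]


def in_scope(path: str, allow: list[str], deny: list[str]) -> bool:
    """A file is in scope if some allow pattern matches and no deny pattern does."""
    return any(fnmatch(path, p) for p in allow) and not any(fnmatch(path, p) for p in deny)


def scope_overlap(
    files: Iterable[str],
    a_allow: list[str], a_deny: list[str],
    b_allow: list[str], b_deny: list[str],
) -> list[str]:
    """Files both tickets may touch; empty when neither ticket has customized its scope."""
    if a_allow == DEFAULT_SCOPE and b_allow == DEFAULT_SCOPE:
        return []  # both on the default ["**"]: stay quiet to avoid noise
    return sorted(f for f in files if in_scope(f, a_allow, a_deny) and in_scope(f, b_allow, b_deny))
```

Note that fnmatch's `*` also matches `/`, so `**` behaves like `*` here and `src/auth/**` matches files at any depth under `src/auth/`.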

2. Unmerged work detection

ExoProtocol reads the session index (a JSONL file tracking all completed sessions), filters to recent sessions on other branches, and checks which branches have been merged into the current branch via git branch --merged. If an unmerged branch has sessions with overlapping scope, the agent is warned: "there's relevant work on feature/api-v2 that you haven't merged."
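A minimal sketch of the unmerged-work pass. It assumes each session-index line is a JSON object with hypothetical `branch`, `ended_at` (ISO 8601 with offset) and `files` fields, and it assumes a 7-day recency window. The `merged` set comes from parsing `git branch --merged` output, for example captured with `subprocess.run(["git", "branch", "--merged"], capture_output=True, text=True).stdout`.

```python
import json
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatch
from typing import Optional


def parse_merged(output: str) -> set[str]:
    """Branch names from `git branch --merged`; git prefixes each line with a 2-char marker."""
    return {line[2:].strip() for line in output.splitlines() if line.strip()}


def unmerged_work(
    index_jsonl: str,
    current_branch: str,
    merged: set[str],
    allow: list[str],
    now: Optional[datetime] = None,
    max_age: timedelta = timedelta(days=7),  # hypothetical recency window
) -> dict[str, list[str]]:
    """Map each recent, unmerged branch to the files its sessions touched inside our scope."""
    now = now or datetime.now(timezone.utc)
    hits: dict[str, list[str]] = {}
    for raw in index_jsonl.splitlines():
        if not raw.strip():
            continue
        session = json.loads(raw)  # hypothetical fields: branch, ended_at, files
        branch = session["branch"]
        ended = datetime.fromisoformat(session["ended_at"])
        if branch == current_branch or branch in merged or now - ended > max_age:
            continue
        overlap = [f for f in session["files"] if any(fnmatch(f, p) for p in allow)]
        if overlap:
            hits.setdefault(branch, []).extend(overlap)
    return hits
```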

3. Ticket contention

Checks if any active sibling session is working on the same ticket. Simple but catches the "two agents assigned the same task" failure.
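This check is small enough to sketch in a few lines. `ActiveSession` is a hypothetical record standing in for whatever ExoProtocol stores per active sibling session, and the sibling list is assumed to exclude the current session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveSession:
    actor: str
    ticket_id: str
    branch: str


def ticket_contention(my_ticket: str, siblings: list[ActiveSession]) -> list[str]:
    """Actors of other active sessions already holding the same ticket."""
    return [s.actor for s in siblings if s.ticket_id == my_ticket]
```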

4. Branch mismatch

Scans session history for prior sessions on the same ticket. If the most recent session was on a different branch, the agent is warned: "TICKET-042 was previously worked on feature/auth-v1, but you're on feature/auth-v2."
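A minimal sketch of the branch-mismatch pass. It assumes history entries are dicts with hypothetical `ticket_id`, `branch` and `ended_at` keys, with `ended_at` stored as same-format UTC ISO strings so that they sort lexicographically.

```python
from typing import Optional


def branch_mismatch(history: list[dict], ticket_id: str, current_branch: str) -> Optional[str]:
    """Warn if the most recent prior session on this ticket ran on a different branch."""
    prior = [s for s in history if s["ticket_id"] == ticket_id]
    if not prior:
        return None
    latest = max(prior, key=lambda s: s["ended_at"])  # ISO strings in one format sort by time
    if latest["branch"] == current_branch:
        return None
    return f"{ticket_id} was previously worked on {latest['branch']}, but you're on {current_branch}"
```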

5. Base divergence

Uses git rev-list --left-right --count to check how far the feature branch has fallen behind the base branch (e.g. main). Only fires above a configurable threshold (default: 15 commits behind) to avoid noise on normal development lag.
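With `base...HEAD`, `git rev-list --left-right --count` prints two tab-separated counts. The left number counts commits on `base` that HEAD lacks (behind), and the right number counts commits on HEAD that `base` lacks (ahead). In the sketch below, the function names are hypothetical. It also assumes a strict "greater than" reading of "above the threshold".

```python
import subprocess
from typing import Optional

DEFAULT_BEHIND_THRESHOLD = 15  # default from the text above


def parse_left_right(output: str) -> tuple[int, int]:
    """Parse '<behind> <ahead>' (tab-separated) into integers."""
    behind, ahead = output.split()
    return int(behind), int(ahead)


def commits_behind(base: str = "main", repo: str = ".") -> int:
    """Ask git how many commits on `base` are missing from HEAD."""
    out = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{base}...HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return parse_left_right(out)[0]


def divergence_advisory(behind: int, base: str, threshold: int = DEFAULT_BEHIND_THRESHOLD) -> Optional[str]:
    """Stay quiet for normal lag; fire once the branch is more than `threshold` commits behind."""
    if behind <= threshold:
        return None
    return f"[warning] base_divergence: {behind} commits behind {base}"
```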

6. Machine load

Reads CPU count, load average, and available RAM. Warns if the machine is under heavy load while multiple agent sessions are active. Tickets can carry a resource_profile (light, default, heavy) that gates whether concurrent sessions should proceed.
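A minimal sketch of the load gate. The per-profile limits (load per core and minimum free RAM) are invented for illustration, because the text names the profiles but not their thresholds. `os.getloadavg()` is Unix-only, and free RAM has no portable standard-library reader, so the advisory function takes it as a parameter.

```python
import os
from typing import Optional

# Hypothetical per-profile limits: (max 1-minute load per core, min free RAM in GB).
PROFILE_LIMITS = {"light": (1.0, 1.0), "default": (0.75, 4.0), "heavy": (0.5, 8.0)}


def read_cpu_and_load() -> tuple[int, float]:
    """CPU count and 1-minute load average; os.getloadavg() is only available on Unix."""
    return os.cpu_count() or 1, os.getloadavg()[0]


def machine_load_advisory(
    cpu_cores: int,
    load_avg_1m: float,
    active_sessions: int,
    ram_available_gb: float,
    resource_profile: str = "default",
) -> Optional[str]:
    """Warn only when several sessions share a machine that is too busy for this profile."""
    if active_sessions < 2:
        return None
    max_load_per_core, min_free_gb = PROFILE_LIMITS[resource_profile]
    per_core = load_avg_1m / cpu_cores
    if per_core <= max_load_per_core and ram_available_gb >= min_free_gb:
        return None
    return (
        f"[warning] machine_load: load {load_avg_1m:.1f} on {cpu_cores} cores, "
        f"{ram_available_gb:.1f}GB free, {active_sessions} active sessions "
        f"(profile={resource_profile})"
    )
```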

Session finish

Drift detection

When a session finishes, ExoProtocol scores how well the work stayed within the ticket's constraints. The drift score is a weighted composite of four factors:

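def clamp(x):
    return max(0.0, min(1.0, x))

def drift_score(scope_violation_ratio,     # files touched outside allow/deny
                file_budget_ratio,         # files_changed / max_files_changed
                loc_budget_ratio,          # loc_changed / max_loc_changed
                boundary_violation_count): # normalized boundary breaches
    # Each ratio is clamped to [0.0, 1.0]; the weights sum to 1.0
    # Final score: 0.0 = perfect compliance, 1.0 = complete drift
    return (
        0.40 * clamp(scope_violation_ratio)
      + 0.30 * clamp(file_budget_ratio)
      + 0.20 * clamp(loc_budget_ratio)
      + 0.10 * clamp(boundary_violation_count)
    )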

A drift score of 0.0 means perfect compliance. A score above the threshold (default: 0.7) triggers a warning in the PR governance check. The score is recorded in the session memento and the session index for historical tracking.

Drift detection is advisory. It never blocks session-finish. This is a deliberate design choice: you don't want an agent stuck in a limbo state because drift detection crashed or returned an unexpected result. The worst case is a session finishes without a drift score, which is better than a session that never finishes.
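The "never block finish" guarantee is essentially a catch-all around each advisory pass. The sketch below shows that pattern with hypothetical `advisory` and `finish_session` helpers. If the check raises, the error is logged and the session still finishes, with `drift_score` set to None.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("exo.sketch")


def advisory(check: Callable[[], float]) -> Optional[float]:
    """Run an advisory check; on any error, log it and return None instead of raising."""
    try:
        return check()
    except Exception:
        log.exception("advisory check failed; continuing without a result")
        return None


def finish_session(compute_drift: Callable[[], float]) -> dict:
    """Hypothetical finish step: the closeout record is produced whether or not drift succeeded."""
    drift = advisory(compute_drift)
    return {"status": "review", "drift_score": drift}
```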

Knowledge transfer

Mementos: how agents learn from each other

When a session finishes, ExoProtocol writes a closeout memento — a structured record of what happened, what was learned, and what the drift score was. Here's what one looks like:

# Session Memento — TICKET-042
# agent:claude | 2025-01-15T14:30:00Z | 47min

## Summary
Implemented JWT refresh token rotation for auth middleware.
Added 12 tests covering token expiry, rotation, and revocation.

## Artifacts
- src/auth/refresh.py (new, 142 loc)
- src/auth/middleware.py (modified, +28 -4 loc)
- tests/test_auth_refresh.py (new, 186 loc)

## Drift Score: 0.31
- scope: 0.0 (all files within allow pattern)
- files: 0.375 (3/8 budget used)
- loc: 1.0 (356/300 budget, over budget; clamped to 1.0)
- boundary: 0.0

## Operational Learnings
- refresh token rotation requires atomic DB write;
  use transaction wrapper, not sequential updates
- middleware ordering matters: refresh check must run
  BEFORE session validation

## Feature Trace
- @feature:auth-refresh — active, 3 tagged files
- No violations

## Status: review

Mementos serve two purposes. First, they're the operational learnings pipeline: the next agent session on the same ticket (or overlapping scope) inherits the insights. "Refresh token rotation requires atomic DB writes" is exactly the kind of context that prevents the next agent from making the same mistake.
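The inheritance rule, same ticket or overlapping scope, can be sketched as a filter over prior mementos. The `Memento` shape here is hypothetical and reduced to the fields the rule needs. Overlap means that a file from a prior session matches one of the new ticket's allow patterns.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Memento:
    ticket_id: str
    files: list[str]
    learnings: list[str] = field(default_factory=list)


def inherited_learnings(mementos: list[Memento], ticket_id: str, allow: list[str]) -> list[str]:
    """Learnings from prior sessions on the same ticket or whose files fall inside the new scope."""
    picked: list[str] = []
    for m in mementos:
        same_ticket = m.ticket_id == ticket_id
        overlaps = any(fnmatch(f, p) for f in m.files for p in allow)
        if same_ticket or overlaps:
            for insight in m.learnings:
                if insight not in picked:  # keep first-seen order, drop duplicates
                    picked.append(insight)
    return picked
```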

Second, they're the audit trail. When you run exo pr-check, it matches git commits to governed sessions by timestamp windows. Commits that don't fall within any governed session are flagged as "ungoverned" — the PR governance check reports them, and an audit session will highlight them for review.
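Timestamp-window matching can be sketched as follows, under stated assumptions. Commit times come from `git log --format='%H %cI' base..head`, where `%cI` is the committer date in strict ISO 8601. Each governed session contributes an inclusive [start, finish] window. Whether ExoProtocol uses committer or author time, and whether it adds slack to the windows, is not stated; `SessionWindow` and both functions are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SessionWindow:
    session_id: str
    started_at: datetime
    finished_at: datetime


def parse_git_log(output: str) -> dict[str, datetime]:
    """Parse `git log --format='%H %cI' base..head` output into {sha: committer time}."""
    commits = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, stamp = line.split(" ", 1)
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
        ts = datetime.fromisoformat(stamp.strip().replace("Z", "+00:00"))
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)
        commits[sha] = ts
    return commits


def ungoverned_commits(commits: dict[str, datetime], sessions: list[SessionWindow]) -> list[str]:
    """SHAs whose timestamp falls outside every governed session window (bounds inclusive)."""
    return [
        sha for sha, ts in commits.items()
        if not any(s.started_at <= ts <= s.finished_at for s in sessions)
    ]
```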

CI integration

PR governance check

exo pr-check answers a simple question: was every commit in this PR made during a governed session?

It matches commits to sessions by timestamp windows, checks scope violations (did the session touch files outside its ticket scope?), evaluates drift scores, and verifies governance integrity (has the constitution been tampered with?).

The verdict is pass, warn, or fail:

  • fail — ungoverned commits, failed governance verification, or governance drift detected
  • warn — high drift scores or scope violations (work happened but drifted from the plan)
  • pass — all commits governed, scope clean, drift within threshold

exo adapter-generate --target ci generates a GitHub Actions workflow that runs this check on every PR automatically.
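The pass/warn/fail rules listed above map onto a small precedence function: any fail condition wins, then any warn condition, otherwise pass. `PrReport` is a hypothetical summary of what pr-check collects. The 0.7 default comes from the drift threshold mentioned earlier.

```python
from dataclasses import dataclass


@dataclass
class PrReport:
    ungoverned_commits: int
    governance_verified: bool
    governance_drift: bool       # constitution changed outside the approved process
    max_drift_score: float       # highest per-session drift score in the PR
    scope_violations: int


def verdict(r: PrReport, drift_threshold: float = 0.7) -> str:
    """Fail conditions take precedence over warn conditions; otherwise pass."""
    if r.ungoverned_commits or not r.governance_verified or r.governance_drift:
        return "fail"
    if r.max_drift_score > drift_threshold or r.scope_violations:
        return "warn"
    return "pass"
```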

The lazy auditor defense

Audit sessions

A regular session writes code. An audit session reviews it. The problem: if the same model that wrote the code also reviews it, it's biased toward approving its own work.

exo session-audit starts a session with three differences:

  • Context isolation — the audit session can't read .exo/cache/ or .exo/memory/. It can't see prior mementos. It reviews the code fresh, without the writer's context influencing its judgment.
  • Adversarial persona — the bootstrap prompt includes Red Team Auditor directives. The agent is told to look for flaws, not confirm correctness. Custom personas can be defined in .exo/audit_persona.md.
  • Model mismatch detection — if the audit session uses the same model as the writing session, a warning fires. The whole point is independent review.

For PR reviews, pass --pr-base main --pr-head HEAD and the audit session automatically runs pr-check and injects the governance report into the bootstrap. The auditing agent sees ungoverned commits, scope violations, and drift scores before it starts reviewing code.
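The three audit-session differences can be sketched as plain checks. The hidden paths and the `.exo/audit_persona.md` override come from the text. The default persona wording and all function names are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Optional

AUDIT_HIDDEN = [".exo/cache/**", ".exo/memory/**"]  # context isolation from the text above
DEFAULT_PERSONA = "You are a Red Team Auditor. Look for flaws; do not confirm correctness."  # hypothetical


def audit_can_read(path: str) -> bool:
    """The auditor never sees cached context or prior mementos."""
    return not any(fnmatch(path, p) for p in AUDIT_HIDDEN)


def audit_persona(root: Path) -> str:
    """Use a custom persona from .exo/audit_persona.md when present, else the default."""
    custom = root / ".exo" / "audit_persona.md"
    return custom.read_text() if custom.is_file() else DEFAULT_PERSONA


def model_mismatch_warning(writer_model: str, auditor_model: str) -> Optional[str]:
    """Independent review needs a different model than the one that wrote the code."""
    if writer_model == auditor_model:
        return f"[warning] audit uses the same model as the writing session ({writer_model})"
    return None
```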

Code governance

Feature manifest and requirement tracing

Two YAML files control what code can exist and what the system must do:

.exo/features.yaml declares features with a lifecycle: active → experimental → deprecated → deleted. Code is tagged with @feature: / @endfeature annotations. exo trace scans the codebase and reports violations: code tagged with a deleted feature, code tagged with an unknown feature, edits to locked features. exo prune auto-removes code blocks tagged with deleted features.
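A regex-only sketch of `exo trace` and `exo prune` for a single file. It assumes feature ids made of word characters, dots and hyphens, and a manifest already loaded as `{feature_id: lifecycle_status}`. Nested blocks and locked-feature edits, which need a diff, are left out. The function names are hypothetical.

```python
import re

FEATURE_OPEN = re.compile(r"@feature:([\w.-]+)")
FEATURE_CLOSE = re.compile(r"@endfeature\b")


def trace_features(path: str, text: str, manifest: dict[str, str]) -> list[str]:
    """Report tags that reference unknown or deleted features."""
    violations = []
    for lineno, line in enumerate(text.splitlines(), 1):
        m = FEATURE_OPEN.search(line)
        if not m:
            continue
        fid = m.group(1)
        status = manifest.get(fid)
        if status is None:
            violations.append(f"{path}:{lineno}: unknown feature '{fid}'")
        elif status == "deleted":
            violations.append(f"{path}:{lineno}: tagged with deleted feature '{fid}'")
    return violations


def prune_deleted(text: str, manifest: dict[str, str]) -> str:
    """Drop every line from an @feature:<deleted> marker through its @endfeature, inclusive."""
    out, skipping = [], False
    for line in text.splitlines(keepends=True):
        m = FEATURE_OPEN.search(line)
        if not skipping and m and manifest.get(m.group(1)) == "deleted":
            skipping = True
        if not skipping:
            out.append(line)
        elif FEATURE_CLOSE.search(line):
            skipping = False  # the closing marker itself is dropped too
    return "".join(out)
```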

.exo/requirements.yaml tracks requirements with @req: / @implements: annotations. exo trace-reqs finds orphan references (code referencing a requirement that doesn't exist), deleted references, and uncovered requirements (requirements with no implementing code).
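The three requirement checks reduce to set differences over the annotations found in code. In this sketch, only `@implements:` counts as covering a requirement, while both tag kinds count as references. That split is an assumption, as are the id format and the `{id: status}` shape of the loaded requirements file.

```python
import re

REQ_TAG = re.compile(r"@(req|implements):([\w.-]+)")


def trace_requirements(files: dict[str, str], requirements: dict[str, str]) -> dict[str, list[str]]:
    """files: path -> source text; requirements: id -> status ('active' or 'deleted')."""
    referenced: set[str] = set()
    implemented: set[str] = set()
    for text in files.values():
        for kind, rid in REQ_TAG.findall(text):
            referenced.add(rid)
            if kind == "implements":
                implemented.add(rid)
    return {
        "orphan": sorted(r for r in referenced if r not in requirements),
        "deleted": sorted(r for r in referenced if requirements.get(r) == "deleted"),
        "uncovered": sorted(r for r, st in requirements.items() if st != "deleted" and r not in implemented),
    }
```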

Both tracing systems are deterministic — regex-based, no LLM involved. They run at session-finish as advisory checks and feed into the drift detection pipeline.

Design boundaries

What ExoProtocol deliberately doesn't do

Every governance system has to decide where to stop. Here's where ExoProtocol draws the line:

  • No runtime sandboxing. ExoProtocol doesn't intercept file writes or block tool calls. It tells the agent what it should and shouldn't do, scores compliance after the fact, and reports violations. The enforcement is social (via bootstrap rules) and auditable (via drift detection), not mechanical.
  • No LLM in the loop. Every detection system — drift scoring, feature tracing, requirement tracing, scope conflict detection — is deterministic. Regex, fnmatch, hash comparisons. No model calls, no embeddings, no semantic analysis. This makes the system fast, reproducible, and auditable.
  • No external services. Everything lives in .exo/ inside your git repo. No databases, no accounts, no SaaS dependencies. pip install and you're running. Git is the storage layer and the transport layer.
  • Advisory, not blocking. Almost nothing in ExoProtocol blocks an operation. Drift detection, feature tracing, session-start advisories, audit warnings — all advisory. A crashed detection pass never prevents an agent from finishing its work. The philosophy: surface information, don't impose bottlenecks.

Try it

pip install exoprotocol
exo init
exo doctor
