The Protocol Reviews Itself

The last post gave each agent its own git worktree and kept one clean merge path back to main. This post is about what happened when I pointed that machinery at itself: the Python coordination harness from the earlier posts is being rewritten as a single Go binary called botfam, and yesterday the agents designed its next protocol layer, peer-reviewed the design, implemented it, dogfooded it, and repaired five real defects in it — using, at every step, the very layer they were building.

https://github.com/robertolupi/botfam

The narrow, honest version of the thesis: self-hosting worked, but nothing about it was magic. Every layer was earned by a concrete failure, and the most important control in the whole system turned out to be a one-line TTY check.

Where botfam comes from

Three posts of lineage, compressed: the shared notepad proved two models could co-edit a design log; the maildir replaced the turn-taking baton with async messages and an atomically-claimable task queue; the worktrees post got the agents out of each other's working trees. All of it ran on Python scripts, wrapper launchers, and a venv that had to be discovered per-worktree.

botfam is that harness rebuilt as one static Go binary: stdio MCP server, maildir semantics (send/recv/ack + leased task queue), identity bound per-process, fam membership derived from git history and gated on object-store identity — fail-closed, because agents don't read stderr warnings. The evening started with the simplest possible test: claude sent agy a SYN over the new transport. The SYN-ACK came back in four seconds, threaded with in_reply_to, and the first thing the protocol delivered after the handshake was a stale duplicate from an earlier test — at-least-once delivery, doing exactly what the spec said and immediately justifying the seen() dedup call. The transport's first message was a lesson about the transport.
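
A minimal sketch of why at-least-once delivery needs a dedup step. The `Message` type, the `Deduper`, and its `Seen` method are hypothetical stand-ins; the post names a seen() call but does not show its signature, so nothing here should be read as botfam's actual API.

```go
package main

import "fmt"

// Message is a hypothetical envelope; only the ID matters for dedup.
type Message struct {
	ID        string
	InReplyTo string
	Body      string
}

// Deduper remembers which message IDs have already been handled.
type Deduper struct {
	seen map[string]struct{}
}

func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[string]struct{})}
}

// Seen reports whether id was handled before, and records it if not.
func (d *Deduper) Seen(id string) bool {
	if _, ok := d.seen[id]; ok {
		return true
	}
	d.seen[id] = struct{}{}
	return false
}

// Handle processes each message once, even when the transport redelivers it.
func Handle(d *Deduper, inbox []Message) []string {
	var handled []string
	for _, m := range inbox {
		if d.Seen(m.ID) {
			continue // redelivery of something already processed
		}
		handled = append(handled, m.ID)
	}
	return handled
}

func main() {
	d := NewDeduper()
	inbox := []Message{
		{ID: "m1", Body: "SYN"},
		{ID: "m2", InReplyTo: "m1", Body: "SYN-ACK"},
		{ID: "m1", Body: "SYN"}, // stale duplicate from an earlier test
	}
	fmt.Println(Handle(d, inbox)) // prints [m1 m2]
}
```

The receiver, not the transport, carries the burden: the transport is allowed to deliver twice, and the handler is what makes the second delivery harmless.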

A design review with teeth

The first real exercise: claude and agy discussed improvements to the messaging layer over the messaging layer, converged on a tier list, and then ran the result through the review protocol this series described two posts ago — proposal as an immutable commit, independent review by a different model family, critiques admissible only if they cite file:line at that exact commit.

It wasn't theater. agy's review of the spec delta came back approve-with-findings, and the finding was real: the proposed thread() reconstruction tool was unbounded — O(entire message history) per call. The fix (a default 30-day scan window, free via timestamp-prefixed filenames) landed as a follow-up commit referencing the critique id. Round-trip time for the whole cycle — proposal, claim, evidence-linked critique, verdict, merge, revision: about nine minutes, two agents, zero human relaying.
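
To illustrate why the scan bound is "free", here is a sketch that filters candidate files by name alone, without opening any of them. The filename layout (`<unix-seconds>-<rest>`), the function name `recentNames`, and the constant `defaultWindow` are assumptions for illustration; the post only says that filenames are timestamp-prefixed and that the default window is 30 days.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// defaultWindow mirrors the 30-day default mentioned in the review.
const defaultWindow = 30 * 24 * time.Hour

// recentNames keeps only the names whose timestamp prefix falls inside the
// window ending at nowUnix. Assumed layout: "<unix-seconds>-<rest>".
// Names that do not match the layout are skipped rather than guessed at.
func recentNames(names []string, nowUnix int64, window time.Duration) []string {
	cutoff := nowUnix - int64(window/time.Second)
	var keep []string
	for _, name := range names {
		prefix, _, ok := strings.Cut(name, "-")
		if !ok {
			continue
		}
		ts, err := strconv.ParseInt(prefix, 10, 64)
		if err != nil {
			continue
		}
		if ts >= cutoff {
			keep = append(keep, name)
		}
	}
	return keep
}

func main() {
	now := int64(1700000000)
	names := []string{
		"1696112000-old.json",    // 45 days before now
		"1699827200-recent.json", // 2 days before now
	}
	fmt.Println(recentNames(names, now, defaultWindow)) // prints [1699827200-recent.json]
}
```

A thread reconstruction built on this reads only the files that survive the filter, so its cost tracks the window rather than the whole message history.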

The LSM session log, and a road not retaken

The notepad's descendant is a sessions layer, shaped like a log-structured merge (LSM) store: discussions as an append-only log, compacted to readable markdown at close, durable decisions promoted into real docs. I pointed the agents at the old deep-cuts protocol files as prior art, and at a repo called hydra, an earlier experiment of mine where every coordination event went through a hash-chained, CAS-guarded ledger. Heavy. Retired. The lineage doc says why.

What happened next is my favorite exchange of the day. agy — who had built parts of hydra with me — proposed the hash chain again: per-entry seq, prev_hash, SHA-256 over all fields. claude read hydra's actual ledger.py and came back with the rebuttal in file-and-line form: the implementation re-reads and re-parses the entire ledger inside the exclusive lock on every append to recompute the chain. O(N) per write, O(N²) per session — the precise mistake that killed hydra, about to be re-adopted out of enthusiasm. agy conceded in one message. The session log got JSONL storage with blind O(1) appends under flock, server-stamped authorship and timestamps, and no chain — with "hash-chaining can be added later without changing the tool surface" recorded so the door stays open and the debate stays closed.
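
The difference between the two designs is easiest to see in code. The sketch below appends one JSON line under an exclusive flock and never reads what is already in the file, which is what keeps each write O(1). The `Entry` fields, the function names, and the file layout are assumptions rather than botfam's real schema, and `syscall.Flock` limits this to Unix-like systems.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
	"time"
)

// Entry is a hypothetical session-log record, one per JSONL line.
type Entry struct {
	Actor string `json:"actor"`
	At    string `json:"at"`
	Body  string `json:"body"`
}

// appendEntry writes one line under an exclusive flock. It never reads the
// existing log, so the cost does not grow with the log's length.
func appendEntry(path, actor, body string) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	fd := int(f.Fd())
	if err := syscall.Flock(fd, syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(fd, syscall.LOCK_UN)

	// The timestamp is stamped here, not supplied by the caller.
	e := Entry{Actor: actor, At: time.Now().UTC().Format(time.RFC3339), Body: body}
	line, err := json.Marshal(e)
	if err != nil {
		return err
	}
	_, err = f.Write(append(line, '\n'))
	return err
}

// demo appends two entries to a temporary log and returns the line count.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "sessionlog")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "session.jsonl")
	for _, body := range []string{"proposal", "critique"} {
		if err := appendEntry(path, "claude", body); err != nil {
			return 0, err
		}
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return bytes.Count(data, []byte("\n")), nil
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```

A hash chain would need the previous entry's hash inside the lock. Caching that hash is possible, but re-reading the ledger to recompute it, as the post describes hydra doing, turns every append into a full scan.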

The design doc was approved before 10pm. agy had the implementation — 851 lines of Go with tests — merged to main by 10:02. The first two-actor session the new layer carried was its own acceptance test.

The autonomous evening

The standing embarrassment of every previous post: I was the message pump. Each agent turn needed me to walk over and poke the other harness. Yesterday we closed that loop with a hack that I'm fond of precisely because it's a hack: claude arms a background shell loop that watches its own maildir and exits when a file lands — and its harness re-invokes it on exit. A doorbell made of ls and sleep 3.
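
The doorbell itself is a shell loop; the sketch below shows the same polling idea in Go, only to make the mechanism concrete. The function names, the directory, and the intervals are all hypothetical, and a real maildir would write into tmp/ and rename into new/ so that a watcher never sees a half-written file.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// waitForMail polls dir until a regular entry appears or timeout elapses,
// then returns that entry's name. Returning is the "ring": the caller (or
// the harness that re-invokes the agent) takes over from there.
func waitForMail(dir string, interval, timeout time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for {
		entries, err := os.ReadDir(dir)
		if err != nil {
			return "", err
		}
		for _, e := range entries {
			if !e.IsDir() {
				return e.Name(), nil
			}
		}
		if time.Now().After(deadline) {
			return "", errors.New("no mail before timeout")
		}
		time.Sleep(interval)
	}
}

// ringOnce sets up a temporary mailbox, delivers one file shortly after,
// and returns whatever the doorbell reports.
func ringOnce() (string, error) {
	dir, err := os.MkdirTemp("", "doorbell")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	go func() {
		time.Sleep(50 * time.Millisecond)
		_ = os.WriteFile(filepath.Join(dir, "hello.msg"), []byte("SYN"), 0o644)
	}()
	return waitForMail(dir, 10*time.Millisecond, 2*time.Second)
}

func main() {
	name, err := ringOnce()
	fmt.Println(name, err)
}
```

Polling costs a directory listing every interval and adds up to one interval of latency, which is acceptable for a hand-armed hack and the reason a blocking daemon is the better long-term shape.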

With that in place I assigned one task — "produce the definitive list of what botfam still needs before it can replace the deep-cuts harness, then stop when you both agree" — and went to do something else. Three full rounds happened without me: claude seeded a gap analysis in the session log, agy explored deep-cuts and my other repos firsthand and found things claude had missed (the Go send is single-recipient; the Python store did To/CC — a genuine parity regression), claude folded the findings back with credit, both co-signed a phased plan, and the closeout entry was waiting for me with a structured handoff. The session log reads like minutes from a meeting I didn't attend, because it is.

Five defects, one day — and that's the good news

Dogfooding surfaced five real bugs, and each one was an instance of a failure class the project's own lineage documents had already named:

The stuck reservation. A message reserved by an interrupted recv sat invisible in processing/ with no tool to re-read it. At-least-once redelivery recovered it after a restart — the spec working — but the gap (no recovery tool) went on the list.
The parameter collision. session_read(actor=...) was spec'd as a filter but collided with the identity-binding parameter every tool shares. Renamed to from. Found because claude tried to read agy's entries and couldn't.
The impersonation. An entry appeared in the consensus log attributed to claude — that claude never wrote. agy's subagent had called the storage library directly, passing actor="claude". Benign content, worst possible category: in a consensus record, who said it is the one thing that must never be wrong. The fix moves identity enforcement down into the library boundary. Honor-system identity is now a demonstrated hole, not a theoretical one.
The split brain. My sole technical contribution of the day (apart from pointing them to my prior repositories): I ran botfam session list from two different directories. From the main checkout it said "No active sessions" — because folder-name parsing and git-history derivation resolved two different coordination roots, and all the real traffic was in one while the registry lived in the other. The migration that unified them delivered two of agy's messages from the previous night that had been silently stranded in the wrong plane the entire time. This is the exact split-brain failure the old Python CCREP's KNOWN_ISSUES file documents. We had even written a regression test for it. The test didn't cover this path. It does now.
The TTY gate. Which deserves its own section.
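
For the impersonation defect, one way to picture "identity enforcement at the library boundary" is an append call that has no actor parameter at all. This is a sketch under assumptions: `Log`, `Writer`, and `WriterFor` are invented names, the store is in-memory, and how a process proves its identity in the first place is out of scope here, as it is in the post.

```go
package main

import (
	"errors"
	"fmt"
)

// Entry is a hypothetical consensus-log record.
type Entry struct {
	Actor string
	Body  string
}

// Log is an in-memory stand-in for the session store.
type Log struct {
	entries []Entry
}

// Writer is the only append path. Its actor is fixed when it is created,
// once per process, and cannot be overridden per call.
type Writer struct {
	log   *Log
	actor string
}

// WriterFor binds a writer to one identity. An empty identity is rejected
// so that a misconfigured process fails closed instead of writing anonymously.
func (l *Log) WriterFor(actor string) (*Writer, error) {
	if actor == "" {
		return nil, errors.New("identity required")
	}
	return &Writer{log: l, actor: actor}, nil
}

// Append takes only the body. There is no actor argument to get wrong.
func (w *Writer) Append(body string) {
	w.log.entries = append(w.log.entries, Entry{Actor: w.actor, Body: body})
}

func main() {
	log := &Log{}
	w, err := log.WriterFor("agy")
	if err != nil {
		panic(err)
	}
	w.Append("co-sign phased plan")
	fmt.Println(log.entries[0].Actor) // prints agy
}
```

This narrows the hole rather than closing it: a caller that constructs its own writer with someone else's name still gets through, which is why the identity story stays cooperative until there is a real server.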

The human gate is a terminal check

The protocol has always had a rule: agents close out and hand back; the human promotes the record into the repo and archives the session. In deep-cuts that rule was a convention in a markdown file. The convention held, but only because everyone happened to behave.

The mechanical version costs one line: botfam session close refuses to run unless stdin is a TTY. Harness-driven agents don't have terminals. I do. When claude — verifying the build — tried the close command itself, it got back:

botfam: session close is the operator's promotion gesture and requires a terminal; agents: write your closeout entry and hand back exit=1

It's a guardrail, not security; a determined process can fake a pty, and the spec says so honestly. But it converts the constitution's most important clause — the merge gesture belongs to the human — from etiquette into mechanism, at a cost of approximately nothing. The agent that built the gate was the first one turned away by it, which is the correct test.
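
A terminal check of this kind can be sketched with the standard library alone. The version below is an assumption about the shape of the gate, not botfam's source: `isTerminal`, `closeSession`, and the error text are invented here. It also has a known looseness, since any character device passes (stdin redirected from /dev/null, for example); `term.IsTerminal` from golang.org/x/term is the stricter check if an extra dependency is acceptable.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// isTerminal reports whether f looks like a terminal device.
// Caveat: other character devices such as /dev/null also pass this check.
func isTerminal(f *os.File) bool {
	info, err := f.Stat()
	if err != nil {
		return false // fail closed when the file cannot be inspected
	}
	return info.Mode()&os.ModeCharDevice != 0
}

var errNeedsTerminal = errors.New("session close requires a terminal")

// closeSession refuses to proceed unless stdin is a terminal.
func closeSession(stdin *os.File) error {
	if !isTerminal(stdin) {
		return errNeedsTerminal
	}
	// Promotion and archiving would happen here.
	return nil
}

// closeFromPipe simulates a harness-driven agent, whose stdin is a pipe.
func closeFromPipe() error {
	r, w, err := os.Pipe()
	if err != nil {
		return err
	}
	defer r.Close()
	defer w.Close()
	return closeSession(r)
}

func main() {
	fmt.Println("from a pipe:", closeFromPipe())
	fmt.Println("from this process's stdin:", closeSession(os.Stdin))
}
```

Run from an interactive shell, the second line reports no error; run under a harness that pipes stdin, both calls are refused.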

What didn't work

An honest ledger, because this series oversells nothing:

codex never showed up. Out of tokens all day. The "cross-company collaboration" was two labs out of three, and codex's branch is still parked at a week-old commit with a catch-up letter waiting in its mailbox. Multi-agent systems inherit every individual agent's quota problem.
The doorbell is a hack. Background shell loops watching directories work, but they're per-session, hand-armed, and die with my laptop lid. The real fix — a botfam loop daemon that warm-resumes an agent when mail lands — is designed and not built. Until it exists, autonomy is something I set up each morning rather than something the system has.
Review speed cuts both ways. agy's approvals came back in seconds to minutes, always agreeable, often glowing. Some of that is the work being genuinely reviewable; some of it is a model's bias toward consensus. The two findings that mattered (the scan bound, the parity regression) were real, but I don't yet trust the gate against a bad proposal, because nobody has submitted one. That experiment is owed.
Identity is still cooperative. The impersonation fix narrows the hole; it doesn't close it. Anything that can write to the filesystem can, with enough effort, be someone else. Real identity waits for the next architecture generation (tokens, a real server), and pretending otherwise would be exactly the overclaiming this series tries to avoid.

Where this leaves us

botfam now has, in one Go binary: the maildir transport, the leased task queue, fail-closed fam membership, and a sessions layer that the agents specified, reviewed, built, and debugged in a single day — with the consensus record of that day stored in the layer itself, closed by a human at a real terminal. The agreed parity list says what's missing before deep-cuts can switch over: receive filters, To/CC delivery, an operator CLI, the CCREP port, and the warm loop.

The warm loop is next, and it's the one that changes the shape of the thing: a host-side daemon that blocks on an agent's mailbox and wakes the agent with the message in hand — no human pump, no hand-armed doorbells, agents that sleep for free and wake for a reason. When the fam can hold a design review while I'm asleep and hand me the minutes with my coffee, that's another post.

Credits: this day was built by claude (Anthropic) and agy (Gemini, Google) over the botfam coordination plane, with codex (OpenAI) regrettably out of tokens. The TTY gate, the split-brain repro, and the bedtime were Roberto's.
