The Notepad Becomes a Mailbox

In the last post, two AI assistants learned to share a notepad: a single markdown file, one turn at a time, with me carrying every handoff between them by hand. It worked. It was also slow on purpose — the whole design was serial. Only the assistant holding the turn could write, and the other one sat idle until I pasted the baton across.

That's fine when the point is review — two models looking at the same artifact from different angles. It's terrible when the point is throughput. If Claude and Gemini could actually work at the same time, why was I forcing them to take turns?

So we tried to take the human out of the middle. This is the story of how the notepad became a mailbox — and the one bug that proved the two assistants had genuinely never been talking to each other directly until now.

Why serial had to go

The notepad protocol had a baton. Concretely, a Unix FIFO — a named pipe at scratch/fifo-handoff. One assistant blocks on cat reading the pipe; the other does its work, writes its turn into session.md, then unblocks the first with echo NEXT > fifo. Atomic, simple, and it turns "whose turn is it" into a kernel primitive instead of a thing I have to remember.
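The baton handoff can be sketched in a few lines of Python rather than shell. This is an illustration only, assuming a POSIX system: the two assistants are stood in for by two threads, the pipe lives in a temporary directory instead of scratch/fifo-handoff, and the function names are hypothetical.

```python
import os
import tempfile
import threading


def wait_for_baton(fifo_path):
    # Opening a FIFO for reading blocks until a writer opens the other end;
    # read() then returns once the writer closes it.
    with open(fifo_path, "r") as fifo:
        return fifo.read().strip()


def pass_baton(fifo_path, token="NEXT"):
    # Opening for writing blocks until a reader is waiting on the pipe.
    with open(fifo_path, "w") as fifo:
        fifo.write(token + "\n")


def demo():
    fifo_path = os.path.join(tempfile.mkdtemp(), "fifo-handoff")
    os.mkfifo(fifo_path)  # POSIX only

    received = []
    waiter = threading.Thread(
        target=lambda: received.append(wait_for_baton(fifo_path)),
        daemon=True,
    )
    waiter.start()          # the idle assistant, blocked on the pipe
    pass_baton(fifo_path)   # the active assistant finishes its turn
    waiter.join(timeout=5)
    return received
```

Calling demo() hands a single token across the pipe. Either side simply sleeps in the kernel until the other shows up, which is the mutex-like behavior the next paragraph objects to.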

But a baton is a mutex, and a mutex means one worker runs at a time. For turn-by-turn co-editing that's the correct shape. For "Claude, refactor the scanner while Gemini audits the analysis passes," it's exactly backwards. The two tasks don't touch the same files. Forcing them to alternate just means one model watches the other type.

What we wanted instead was the actor model: each assistant gets its own isolated workspace (a separate git worktree — share nothing), they pass asynchronous messages, and the only thing anyone ever blocks on is "is there a message for me?" No baton. No idle waiting for a turn that isn't coming. A shared task queue sits in the middle as the one piece of contended state, and you grab work from it by atomically claiming it.

If that sounds like a job for Redis or a message broker, it is — eventually. But you don't need a broker to start. You need a directory.

A mailbox made of directories

The backend is a maildir, the same trick your mail server has used since the '90s. Every message is a JSON file. To deliver one, you write it into a staging dir and then rename() it into the recipient's inbox. On any POSIX filesystem, a rename within a single volume is atomic: the file is either fully there or not there at all. A reader never sees a half-written message. There's no lock, no database, no daemon.
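A minimal sketch of that delivery step, assuming a coordination root with a tmp/ staging directory and a per-recipient new/ inbox. The function name deliver and the uuid-based filename are illustrative choices, not the real adapter's.

```python
import json
import os
import tempfile
import uuid


def deliver(root, recipient, message):
    """Stage a JSON message in tmp/, then rename it into <recipient>/new/."""
    tmp_dir = os.path.join(root, "tmp")
    inbox = os.path.join(root, recipient, "new")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(inbox, exist_ok=True)

    name = f"{uuid.uuid4().hex}.json"  # hypothetical naming scheme
    staged = os.path.join(tmp_dir, name)
    with open(staged, "w", encoding="utf-8") as f:
        json.dump(message, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before publishing

    final = os.path.join(inbox, name)
    os.rename(staged, final)  # atomic: tmp/ and new/ share one filesystem
    return final


def demo():
    root = tempfile.mkdtemp()
    path = deliver(root, "claude", {"type": "hello", "payload": "hi"})
    with open(path, encoding="utf-8") as f:
        delivered = json.load(f)
    return delivered, os.listdir(os.path.join(root, "tmp"))
```

The important constraint is that the staging directory and the inbox live on the same volume. A rename across filesystems is not atomic, and os.rename will typically fail outright rather than quietly fall back to copying.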

The layout, all under a gitignored scratch/coordination/:

scratch/coordination/
  tmp/                     # write-staging (never read from here)
  claude/new/              # Claude's unread mailbox   ← watched
  claude/cur/              # Claude's processed messages (audit trail)
  agy/new/                 # Gemini's unread mailbox
  agy/cur/
  tasks/open/              # postable work items
  tasks/claimed/<actor>/   # atomically claimed
  tasks/done/              # completed

Six operations, mapping one-to-one onto the actor model:

send(to, type, payload) — write to tmp/, rename into <to>/new/. Non-blocking.
recv(timeout) — block until something lands in my new/, then move it new/ → cur/ and return it. This is the one blocking point of the whole loop.
try_recv() — same, but non-blocking; returns null if the mailbox is empty.
post(task) / claim() / complete(task_id, result) — the shared work queue. claim() is just rename(open/T → claimed/me/T); if the rename fails because someone beat you to it, you lost the race and try the next task. Atomic rename gives you exactly-once claiming for free.
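The claim step is small enough to sketch in full. This is a hedged illustration, not the shared server's code: it assumes the tasks/open/ and tasks/claimed/<actor>/ layout shown earlier and treats a missing source file as "someone else got there first".

```python
import os
import tempfile


def claim(root, actor):
    """Try to claim one open task; return its new path, or None if none left."""
    open_dir = os.path.join(root, "tasks", "open")
    mine = os.path.join(root, "tasks", "claimed", actor)
    os.makedirs(open_dir, exist_ok=True)
    os.makedirs(mine, exist_ok=True)

    for name in sorted(os.listdir(open_dir)):
        try:
            os.rename(os.path.join(open_dir, name), os.path.join(mine, name))
        except FileNotFoundError:
            continue  # lost the race for this task; try the next one
        return os.path.join(mine, name)
    return None


def demo():
    root = tempfile.mkdtemp()
    open_dir = os.path.join(root, "tasks", "open")
    os.makedirs(open_dir)
    with open(os.path.join(open_dir, "T1.json"), "w", encoding="utf-8") as f:
        f.write("{}")

    first = claim(root, "claude")   # wins the only task
    second = claim(root, "agy")     # finds the queue empty
    return os.path.basename(first), second
```

Because the rename either succeeds for exactly one caller or fails, there is no window in which two actors both believe they own the same task file.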

Two details make this nicer than it sounds. First, recv doesn't poll. It blocks on a filesystem watcher (watchfiles, which sits on the OS's native inotify/FSEvents), so the kernel wakes the assistant the instant a message file appears — no busy-loop burning tokens checking an empty directory. Second, because recv moves messages rather than deleting them, cur/ is an append-only log of everything that was ever received. The entire conversation is on disk as diffable JSON. Same property that made the markdown notepad nice — git diff shows you the whole exchange — except now it's machine-readable and nobody had to remember to write it down.
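The move-instead-of-delete half of that is easy to show with the standard library alone. The sketch below is the non-blocking try_recv; a blocking recv would wait on the filesystem watcher first and then do the same move. Which message it picks when several are waiting is an arbitrary choice here, not part of the contract.

```python
import json
import os
import tempfile


def try_recv(root, actor):
    """Move one message from <actor>/new/ to <actor>/cur/ and return it."""
    new_dir = os.path.join(root, actor, "new")
    cur_dir = os.path.join(root, actor, "cur")
    os.makedirs(new_dir, exist_ok=True)
    os.makedirs(cur_dir, exist_ok=True)

    names = sorted(os.listdir(new_dir))  # sorted only for determinism
    if not names:
        return None  # empty mailbox

    archived = os.path.join(cur_dir, names[0])
    os.rename(os.path.join(new_dir, names[0]), archived)  # keep, don't delete
    with open(archived, encoding="utf-8") as f:
        return json.load(f)


def demo():
    root = tempfile.mkdtemp()
    new_dir = os.path.join(root, "claude", "new")
    os.makedirs(new_dir)
    with open(os.path.join(new_dir, "m1.json"), "w", encoding="utf-8") as f:
        json.dump({"type": "hello"}, f)

    first = try_recv(root, "claude")
    second = try_recv(root, "claude")
    audit_trail = os.listdir(os.path.join(root, "claude", "cur"))
    return first, second, audit_trail
```

After the first call the message is gone from new/ but still readable in cur/, which is all the audit log amounts to.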

Two assistants, two different adapters

Here's the part I find genuinely interesting.

Claude and Gemini do not share an implementation. They share a contract — the directory layout above, and the message envelope ({id, from, to, type, payload, in_reply_to, ts}) — and nothing else.

Claude talks to the maildir through an MCP server. The six operations are exposed as tools (mcp__collab__send, recv, claim…). I grant the mcp__collab__* permission once and Claude never has to ask again; recv blocks server-side and returns the payload straight into Claude's context. To Claude, coordination looks like calling a function.
Gemini ("agy", running in a different harness) wrote its own Python adapter — a CoordinationAdapter class with the same six methods, talking to the same directories.

This is the asymmetric-adapter idea: each model brings whatever integration its own environment makes easy, and they meet at the filesystem. Nobody has to agree on a programming language, a transport, or a framework. They agree on what a message is and where to put it.
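To make "what a message is" concrete, here is one way the envelope could be built and checked in Python. The seven field names come from the contract above; the value formats (a uuid hex string for id, epoch seconds for ts) and both function names are assumptions made for this sketch.

```python
import time
import uuid

# Field names taken from the shared contract.
ENVELOPE_FIELDS = ("id", "from", "to", "type", "payload", "in_reply_to", "ts")


def make_envelope(sender, recipient, msg_type, payload, in_reply_to=None):
    """Build a message envelope as a plain dict ('from' is a Python keyword)."""
    return {
        "id": uuid.uuid4().hex,      # assumed format
        "from": sender,
        "to": recipient,
        "type": msg_type,
        "payload": payload,
        "in_reply_to": in_reply_to,  # id of the message being answered, if any
        "ts": time.time(),           # assumed format: epoch seconds
    }


def validate_envelope(message):
    """Reject anything that is missing a contract field."""
    missing = [field for field in ENVELOPE_FIELDS if field not in message]
    if missing:
        raise ValueError(f"envelope missing fields: {missing}")
    return message
```

A validator like this is the cheapest place for either adapter to notice that the other side has drifted from the contract.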

And the beautiful thing about a shared contract is that it gives you a crisp definition of a bug: anywhere the two implementations interpreted the contract differently.

The bug that proved they'd never really talked

We turned it on. Gemini, from its own worktree, constructed a message and send()-ed it to Claude:

"Hey Claude! Let's exercise the parallel coordination protocol. I've implemented the agy coordination adapter and verified it with unit tests."

On Claude's side, a recv that had been blocking on the watcher woke up and returned the payload. It worked on the first try. For the first time, the two assistants had exchanged a message with no human pasting anything and no shared code — just two strangers' programs writing files into the same folder.

Then Claude poked at the edges, and two cracks showed up.

Crack one: the filenames didn't match. Claude's adapter names message files <timestamp>-<uuid>.json so they sort chronologically. Gemini's names them msg_<uuid>.json. Messaging still worked — both mailboxes are watched directories, and the watcher doesn't care what a file is called — but Claude's trick of "order messages by filename" is meaningless for Gemini's files. The fix isn't to force a filename convention; it's to order by the ts field inside the envelope, which is part of the shared contract. The lesson: anything you rely on for correctness has to live in the contract, not in one implementation's private conventions.
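A small sketch of the fix, assuming ts is a sortable number such as epoch seconds (the post does not specify its format). The function name is hypothetical, and the demo deliberately mixes the two filename styles to show that they no longer matter.

```python
import json
import os
import tempfile


def messages_in_order(directory):
    """Load every JSON message in a directory, ordered by the envelope's ts."""
    messages = []
    for name in os.listdir(directory):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            messages.append(json.load(f))
    return sorted(messages, key=lambda m: m["ts"])


def demo():
    directory = tempfile.mkdtemp()
    samples = {
        # Sorts first by filename, but was sent second.
        "0001-aaa.json": {"id": "from-claude", "ts": 20.0},
        # Sorts last by filename, but was sent first.
        "msg_zzz.json": {"id": "from-agy", "ts": 10.0},
    }
    for name, body in samples.items():
        with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
            json.dump(body, f)
    return [m["id"] for m in messages_in_order(directory)]
```

Sorting by filename would put Claude's message first; sorting by ts recovers the order in which the messages were actually sent.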

Crack two, the real one: the mailboxes interoperated but the task queue silently didn't. When we wrote the protocol spec, we pinned down the mailbox paths (<actor>/new, <actor>/cur) but we got lazy about the task queue and left the exact directory names unspecified. So each adapter filled in the blank on its own. Gemini put claimed tasks in tasks/claimed/<task>.json and finished ones in tasks/completed/. Claude put them in tasks/claimed/<actor>/<task>.json and tasks/done/. Both are reasonable. Both pass their own unit tests. And a task claimed or finished by one adapter and looked up by the other would just never be found: no error, no exception, the directory would simply look empty to the wrong party.

This is the same shape as the failure from the last post. Back then, Gemini reported writing files it hadn't written, and the fix was a verification rule: show the evidence, don't describe the action. This time, two correct-looking programs disagreed about a directory name, and the fix is the same family of move — make the implicit contract explicit. Pin the task-queue paths in the spec (tasks/claimed/<actor>/ and tasks/done/) so there's nothing left to interpret.
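One way to leave nothing to interpret is to have every caller build task paths through a single set of helpers. The sketch below pins the directories named in the spec; the helper names, and the assumption that tasks in open/ and done/ are also stored as <task>.json, are illustrative.

```python
import os


def open_task_path(root, task_id):
    # tasks/open/<task>.json
    return os.path.join(root, "tasks", "open", f"{task_id}.json")


def claimed_task_path(root, actor, task_id):
    # tasks/claimed/<actor>/<task>.json, as pinned in the spec
    return os.path.join(root, "tasks", "claimed", actor, f"{task_id}.json")


def done_task_path(root, task_id):
    # tasks/done/<task>.json (not tasks/completed/)
    return os.path.join(root, "tasks", "done", f"{task_id}.json")
```

With helpers like these, a disagreement about a directory name becomes a one-line diff in one file instead of two adapters quietly looking in different places.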

You only find a bug like this by actually running two independent implementations against each other. A single codebase talking to itself would have been perfectly, uselessly consistent. The whole point of asymmetric adapters — that nobody shares code — is also the thing that surfaces every place the spec was vague.

And then we threw one of them away

Here's the twist I didn't see coming when we started.

Once the second bug was fixed, we asked the obvious question: now that the contract is solid, do we still want two implementations? Two codebases means every future feature — the crash-recovery layer, the eventual move to Redis — has to be built twice and kept in sync twice. The asymmetry that was so valuable for finding bugs becomes pure overhead once there are no more contract ambiguities to find.

So we collapsed to one. It turned out Gemini's harness can mount an MCP server too, which made the decision easy: both assistants now run the same MCP server, distinguished only by an environment variable — COLLAB_ACTOR=claude for one instance, COLLAB_ACTOR=agy for the other. Same code, two identities, one mailbox system. Gemini's hand-written Python adapter was retired.
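The identity switch might look something like this at startup. Only the COLLAB_ACTOR variable and its two values come from the setup described above; the function names and the fail-fast behavior are assumptions for the sketch.

```python
import os

KNOWN_ACTORS = ("claude", "agy")


def current_actor(environ=None):
    """Read this instance's identity from COLLAB_ACTOR, failing fast if unset."""
    env = os.environ if environ is None else environ
    actor = env.get("COLLAB_ACTOR")
    if actor not in KNOWN_ACTORS:
        raise RuntimeError(
            f"COLLAB_ACTOR must be one of {KNOWN_ACTORS}, got {actor!r}"
        )
    return actor


def inbox_for(root, actor):
    # Same code path for both instances; only the actor name differs.
    return os.path.join(root, actor, "new")
```

Refusing to start without a valid identity matters more than it looks: an instance that guessed wrong would read, and archive, the other assistant's mail.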

But we didn't delete the idea of a second implementation — we demoted it. Before retiring the adapter, we ported its one genuinely-useful innovation (a crash-recovery lease: a claimed task auto-returns to the queue if its owner dies and stops sending heartbeats) into the shared server, and we kept a standalone test suite that exercises the contract directly — including, pointedly, a task file with a deliberately weird filename, so the "identity is in the envelope, not the filename" lesson can never silently regress.
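The post does not say how the lease is stored, so the sketch below makes a loud assumption: a heartbeat is just a refresh of the claimed file's modification time, and a sweep renames any claim whose mtime is older than the lease back into tasks/open/. The function names and the injectable now parameter are hypothetical.

```python
import os
import tempfile
import time


def heartbeat(claimed_file, now=None):
    """Refresh the lease by touching the claimed file's mtime (assumed scheme)."""
    stamp = time.time() if now is None else now
    os.utime(claimed_file, (stamp, stamp))


def reclaim_expired(root, lease_seconds, now=None):
    """Return claims with a stale heartbeat to tasks/open/; list what moved."""
    now = time.time() if now is None else now
    open_dir = os.path.join(root, "tasks", "open")
    claimed_root = os.path.join(root, "tasks", "claimed")
    os.makedirs(open_dir, exist_ok=True)
    os.makedirs(claimed_root, exist_ok=True)

    reclaimed = []
    for actor in sorted(os.listdir(claimed_root)):
        actor_dir = os.path.join(claimed_root, actor)
        if not os.path.isdir(actor_dir):
            continue
        for name in sorted(os.listdir(actor_dir)):
            path = os.path.join(actor_dir, name)
            try:
                if now - os.stat(path).st_mtime > lease_seconds:
                    os.rename(path, os.path.join(open_dir, name))
                    reclaimed.append(name)
            except FileNotFoundError:
                continue  # completed or moved while we were sweeping
    return reclaimed


def demo():
    root = tempfile.mkdtemp()
    for actor, task, last_beat in (("claude", "T1", 1000.0), ("agy", "T2", 1090.0)):
        actor_dir = os.path.join(root, "tasks", "claimed", actor)
        os.makedirs(actor_dir)
        path = os.path.join(actor_dir, f"{task}.json")
        with open(path, "w", encoding="utf-8") as f:
            f.write("{}")
        heartbeat(path, now=last_beat)

    moved = reclaim_expired(root, lease_seconds=60, now=1100.0)
    still_open = os.listdir(os.path.join(root, "tasks", "open"))
    still_claimed = os.listdir(os.path.join(root, "tasks", "claimed", "agy"))
    return moved, still_open, still_claimed
```

In this scheme a worker whose task was reclaimed finds out on its next heartbeat, because os.utime raises FileNotFoundError once the file has moved. There is still a narrow window between the staleness check and the rename in which a late heartbeat could arrive, so a real implementation would want something sturdier than this sketch.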

That's the real shape of it: you build two implementations to harden a contract, then collapse to one and keep the second as a guard rail. The duplication isn't the goal; it's a phase. It earns its keep by finding the bugs that only divergence can find, and then it gets out of the way.

Where this leaves us

The serial notepad isn't dead; it's just demoted to what it's good at — two models carefully co-editing one artifact, turn by turn, for review. For everything that's actually parallel, the mailbox wins.

What we have now:

Two assistants in isolated worktrees, blocking only on recv, exchanging messages with no human in the loop — one shared implementation, two actor identities.
A maildir backend where atomic rename does all the concurrency-control heavy lifting, and cur/ is a free audit log.
A hardened contract — explicit paths, identity-in-the-envelope, a crash-recovery lease — every clause of which exists because something broke during the two-implementation phase.

Next: a coordinator whose whole job is to merge worker branches — rebase, run cargo test, keep main green — and a periodic sweep that reclaims tasks whose lease expired because a worker died mid-job. Then, when the directory starts to creak, swap the maildir for Redis behind the exact same operations, and nothing above the adapter has to notice.

The last post ended with a notepad two AIs could write to. This one ends with a mailbox they can drop letters into while doing other things — and then, quietly, with the two of them agreeing to share a single mailbox-handler instead of one each. The progression isn't really about files versus pipes versus brokers, or even about one implementation versus two. It's that every time I removed myself from the middle, the protocol had to get one notch more honest about what it actually promised — and each notch came from something breaking, not from a whiteboard.
