Prework

Part of the methodology series.

Coding agents are fast on greenfield and sloppy on brownfield. Give one a blank repo and it ships. Point it at a production codebase — internal monorepo, open source project, anything with existing reviewers — and the output degrades. The diff is large, the rationale is thin, the tests look generated. The agent doesn’t know what the reviewer needs to see to say yes. It optimizes for “code that works,” not “code that gets merged.”

The bottleneck in a collaborative codebase is never the code generation. It’s the review — a human deciding whether to trust your contribution. Prework is what closes that gap.

Predicate vs. transformation

Every contribution to an existing codebase answers two questions. Does this approach work? And does this code implement it correctly? The first is a claim about the world — call it the predicate. The second is a mechanical task — call it the transformation. They need different evidence, different tools, different iteration speeds.

I needed to know whether union-find compaction preserves more detail than flat summarization for chat history. The production repo had no cheap way to compare variants — no fixtures, no evaluation harness, no controlled conditions. So the predicate lived in a prototype repo: synthetic conversations, its own runner, seven trials. The transformation (porting the algorithm to TypeScript) took ninety minutes. Rebase, fix, verify, push. No design arguments during review. No “should we?” — only “does it work?”
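
To make the predicate concrete: a minimal Python sketch of union-find compaction, clustering linked messages so each cluster can be summarized separately instead of flattening everything into one summary. The message texts and the `links` input are hypothetical stand-ins, not the prototype's actual fixtures.

```python
# Sketch of the predicate under test: union-find groups related messages
# into clusters; summarizing per cluster preserves detail that a single
# flat summary would collapse. Messages and links are invented examples.

class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def compact(messages: list[str], links: list[tuple[int, int]]) -> list[list[str]]:
    """Cluster messages connected (directly or transitively) by `links`."""
    uf = UnionFind(len(messages))
    for a, b in links:
        uf.union(a, b)
    clusters: dict[int, list[str]] = {}
    for i, msg in enumerate(messages):
        clusters.setdefault(uf.find(i), []).append(msg)
    return list(clusters.values())

messages = ["fix auth bug", "auth bug repro", "ship dark mode", "dark mode CSS"]
clusters = compact(messages, links=[(0, 1), (2, 3)])
# Two clusters survive; flat summarization would merge all four messages.
```

The experiment's job was to check whether those surviving clusters actually preserve more detail downstream — a claim the production repo couldn't test cheaply.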

Settle the predicate before you touch production. The transformation becomes verifiable, delegatable, and fast.

What to build

Prework is worth building when you can name the failure category. Not the specific bug — the kind of surprise.

Experiment repo. Guards against: the approach doesn’t actually help. Build it when production can’t answer cheaply — feedback loop too slow, fixtures too coupled, or evaluation needs controlled conditions CI can’t provide. The experiment repo should have its own fixtures, its own runner, and results a reviewer can audit independently.
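
The shape is small. A sketch, assuming a Python prototype — the variant names, fixtures, and scoring function are placeholders for whatever the experiment actually measures:

```python
# Minimal shape of an experiment repo's runner: its own fixtures, its own
# loop, and a results file a reviewer can audit independently. Variant
# names and the scoring function are hypothetical placeholders.
import json
import os
import statistics
import tempfile

FIXTURES = [  # synthetic inputs the repo owns outright
    {"id": "conv-01", "input": "short chat"},
    {"id": "conv-02", "input": "long multi-topic chat"},
]

def run_variant(name: str, fixture: dict) -> float:
    """Placeholder: score one variant on one fixture (higher = more detail kept)."""
    return len(fixture["input"]) * (1.2 if name == "union_find" else 1.0)

def main() -> dict:
    results = {}
    for variant in ("union_find", "flat_summary"):
        scores = [run_variant(variant, f) for f in FIXTURES]
        results[variant] = {"scores": scores, "mean": statistics.mean(scores)}
    # The auditable artifact: committed to the repo, linked from the PR.
    path = os.path.join(tempfile.gettempdir(), "results.json")
    with open(path, "w") as fh:
        json.dump(results, fh, indent=2)
    return results

results = main()
```

The point is ownership of the whole loop: nothing here depends on production's build system, so iteration is seconds, not minutes.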

Compatibility suite. Guards against: the port diverges from the original. Build it when the prework is in a different language or framework than the target. Import the prototype’s fixtures, run them through the production implementation, assert identical outputs. Two bugs survived three review rounds because this artifact didn’t exist.
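
A compatibility suite can be this small. The sketch below assumes the prototype exports its fixtures as JSON pairs of input and expected output; `compact_production` is a hypothetical stand-in for the ported implementation:

```python
# Sketch of a compatibility suite: replay the prototype's recorded fixtures
# through the production implementation and demand identical output. The
# fixture format and `compact_production` are invented stand-ins; in the
# real setup the fixtures are exported by the prototype repo.
import json

# Each fixture pairs an input with the output the prototype produced.
FIXTURES_JSON = """
[
  {"input": [1, 2, 2, 3], "expected": [1, 2, 3]},
  {"input": [], "expected": []}
]
"""

def compact_production(xs: list[int]) -> list[int]:
    """Stand-in for the ported implementation under test (order-preserving dedupe)."""
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

failures = []
for case in json.loads(FIXTURES_JSON):
    got = compact_production(case["input"])
    if got != case["expected"]:
        failures.append((case["input"], case["expected"], got))

assert not failures, f"port diverges from prototype: {failures}"
```

The asymmetry is the value: the prototype's outputs are frozen as data, so any divergence in the port fails loudly instead of surviving review.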

Transformation design doc. Guards against: structural decisions made ad hoc during the port. Build it when the prototype’s architecture doesn’t map onto the target’s. It specifies which module maps where, which interfaces change shape, and which behaviors differ intentionally.

Integration manifest. Guards against: forgetting where things live. Build it when the contribution spans multiple repos, remotes, or branches. One line per artifact: repo, branch, remote, what it hosts. Free, prevents twenty-minute archaeology sessions.
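
To scale: a hypothetical manifest — every repo, branch, and artifact name below is invented.

```
# INTEGRATION.md — one line per artifact (all names hypothetical)
prototype   github:me/compaction-lab    main      experiment, fixtures, results.json
port        internal:chat/monorepo      feat/uf   TypeScript implementation
compat      internal:chat/monorepo      feat/uf   tests importing prototype fixtures
writeup     github:me/blog              draft     pre-registered results post
```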

What not to build

Prework against failure categories you can’t name is speculative architecture — wrong by the time the failure arrives.

Don’t pre-build defenses against unknown review feedback — that’s the reviewer’s job. Pre-building abstractions for hypothetical extensions, or a compatibility suite for a one-shot script, falls into the same trap. Prework that misses its target is worse than none: it creates false confidence.

The filter: can you name the failure category? If yes, is the prework artifact legible to someone other than you — a reviewer, a CI pipeline, a future agent? If both yes, build it. If you can’t name the category, skip it and learn from the failure when it comes.

Artifacts compound

Prework that stays in your head isn’t prework. It’s thinking, and thinking doesn’t compound. The artifact is what compounds: an experiment repo gets cited in a blog post, linked from a PR, referenced in review, audited in a retro. A design doc gets read by the agent doing the port. A compatibility suite catches regressions on every future change.

Artifacts that encode what’s true — experiments, specs, validated prototypes — have long half-lives and accumulate references; artifacts that encode where things are — manifests, checklists — have short half-lives and go stale. Build the first kind when the failure category is project-killing. Build the second when rediscovery costs more than a one-line note.

The recipe

Given a spec for a brownfield contribution:

  1. Identify the predicate. What claim does this contribution make? What would falsify it?
  2. Check whether production can falsify it cheaply. If the test suite is fast, fixtures are simple, and evaluation is binary — skip the prototype, work in production.
  3. If not, build a prototype repo. Match the target’s data shapes and interfaces; skip its build system and dependencies. Own the whole loop: fixtures, runner, evaluation. Validate the predicate with evidence a reviewer can audit.
  4. Name every failure category you can foresee for the port. For each one, decide: experiment, compatibility suite, design doc, or manifest.
  5. Port. The transformation should be mechanical. If it requires design decisions, the prework missed something — go back and build the missing artifact.

The port itself is where tools like /forge and /volley operate — automated pipelines that sharpen specs, implement, verify, and clean up PRs. Those tools optimize the transformation. Prework optimizes what comes before: the evidence that makes the transformation worth reviewing at all.

The reviewer opens the diff: one commit, a link to a pre-registered experiment, eighty tests matching the prototype’s assertions. They review the code, not the premise. No back-and-forth about whether the algorithm works — that question has a URL with seven trials and a p-value.

Reviewers don’t separate “does this work?” from “should we do this?” They just feel “I can’t say yes to this.” Prework resolves the first question before they encounter the diff. Whatever doubt remains is about direction, not correctness — and now it’s small enough to name. You don’t need to categorize the uncertainty upfront. You build the artifacts, and the distinction reveals itself.

The bottleneck in a collaborative repo is rarely the build. It’s the review — a maintainer deciding whether to trust your contribution. Prework converts “trust me” into “here’s the provenance.” What lands in production is just the receipt.


Written via the double loop.