Building a Beads-Native AI Development Workflow

I’ve been building a development workflow on top of Claude Code. The problem that drove it: LLM coding assistants are stateless between sessions. You run an exploration, close the terminal, and the plan is gone. Context scatters across random markdown files. Work items live in your head or in a dozen places that don’t talk to each other.

I wanted persistent, structured, git-tracked state as the foundation. Something that travels with the repo, survives session boundaries, and can drive agent coordination.

What I ended up building: a set of custom Claude Code skills that orchestrate parallel agent swarms, all backed by beads — a git-native issue tracker that stores everything in the repo itself.

The Skills Workflow

The core loop is four commands:

  • /explore "topic" — spawns a research agent, stores phase-structured findings in a beads issue’s design field
  • /prepare — parses those findings into an epic with child tasks and a dependency DAG
  • /implement <epic> — detects the epic, spawns a Claude team, executes tasks in parallel waves
  • /review — code review that produces phase-structured findings, feeding back into /prepare

These chain naturally depending on what you’re doing:

# New features
explore → prepare → implement

# Polish and fixes
review → prepare → implement

# Bug triage
debug → fix → prepare → implement

# Ship it
commit (auto-closes completed epics) → submit (PR via Graphite)

The remaining skills fill in the gaps. /start creates a branch and links it to a bead. /gt wraps Graphite commands. /resume-work recovers PR context after breaks — it reads the bead state and gives you a summary of where things stand. /writing-skills scaffolds new skills when I want to add another command to the toolkit.

What I like about this setup is that each skill is small and focused. They don’t try to do everything. /explore doesn’t implement. /implement doesn’t plan. The boundaries are sharp, and the data flows through beads.

Beads as Source of Truth

The guiding principle I landed on: “All plans, notes, and state live in beads — no filesystem documents.”

Before beads, I had a .jim/ directory. Plans in .jim/plans/*.md, notes in .jim/notes/, state in .jim/states/. It worked, barely. Nothing linked to anything else. Syncing state between files was manual. Finding what you needed meant grepping across directories.

Beads collapses all of that into .beads/issues.jsonl — a single JSONL file, tracked by git, managed by one CLI (bd), with a natural PR workflow since the data diffs like any other code change.

The key fields I use:

  • design: exploration plans, implementation strategies — the output of /explore lives here
  • notes: investigation findings, review summaries
  • description: requirements with mandatory acceptance criteria (enforced by bd lint — no vague issues allowed)
  • status: open → in_progress → closed
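For illustration, here's roughly what one record in `.beads/issues.jsonl` looks like with those fields populated. The field values are made up and the exact schema belongs to bd, not me — this just shows that each issue is one JSON object per line:

```python
import json

# A hypothetical issue record -- bd's real schema may differ, but these
# are the fields the workflow leans on.
line = json.dumps({
    "id": "bd-42",
    "status": "in_progress",
    "description": "Add retry logic. AC: retries 3x with backoff; failure path has a unit test.",
    "design": "Phase 1: wrap client calls...\nPhase 2: add backoff config...",
    "notes": "Investigated timeout behavior; see design for the plan.",
})

issue = json.loads(line)  # one JSON object per line -- that's all JSONL is
print(issue["status"])    # in_progress
```

Because each issue is a single line, a status change or a new plan shows up in `git diff` as a one-line change, which is what makes the PR workflow natural.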

The migration from .jim/ to beads happened in one pivotal commit. Four phases: explore skill, prepare skill, review/implement, and finally a CLAUDE.md cleanup. The .jim/ directory got archived as read-only legacy.

The benefits showed up immediately:

  • Resumability: bd list --status=in_progress tells you exactly where you left off
  • Atomicity: bd update --claim prevents race conditions when multiple agents grab tasks in a swarm
  • Auditability: JSONL diffs in git mean you can see exactly when a task changed status, who claimed it, what the plan was
  • Portability: beads data travels with the repo — clone it, and you have the full project history
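The resumability point is easiest to see in code. Conceptually, `bd list --status=in_progress` is just a filter over the JSONL file — this sketch uses fake records and is not bd's implementation:

```python
import io
import json

# Stand-in for .beads/issues.jsonl (hypothetical records).
jsonl = io.StringIO(
    '{"id": "bd-1", "status": "closed", "description": "done task"}\n'
    '{"id": "bd-2", "status": "in_progress", "description": "half-finished"}\n'
    '{"id": "bd-3", "status": "open", "description": "not started"}\n'
)

# What `bd list --status=in_progress` answers: which issues were
# mid-flight when the last session ended?
resumable = [rec for rec in map(json.loads, jsonl) if rec["status"] == "in_progress"]
print([rec["id"] for rec in resumable])  # ['bd-2']
```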

Teams and Swarms

This is where it gets interesting. The /implement skill can detect when a beads issue is an epic (has children with a dependency DAG) and automatically spawn a Claude team for parallel execution.

The architecture has clear layers:

  • Team lead: the implement skill itself, orchestrating everything
  • Workers: ephemeral Task agents, fresh per task, no reuse between waves
  • Beads: source of truth for the DAG, state, descriptions
  • Claude teams: execution layer only — they don’t own state

Execution happens in waves:

# 1. Validate the dependency graph
bd swarm validate

# 2. Set up the team
# (TeamCreate in the skill)

# 3. Loop until done:
bd ready              # which tasks have no unmet dependencies?
# spawn ALL ready workers in parallel
# wait for completion messages
bd swarm status       # verify state
# next wave

# 4. Cleanup
# auto-close eligible epics
# shutdown workers, delete team

The worker protocol is simple: claim atomically with bd update --claim, implement the task, close it with bd close, report back to the lead, then wait for shutdown. If the claim fails, someone else took it — stop. If there’s a file conflict, report it to the lead instead of forcing anything.
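The claim semantics boil down to a compare-and-swap on status. This toy sketch shows only the decision logic — the real `bd update --claim` presumably handles locking and persistence across processes, which this deliberately omits:

```python
# Toy model of atomic claiming -- not bd's actual implementation.
def claim(issues, task_id, worker):
    """Compare-and-swap: succeed only if the task is still open."""
    task = issues[task_id]
    if task["status"] != "open":
        return False              # someone else got there first -- stop
    task["status"] = "in_progress"
    task["assignee"] = worker
    return True

issues = {"bd-7": {"status": "open"}}
assert claim(issues, "bd-7", "worker-a") is True
assert claim(issues, "bd-7", "worker-b") is False  # lost the race: back off
```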

Error handling follows graceful degradation. If some tasks in a wave fail, the system still checks bd ready — downstream tasks may be unblocked by the successful ones. No need to abort the entire epic because one leaf task had trouble.
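The degradation behavior falls out of how readiness is computed. A conceptual version of `bd ready` (the "failed" status and field names here are placeholders, not bd's schema): a task is ready when all of its dependencies are closed, so a failed task blocks only its own descendants, not the whole wave.

```python
# Conceptual `bd ready`: ready = open, with every dependency closed.
def ready(tasks):
    closed = {tid for tid, t in tasks.items() if t["status"] == "closed"}
    return sorted(
        tid for tid, t in tasks.items()
        if t["status"] == "open" and all(d in closed for d in t["deps"])
    )

tasks = {
    "a": {"status": "closed", "deps": []},
    "b": {"status": "failed", "deps": []},     # a failed leaf
    "c": {"status": "open",   "deps": ["a"]},  # unblocked by a
    "d": {"status": "open",   "deps": ["b"]},  # blocked by b's failure
}
print(ready(tasks))  # ['c'] -- the epic keeps moving despite b
```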

The rough edges are real, though. Worker agents sometimes produce conflicting edits when tasks touch adjacent code. The atomic claim mechanism handles task-level races, but file-level conflicts still need manual resolution. And debugging a failed swarm means reading through multiple agent outputs to piece together what happened. I’m working on better conflict detection, but it’s not solved yet.

Editor Integration with nvim-beads

One thing that bugged me about the old .jim/ setup was visibility. Plans lived in markdown files buried in a directory I rarely opened. I’d have to hunt down the right file to review what Claude had generated before deciding whether to act on it. With beads, the plans live in the design field of each issue — but I was still running bd show commands to read them.

nvim-beads closes that gap. The plugin surfaces beads issues natively in neovim: :Beads to browse tasks, filter by status or priority, and — critically — dig into the design field of any ticket to see the full exploration plan.

That last part is what matters most to me. I spend a lot of time reviewing the plans Claude generates. Before calling /prepare to turn a plan into actionable tasks, I want to read the design field, critique the approach, push back on phases that don’t make sense. nvim-beads gives me that without leaving the editor or memorizing bead IDs.

I’m still early in the integration — looking forward to seeing how it changes things day-to-day. But the direction is clear: the boundary between writing code and reviewing plans should not exist.

Philosophy and Patterns

As I built this out, a few patterns emerged that I think are worth naming.

Orchestration over implementation. Skills spawn subagents — they don’t implement directly. /explore, /review, and /implement all delegate to Task agents. Skills coordinate; agents execute. This keeps skills small and testable while letting agents do the heavy lifting with full context windows.

Quality gates everywhere. bd lint runs after every bd create. Every issue has enforced acceptance criteria. Phase-structured findings from /explore and /review are machine-parseable by /prepare — they’re not just freeform notes, they’re structured data that drives the next step in the pipeline.

Beads as compiler target. Think of it like a build system. Skills “compile” natural language plans into structured beads (an epic with children and a dependency DAG). The implement skill “executes” that structure via the swarm runtime. The intermediate representation is just JSONL in git.
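To make the compiler analogy concrete, here's a sketch of the "compile" step: a phase-structured plan becomes an epic plus child tasks with dependencies. I've used a linear chain and invented the ID scheme and field names for illustration — `/prepare` presumably emits a richer DAG against bd's real schema:

```python
# Sketch of the "compiler" framing: plan phases -> epic + child tasks.
# ID scheme and field names are illustrative, not bd's real schema.
def compile_plan(epic_id, phases):
    records = [{"id": epic_id, "type": "epic", "status": "open"}]
    prev = None
    for i, phase in enumerate(phases, 1):
        tid = f"{epic_id}.{i}"
        records.append({
            "id": tid, "type": "task", "status": "open",
            "parent": epic_id, "description": phase,
            "deps": [prev] if prev else [],  # each phase waits on the last
        })
        prev = tid
    return records

ir = compile_plan("bd-epic-1", ["wire up client", "add backoff", "write tests"])
print(len(ir))        # 4: one epic + three tasks
print(ir[2]["deps"])  # ['bd-epic-1.1']
```

The output is the intermediate representation: append it as JSONL lines, commit, and the swarm runtime "executes" it.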

Context budget discipline. The finite context window is a first-class constraint, not an afterthought. My rules enforce output piping (| head -20), summaries for subagents instead of raw output, and aggressive trimming of verbose commands. When you’re spawning multiple agents in a swarm, wasting context in any one of them cascades into wasted tokens across all of them.

Future Directions

Where I’m heading next:

  • More agent autonomy — auto-detect when an exploration is thorough enough to stop, auto-classify review severity so /prepare can prioritize
  • Better conflict detection — pre-flight file overlap analysis before spawning swarms, so workers don’t step on each other
  • Iterative exploration — human-in-the-loop checkpoints during /explore for longer research tasks
  • Cross-repo workflows — multi-repo epics for coordinating frontend and backend changes
  • Richer beads queries — complex filters beyond simple status and type, maybe something like a query language for the JSONL data

Some of these are straightforward extensions. Others (cross-repo especially) will probably require rethinking how beads tracks state. We’ll see.

Wrapping Up

The key insight behind all of this: LLM coding assistants need persistent, structured, git-tracked state to coordinate complex workflows. Without it, every session starts from scratch and multi-step work falls apart.

Beads provides the state layer. Custom skills provide the workflow. Claude teams provide the parallelism. nvim-beads (eventually) provides the editor integration.

It’s still evolving. I’m iterating on these skills weekly as I find rough edges — the swarm conflict handling needs work, the explore-to-prepare handoff could be smoother, and I haven’t stress-tested cross-agent coordination on truly large epics. But the core pattern of git-based state driving agent coordination has proven solid. Every improvement to the beads data model pays dividends across every skill that reads it.

If you’re configuring Claude Code beyond the defaults, I’d encourage thinking about state first. Not “what prompts should I write” but “where does my workflow state live, and can my tools read and write it programmatically?” Once you answer that, the skills almost design themselves.