Building a Beads-Native AI Development Workflow
I’ve been building a development workflow on top of Claude Code. The problem that drove it: LLM coding assistants are stateless between sessions. You run an exploration, close the terminal, and the plan is gone. Context scatters across random markdown files. Work items live in your head or in a dozen places that don’t talk to each other.
I wanted persistent, structured, git-tracked state as the foundation. Something that travels with the repo, survives session boundaries, and can drive agent coordination.
What I ended up building: a set of custom Claude Code skills that orchestrate parallel agent swarms, all backed by beads — a git-native issue tracker that stores everything in the repo itself.
The Skills Workflow
The core loop is four commands:
- /explore "topic" — spawns a research agent, stores phase-structured findings in a beads issue's design field
- /prepare — parses those findings into an epic with child tasks and a dependency DAG
- /implement <epic> — detects the epic, spawns a Claude team, executes tasks in parallel waves
- /review — code review that produces phase-structured findings, feeding back into /prepare
These chain naturally depending on what you’re doing:
# New features
explore → prepare → implement
# Polish and fixes
review → prepare → implement
# Bug triage
debug → fix → prepare → implement
# Ship it
commit (auto-closes completed epics) → submit (PR via Graphite)
The remaining skills fill in the gaps. /start creates a branch
and links it to a bead. /gt wraps Graphite commands.
/resume-work recovers PR context after breaks — it reads the
bead state and gives you a summary of where things stand.
/writing-skills scaffolds new skills when I want to add
another command to the toolkit.
What I like about this setup is that each skill is small and
focused. They don’t try to do everything. /explore doesn’t
implement. /implement doesn’t plan. The boundaries are sharp,
and the data flows through beads.
Beads as Source of Truth
The guiding principle I landed on: “All plans, notes, and state live in beads — no filesystem documents.”
Before beads, I had a .jim/ directory. Plans in
.jim/plans/*.md, notes in .jim/notes/, state in
.jim/states/. It worked, barely. Nothing linked to anything
else. Syncing state between files was manual. Finding what you
needed meant grepping across directories.
Beads collapses all of that into .beads/issues.jsonl — a
single JSONL file, tracked by git, managed by one CLI (bd),
with a natural PR workflow since the data diffs like any other
code change.
The key fields I use:
- design: exploration plans, implementation strategies — the output of /explore lives here
- notes: investigation findings, review summaries
- description: requirements with mandatory acceptance criteria (enforced by bd lint — no vague issues allowed)
- status: open → in_progress → closed
The migration from .jim/ to beads happened in one pivotal
commit. Four phases: explore skill, prepare skill,
review/implement, and finally a CLAUDE.md cleanup. The .jim/
directory got archived as read-only legacy.
The benefits showed up immediately:
- Resumability: bd list --status=in_progress tells you exactly where you left off
- Atomicity: bd update --claim prevents race conditions when multiple agents grab tasks in a swarm
- Auditability: JSONL diffs in git mean you can see exactly when a task changed status, who claimed it, what the plan was
- Portability: beads data travels with the repo — clone it, and you have the full project history
Teams and Swarms
This is where it gets interesting. The /implement skill can
detect when a beads issue is an epic (has children with a
dependency DAG) and automatically spawn a Claude team for
parallel execution.
The architecture has clear layers:
- Team lead: the implement skill itself, orchestrating everything
- Workers: ephemeral Task agents, fresh per task, no reuse between waves
- Beads: source of truth for the DAG, state, descriptions
- Claude teams: execution layer only — they don’t own state
Execution happens in waves:
# 1. Validate the dependency graph
bd swarm validate
# 2. Set up the team
# (TeamCreate in the skill)
# 3. Loop until done:
bd ready # which tasks have no unmet dependencies?
# spawn ALL ready workers in parallel
# wait for completion messages
bd swarm status # verify state
# next wave
# 4. Cleanup
# auto-close eligible epics
# shutdown workers, delete team
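The wave loop above reduces to a small function over the task DAG. Here's a Python sketch with an in-memory task table standing in for beads (the real state lives in `.beads/issues.jsonl`, and `bd ready` performs the readiness query); a failed task blocks only its descendants, so independent branches keep flowing:

```python
def ready_tasks(tasks):
    """A task is ready when it's open and every dependency is closed."""
    return [
        tid for tid, meta in tasks.items()
        if meta["status"] == "open"
        and all(tasks[dep]["status"] == "closed" for dep in meta["deps"])
    ]

def run_waves(tasks, run_worker):
    """Dispatch every ready task per wave until nothing is ready.

    run_worker returns True on success; in the real swarm the workers
    in a wave run in parallel as Task agents.
    """
    while True:
        wave = ready_tasks(tasks)
        if not wave:
            break
        for tid in wave:
            tasks[tid]["status"] = "closed" if run_worker(tid) else "failed"
    return tasks
```

Note the termination condition: the loop stops when no task is ready, not when every task is closed, which is exactly the graceful-degradation behavior described below.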
The worker protocol is simple: claim atomically with
bd update --claim, implement the task, close it with
bd close, report back to the lead, then wait for shutdown. If
the claim fails, someone else took it — stop. If there’s a file
conflict, report it to the lead instead of forcing anything.
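The atomic claim is what makes the protocol safe. In beads it's `bd update --claim`; as an in-memory analogy (not the real implementation), the semantics look like a compare-and-set under a lock:

```python
import threading

_claim_lock = threading.Lock()

def claim(tasks, task_id, worker):
    """Atomic claim: only one worker can move a task to in_progress.

    Returns False if the task is already taken, in which case the
    worker stops, per the protocol.
    """
    with _claim_lock:
        if tasks[task_id]["status"] != "open":
            return False
        tasks[task_id]["status"] = "in_progress"
        tasks[task_id]["owner"] = worker
        return True
```

The second claimant gets a clean False rather than a conflict, which is why two workers grabbing the same task never both proceed.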
Error handling follows graceful degradation. If some tasks in a
wave fail, the system still checks bd ready — downstream tasks
may be unblocked by the successful ones. No need to abort the
entire epic because one leaf task had trouble.
The rough edges are real, though. Worker agents sometimes produce conflicting edits when tasks touch adjacent code. The atomic claim mechanism handles task-level races, but file-level conflicts still need manual resolution. And debugging a failed swarm means reading through multiple agent outputs to piece together what happened. I’m working on better conflict detection, but it’s not solved yet.
Editor Integration with nvim-beads
One thing that bugged me about the old .jim/ setup was
visibility. Plans lived in markdown files buried in a directory
I rarely opened. I’d have to hunt down the right file to review
what Claude had generated before deciding whether to act on it.
With beads, the plans live in the design field of each issue —
but I was still running bd show commands to read them.
nvim-beads closes
that gap. The plugin surfaces beads issues natively in neovim:
:Beads to browse tasks, filter by status or priority, and
— critically — dig into the design field of any ticket to see
the full exploration plan.
That last part is what matters most to me. I spend a lot of time
reviewing the plans Claude generates. Before calling /prepare
to turn a plan into actionable tasks, I want to read the design
field, critique the approach, push back on phases that don’t make
sense. nvim-beads gives me that without leaving the editor or
memorizing bead IDs.
I’m still early in the integration — looking forward to seeing how it changes things day-to-day. But the direction is clear: the boundary between writing code and reviewing plans should not exist.
Philosophy and Patterns
A few patterns worth naming emerged as I built this out.
Orchestration over implementation. Skills spawn subagents —
they don’t implement directly. /explore, /review, and
/implement all delegate to Task agents. Skills coordinate;
agents execute. This keeps skills small and testable while
letting agents do the heavy lifting with full context windows.
Quality gates everywhere. bd lint runs after every
bd create. Every issue has enforced acceptance criteria.
Phase-structured findings from /explore and /review are
machine-parseable by /prepare — they’re not just freeform
notes, they’re structured data that drives the next step in the
pipeline.
Beads as compiler target. Think of it like a build system. Skills “compile” natural language plans into structured beads (an epic with children and a dependency DAG). The implement skill “executes” that structure via the swarm runtime. The intermediate representation is just JSONL in git.
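To make the compiler analogy concrete, here's a hypothetical sketch of the "compile" step: turning an ordered plan into an epic plus child tasks, emitted as JSONL. The field names (id, parent, deps) and the linear dependency chain are illustrative only; the real /prepare skill derives a richer DAG, and the real beads schema may differ:

```python
import json

def compile_plan(epic_title, steps):
    """'Compile' an ordered plan into an epic plus dependent child tasks.

    Emits one JSON object per line -- the same shape of IR that
    .beads/issues.jsonl stores.
    """
    epic_id = "bd-1"
    issues = [{"id": epic_id, "type": "epic", "title": epic_title, "status": "open"}]
    prev = None
    for n, step in enumerate(steps, start=2):
        issue = {
            "id": f"bd-{n}", "type": "task", "title": step,
            "parent": epic_id, "deps": [prev] if prev else [], "status": "open",
        }
        issues.append(issue)
        prev = issue["id"]
    return "\n".join(json.dumps(i) for i in issues)
```

The swarm runtime then "executes" this IR wave by wave, which is what makes the build-system framing more than a metaphor.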
Context budget discipline. The finite context window is a
first-class constraint, not an afterthought. My rules enforce
output piping (| head -20), summaries for subagents instead
of raw output, and aggressive trimming of verbose commands. When
you’re spawning multiple agents in a swarm, wasting context in
any one of them cascades into wasted tokens across all of them.
Future Directions
Where I’m heading next:
- More agent autonomy — auto-detect when an exploration is thorough enough to stop, auto-classify review severity so /prepare can prioritize
- Better conflict detection — pre-flight file overlap analysis before spawning swarms, so workers don't step on each other
- Iterative exploration — human-in-the-loop checkpoints during /explore for longer research tasks
- Cross-repo workflows — multi-repo epics for coordinating frontend and backend changes
- Richer beads queries — complex filters beyond simple status and type, maybe something like a query language for the JSONL data
Some of these are straightforward extensions. Others (cross-repo especially) will probably require rethinking how beads tracks state. We’ll see.
Wrapping Up
The key insight behind all of this: LLM coding assistants need persistent, structured, git-tracked state to coordinate complex workflows. Without it, every session starts from scratch and multi-step work falls apart.
Beads provides the state layer. Custom skills provide the workflow. Claude teams provide the parallelism. nvim-beads (eventually) provides the editor integration.
It’s still evolving. I’m iterating on these skills weekly as I find rough edges — the swarm conflict handling needs work, the explore-to-prepare handoff could be smoother, and I haven’t stress-tested cross-agent coordination on truly large epics. But the core pattern of git-based state driving agent coordination has proven solid. Every improvement to the beads data model pays dividends across every skill that reads it.
If you’re configuring Claude Code beyond the defaults, I’d encourage thinking about state first. Not “what prompts should I write” but “where does my workflow state live, and can my tools read and write it programmatically?” Once you answer that, the skills almost design themselves.