Evolving Spec-Driven Development
Spec-driven development is moving from static documents toward shared ledgers of intent: collaborative, traceable workflows where humans steer and AI agents execute against versioned specifications.
Short summary
The first version of spec-driven development is easy to describe: write down what good looks like before asking people or agents to build it.
That remains true, but it is no longer enough.
As AI-assisted engineering gets more capable, the specification stops being a passive document. It becomes shared infrastructure: a durable place where teams record intent, surface decisions, coordinate human review, route work to agents, detect drift, and learn from implementation.
That is the direction Specledger points toward. It frames SDD as a collaborative platform rather than a folder of documents: requirements, design, implementation, checkpoints, session history, dashboards, deltas, and human decision points all tied back to a shared source of truth.
The evolution is from spec as document to spec as ledger.
The old SDD baseline
In the earlier article, I described spec-driven development as the bridge between research and development.
Research discovers what good looks like. Specifications make that definition inspectable. Development executes against it. Feedback keeps the system honest.
That model is still the foundation. A useful specification turns vague intent into something people can review:
- What problem are we solving?
- What behavior matters?
- What is explicitly out of scope?
- What constraints are non-negotiable?
- What examples prove the work is correct?
- What risks or assumptions still need validation?
This already makes software development better because it moves feedback earlier. It is cheaper to fix a sentence than a production incident.
But AI agents change the pressure on the process.
If a human misreads vague intent, the team may lose a day. If a powerful agent misreads vague intent, it can produce a large, plausible, internally consistent wrong implementation very quickly. Cheap code generation makes the quality of the target more important, not less.
So SDD needs to evolve beyond “write a spec, then implement.” It needs an operating model for collaboration.
The downloaded “Spec Driven Review Process” conversation makes this more concrete. The starting point was a Product Owner describing how much real review work happens before code review: reading a spec for implicit intent, inconsistency, terminology drift, muddled concepts, scope that can be reduced, phase boundaries, roadmap consequences, and decisions that will constrain future work.
That is a useful correction to how people often talk about AI coding. The review surface is not only the pull request. In SDD, the specification itself needs review because it is the thing the implementation will optimize against.
The best short version from that conversation was:
Does this document create a coherent, executable path from intent to delivery while minimizing ambiguity, risk, waste, and future rework?
That question belongs before implementation. It turns spec review into an explicit quality function rather than an informal human habit.
Specledger’s useful reframing
Specledger’s product language is direct: it wants to be the single source of truth for spec-driven development. The important part is not just that it manages specs. The important part is that it treats SDD as a coordination problem.
The platform emphasizes:
- human dashboards for tracking, reviewing, and steering AI collaboration
- spec deltas and checkpoints so changes leave a trail
- session indexing so AI work becomes reusable organizational context
- multi-repo support for features that do not fit neatly inside one repository
- CLI bootstrap and agent compatibility so the workflow can live inside normal engineering tools
That is a stronger framing than “documentation for AI.” Documentation is something you read. A ledger is something teams use to coordinate, audit, and reconcile reality.
In a serious AI-assisted workflow, the core question is not “can the agent write code?” The question is “can the team keep intent, implementation, review, and future memory aligned while the agent writes code?”
Specledger is interesting because it attacks that alignment problem directly.
The command prompts show the workflow shape
The most concrete evidence is in the repositories’ .agents/commands prompts.
The standard Specledger command set looks like a complete SDD lifecycle:
- /specledger.constitution establishes project principles.
- /specledger.specify turns a natural language feature description into a structured spec.
- /specledger.clarify asks targeted questions and writes the answers back into the spec.
- /specledger.plan turns the spec into architecture, stack choices, phases, and design artifacts.
- /specledger.tasks creates dependency-ordered implementation work, backed by the sl issue tracker.
- /specledger.verify checks consistency across spec.md, plan.md, and tasks.md before implementation.
- /specledger.implement executes the task plan.
- /specledger.checkpoint performs a critical divergence review between implementation and plan.
- /specledger.spike gives uncertainty a first-class research workflow.
- /specledger.checklist creates focused review checklists.
- /specledger.onboard walks a user through the whole process.
That sequence matters because it is not just a prompt library. It is an attempt to make the implicit engineering loop explicit.
The commands repeatedly encode the same pattern:
- discover the current feature context with sl spec info
- read the generated artifact before editing it
- treat the constitution as authoritative
- preserve handoffs between phases
- make missing context visible instead of hallucinating it
- map requirements to design, tasks, and tests
- verify consistency before implementation
- checkpoint divergence after implementation
This is SDD becoming operational.
Spec review becomes a first-class workflow
The ChatGPT conversation also sketched a review pipeline that fits naturally into Specledger’s world.
The important move is to stop treating “review” as one generic pass. A useful spec review has angles, and each angle looks for different failure modes:
- Product Owner: implicit intent, terminology drift, muddled concepts, downscope opportunities, phase boundaries, stakeholder alignment, and roadmap impact
- QA: testability, acceptance criteria, edge cases, negative paths, undefined success and failure states, and regression risk
- Security: trust boundaries, authentication, authorization, secrets, data exposure, auditability, supply chain, abuse cases, and multi-tenancy
- Architecture: system boundaries, module boundaries, API contracts, data flow, coupling, extensibility, reversibility, migration paths, and technical debt
- Delivery: sequencing, hidden dependencies, team boundaries, critical path, milestones, and risk concentration
- Constitution: whether the proposed work passes, weakly aligns with, or violates the project’s operating principles
- Roadmap: decisions that constrain future roadmap items, scope that should move in or out of the workstream, and sequencing across future work
- Operations (SRE), Cost, UX, and Data: production ownership, lifecycle cost, human workflow clarity, and data lifecycle concerns
That list is valuable because it explains why a single reviewer often misses things. A QA reviewer is looking for objective testability. A roadmap reviewer is looking for future constraint. A constitution reviewer is looking for principle violations. A Product Owner is looking for intent clarity and scope control.
The review pipeline from the conversation also had a practical entrypoint: inspect which artifacts actually exist, then ask the user which artifacts and reviewers are in scope. Maybe the filesystem only has spec.md. Maybe it also has plan.md, quickstart.md, a constitution, and a roadmap. Review should still be possible with the available artifacts, but missing artifacts should be treated as missing context rather than automatic failure.
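A minimal sketch of that entrypoint, assuming a feature directory that holds the artifacts as flat files. The names spec.md, plan.md, and quickstart.md come from the conversation; constitution.md and roadmap.md, the function name, and the returned keys are my own assumptions for illustration, not Specledger's actual layout or API.

```python
from pathlib import Path

# spec.md, plan.md, and quickstart.md are named in the text above.
# constitution.md and roadmap.md are assumed filenames for illustration.
KNOWN_ARTIFACTS = ["spec.md", "plan.md", "quickstart.md", "constitution.md", "roadmap.md"]


def discover_artifacts(feature_dir, known=KNOWN_ARTIFACTS):
    """Report which review artifacts exist without treating gaps as failures."""
    root = Path(feature_dir)
    present = [name for name in known if (root / name).is_file()]
    missing = [name for name in known if name not in present]
    return {
        "present": present,
        # Missing artifacts are recorded as missing context, not as errors.
        "missing_context": missing,
        # Review can proceed as long as at least one artifact exists.
        "reviewable": bool(present),
    }
```

The result is meant to feed the next step: showing the user what exists and asking which artifacts and reviewers are in scope.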
That is exactly the kind of workflow a ledger can preserve. Each finding can be anchored to an artifact location, classified by severity, tied to evidence, assigned a suggested resolution, and turned into a question with a recommended answer when human judgment is required.
The most interesting pattern was borrowed from Matt Pocock’s grill-me skill: ask one question at a time, provide a recommended answer, and inspect the codebase or artifacts instead of asking the user when the answer is already discoverable. For SDD, that becomes a disciplined way to resolve ambiguity without turning review into an endless meeting.
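To make the shape of a finding concrete, here is a hedged sketch of a record that carries the attributes described above, plus a helper that surfaces one pending question at a time. The field names, the three severity levels, and the next_question helper are hypothetical; neither the conversation nor Specledger defines this schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical severity scale; lower number means more urgent.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}


@dataclass
class Finding:
    reviewer: str                 # review angle, e.g. "QA" or "Security"
    artifact: str                 # file the finding is anchored to
    location: str                 # section or heading inside the artifact
    severity: str                 # one of SEVERITY_ORDER's keys
    evidence: str                 # what in the artifact supports the finding
    suggested_resolution: str
    question: Optional[str] = None            # set only when human judgment is required
    recommended_answer: Optional[str] = None  # the reviewer's proposed answer

    @property
    def needs_human_decision(self) -> bool:
        return self.question is not None


def next_question(findings):
    """Return the most severe finding that still needs a human answer, or None."""
    pending = [f for f in findings if f.needs_human_decision]
    if not pending:
        return None
    return min(pending, key=lambda f: SEVERITY_ORDER[f.severity])
```

Asking only for the next question, rather than dumping every open question at once, is what keeps the review from turning into the endless meeting.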
From linear commands to collaborative workflows
The improved prompts in skillrig/cli push the idea further.
The experimental specledger.implement-workflow command intentionally skips the durable issue ledger for a faster path, then launches a deterministic multi-agent implementation workflow. The pipeline is not random fan-out. It is dependency ordered:
- scaffold the public API first
- implement primitives in parallel where files are disjoint
- implement operations once primitives exist
- wire the CLI
- add tests
- verify and repair until checks pass
- synchronize documentation
The prompt is opinionated about how to use agents safely. Every subagent prompt must begin with a SKILLS: line, because the design artifacts say what to build while skills carry how the repository builds things. It also insists on final verification through make check.
That is a useful evolution. A spec alone can tell an agent the goal. A workflow tells the agent system how to divide labor without losing the goal.
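The dependency ordering can be sketched with a topological sort. This is an illustration of the idea, not the prompt's actual mechanism: the stage names, the two example primitives, and the strictly linear tail are assumptions I derived from the list above. It uses graphlib from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the stages it depends on.
STAGES = {
    "scaffold_api": set(),
    "primitive_a": {"scaffold_api"},   # primitives touch disjoint files,
    "primitive_b": {"scaffold_api"},   # so they can run in parallel
    "operations": {"primitive_a", "primitive_b"},
    "wire_cli": {"operations"},
    "tests": {"wire_cli"},
    "verify_and_repair": {"tests"},
    "sync_docs": {"verify_and_repair"},
}


def execution_batches(graph):
    """Group stages into batches; stages in the same batch have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

With this graph, only the two primitives share a batch; every other stage waits for its predecessor. That is the difference between dependency-ordered fan-out and random fan-out.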
The paired specledger.verify-workflow command is even more revealing. It verifies artifacts without tasks.md by sending multiple independent reviewers through the same spec, plan, research, data model, contracts, and quickstart. The prompt explicitly says independent reviewers catch different problems, then merges the findings into one report.
That is a mature SDD pattern:
Do not trust one confident pass. Use independent review to detect drift, stale wording, missing coverage, and contradictions before implementation starts.
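One way to picture the merge step, as a rough sketch only: each reviewer returns a list of issues, and the merged report groups identical issues and records who raised them. The input shape, the exact-match keying on (artifact, issue), and the ordering rule are simplifying assumptions of mine; the verify-workflow prompt does not specify them, and real findings would need fuzzier matching.

```python
from collections import defaultdict


def merge_findings(reports):
    """Merge per-reviewer findings into one report.

    reports maps a reviewer name to a list of (artifact, issue) tuples.
    Issues raised by more reviewers are listed first.
    """
    raised_by = defaultdict(set)
    for reviewer, issues in reports.items():
        for artifact, issue in issues:
            raised_by[(artifact, issue)].add(reviewer)

    rows = [
        {"artifact": artifact, "issue": issue, "reported_by": sorted(reviewers)}
        for (artifact, issue), reviewers in raised_by.items()
    ]
    # Corroborated issues first, then a stable alphabetical order.
    rows.sort(key=lambda r: (-len(r["reported_by"]), r["artifact"], r["issue"]))
    return rows
```

Issues that only one reviewer raised stay in the report. They are exactly the problems a single confident pass would have missed.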
The checkpoint-workflow prompt then closes the loop after implementation. It takes an adversarial reviewer stance: assume the implementation has gaps until proven otherwise. It compares actual code and test results against the planned artifacts and classifies divergences.
This is the loop becoming inspectable:
- specify intent
- clarify decisions
- plan implementation
- run multi-angle spec review
- resolve high-impact ambiguities one question at a time
- verify artifacts
- execute workflow
- checkpoint divergence
- update the spec or fix the implementation
A platform for shared SDD workflows
This is where Specledger’s platform angle becomes important.
A local .agents/commands directory can encode a good workflow for one repository. But real SDD is social. Requirements come from users, product, design, engineering, security, QA, operations, and previous implementation history. If the workflow only lives in one agent’s context window, it is fragile.
A shared platform can give teams several things that plain prompt files cannot fully provide:
- a common place to review requirements and decisions
- durable checkpoints that survive chat sessions
- traceability from spec changes to implementation changes
- visibility into which decisions were human-made and which were agent-proposed
- session indexing so prior work becomes searchable context
- multi-repo coordination for features that cross service boundaries
- shared workflow conventions across teams and tools
That is why the word “ledger” is useful. A ledger is not just storage. It records changes in a way that can be inspected later.
For AI-assisted development, that is the difference between “the agent did something” and “the team can explain why the system changed.”
The human role moves to decision quality
This also changes the human role.
In a naive agent workflow, the human asks for code, waits, and reviews the result. That is a weak loop because the most important decisions may already be buried inside generated implementation.
In an evolved SDD workflow, humans steer earlier:
- approve or correct requirements
- resolve clarifying questions
- choose which reviewers and artifacts are in scope
- judge findings from Product, QA, Security, Architecture, Delivery, Constitution, Roadmap, SRE, Cost, UX, and Data angles
- review tradeoffs in the plan
- decide when ambiguity is acceptable
- choose which risks need spikes
- inspect verification findings before implementation
- checkpoint divergence after implementation
The agent still executes, but execution is surrounded by decision points.
This matches Specledger’s stated principle: humans steer, AI executes. The value is not that humans micromanage every line of code. The value is that humans keep authority over intent, tradeoffs, and acceptance.
SDD as organizational memory
The next step is memory.
A single spec helps one feature. A ledger of specs, decisions, checkpoints, sessions, and deltas helps the organization learn.
That matters because many engineering failures are not novel. Teams rediscover the same constraints, repeat the same architectural arguments, forget why a tradeoff was chosen, or lose context when a chat session ends.
Specledger’s session indexing and context-compounding language points at this deeper value. If every feature leaves behind a structured trail, future agents and future humans can start from a better place:
- previous decisions are easier to find
- old assumptions can be challenged explicitly
- recurring review failures can become checklist items
- stable implementation patterns can become skills
- cross-repo dependencies can be made visible instead of tribal
The spec becomes more than a pre-code artifact. It becomes part of the team’s long-term memory.
The tension: speed versus durability
The skillrig/cli workflow prompts also expose a healthy tension.
The experimental implementation workflow says it skips the durable sl issue ledger because the quickstart is intentionally smaller. That is a real tradeoff. Sometimes a team wants the full traceable workflow. Sometimes it wants a faster, bounded, deterministic workflow that still reads the design artifacts and gates on checks.
This is probably where SDD will keep evolving.
Not every feature needs the same amount of ceremony. A tiny bug fix does not need the same ledger as a multi-repo payment integration. But every workflow still needs a way to preserve the right amount of intent, verification, and review.
The mature version of SDD is not maximum documentation. It is calibrated traceability.
Use more ledger when the risk, ambiguity, or coordination cost is high. Use lighter workflows when the target is already clear. But do not remove the feedback loop.
Why this matters for AI engineering
AI makes implementation faster, but it does not make intent obvious.
That creates a new bottleneck:
The scarce resource is not generated code. The scarce resource is shared, inspectable, correct intent.
Spec-driven development began as a way to make intent explicit. Platforms like Specledger suggest the next stage: make intent collaborative, traceable, reviewable, executable, and memorable.
The practical shape is emerging:
- specs define behavior
- plans connect behavior to architecture
- tasks or workflows divide execution
- verification checks alignment before code
- checkpointing checks divergence after code
- dashboards and deltas keep humans in the steering loop
- session indexes and skills let context compound
That is how SDD evolves from a writing habit into an engineering system.
Conclusion
The future of spec-driven development is not simply better prompts or longer requirements documents.
It is shared workflow infrastructure.
Specledger is interesting because it treats SDD as a collaborative ledger of intent: a place where humans, AI agents, specs, plans, issues, checkpoints, reviews, and sessions can stay aligned.
That is the right direction for agentic software development. The more capable the agents become, the more important it is to know what they are supposed to optimize for, who approved the tradeoffs, how divergence is detected, and what the team learns from each loop.
The spec is no longer just where clarity lives.
It is where collaboration, control, and memory begin.
Sources
- Specledger — Product site describing Specledger as a spec-driven development platform for shared requirements, dashboards, deltas, checkpoints, session indexing, and AI-assisted execution.
- specledger/specledger — Public Specledger CLI repository inspected for the .agents/commands workflow prompts and CLI capabilities.
- skillrig/cli — Public CLI repository inspected for improved experimental Specledger workflow prompts, especially implement-workflow, verify-workflow, and checkpoint-workflow.
- Spec-Driven Development and Specifications — Earlier article establishing the core SDD argument: research discovers the target, specifications make it inspectable, development executes against it, and feedback keeps the system honest.
- ChatGPT shared conversation: Spec Driven Review Process — Shared conversation, reviewed from the downloaded Markdown export, exploring spec review as its own SDD workflow with Product Owner, QA, Security, Architecture, Delivery, Constitution, Roadmap, SRE, Cost, UX, and Data review angles.