Summary article

Evolving Spec-Driven Development

June 3, 2026

Spec-driven development is moving from static documents toward shared ledgers of intent: collaborative, traceable workflows where humans steer and AI agents execute against versioned specifications.

Short summary

The first version of spec-driven development is easy to describe: write down what good looks like before asking people or agents to build it.

That remains true, but it is no longer enough.

As AI-assisted engineering gets more capable, the specification stops being a passive document. It becomes shared infrastructure: a durable place where teams record intent, surface decisions, coordinate human review, route work to agents, detect drift, and learn from implementation.

That is the direction Specledger points toward. It frames SDD as a collaborative platform rather than a folder of documents: requirements, design, implementation, checkpoints, session history, dashboards, deltas, and human decision points all tied back to a shared source of truth.

The evolution is from spec as document to spec as ledger.

The old SDD baseline

In the earlier article, I described spec-driven development as the bridge between research and development.

Research discovers what good looks like. Specifications make that definition inspectable. Development executes against it. Feedback keeps the system honest.

That model is still the foundation. A useful specification turns vague intent into something people can review.

This already makes software development better because it moves feedback earlier. It is cheaper to fix a sentence than a production incident.

But AI agents change the pressure on the process.

If a human misreads vague intent, the team may lose a day. If a powerful agent misreads vague intent, it can produce a large, plausible, internally consistent wrong implementation very quickly. Cheap code generation makes the quality of the target more important, not less.

So SDD needs to evolve beyond “write a spec, then implement.” It needs an operating model for collaboration.

A shared ChatGPT conversation titled “Spec Driven Review Process,” which I reviewed from its downloaded Markdown export, makes this more concrete. The starting point was a Product Owner describing how much real review work happens before code review: reading a spec for implicit intent, inconsistency, terminology drift, muddled concepts, scope that can be reduced, phase boundaries, roadmap consequences, and decisions that will constrain future work.

That is a useful correction to how people often talk about AI coding. The review surface is not only the pull request. In SDD, the specification itself needs review because it is the thing the implementation will optimize against.

The best short version from that conversation was:

Does this document create a coherent, executable path from intent to delivery while minimizing ambiguity, risk, waste, and future rework?

That question belongs before implementation. It turns spec review into an explicit quality function rather than an informal human habit.

Specledger’s useful reframing

Specledger’s product language is direct: it wants to be the single source of truth for spec-driven development. The important part is not just that it manages specs. The important part is that it treats SDD as a coordination problem.

The platform emphasizes shared requirements, dashboards, deltas, checkpoints, session indexing, and AI-assisted execution.

That is a stronger framing than “documentation for AI.” Documentation is something you read. A ledger is something teams use to coordinate, audit, and reconcile reality.

In a serious AI-assisted workflow, the core question is not “can the agent write code?” The question is “can the team keep intent, implementation, review, and future memory aligned while the agent writes code?”

Specledger is interesting because it attacks that alignment problem directly.

The command prompts show the workflow shape

The most concrete evidence is in the repositories’ .agents/commands prompts.

The standard Specledger command set looks like a complete SDD lifecycle:

  1. /specledger.constitution establishes project principles.
  2. /specledger.specify turns a natural language feature description into a structured spec.
  3. /specledger.clarify asks targeted questions and writes the answers back into the spec.
  4. /specledger.plan turns the spec into architecture, stack choices, phases, and design artifacts.
  5. /specledger.tasks creates dependency-ordered implementation work, backed by the sl issue tracker.
  6. /specledger.verify checks consistency across spec.md, plan.md, and tasks.md before implementation.
  7. /specledger.implement executes the task plan.
  8. /specledger.checkpoint performs a critical divergence review between implementation and plan.
  9. /specledger.spike gives uncertainty a first-class research workflow.
  10. /specledger.checklist creates focused review checklists.
  11. /specledger.onboard walks a user through the whole process.

That sequence matters because it is not just a prompt library. It is an attempt to make the implicit engineering loop explicit.

The commands repeatedly encode the same pattern:

This is SDD becoming operational.

Spec review becomes a first-class workflow

The ChatGPT conversation also sketched a review pipeline that fits naturally into Specledger’s world.

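The important move is to stop treating “review” as one generic pass. A useful spec review has angles, and each angle looks for different failure modes. The angles explored in the conversation were Product Owner, QA, Security, Architecture, Delivery, Constitution, Roadmap, SRE, Cost, UX, and Data.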

That list is valuable because it explains why a single reviewer often misses things. A QA reviewer is looking for objective testability. A roadmap reviewer is looking for future constraint. A constitution reviewer is looking for principle violations. A Product Owner is looking for intent clarity and scope control.

The review pipeline from the conversation also had a practical entrypoint: inspect which artifacts actually exist, then ask the user which artifacts and reviewers are in scope. Maybe the filesystem only has spec.md. Maybe it also has plan.md, quickstart.md, a constitution, and a roadmap. Review should still be possible with the available artifacts, but missing artifacts should be treated as missing context rather than automatic failure.
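The entrypoint described above can be sketched in a few lines of Python. This is a minimal illustration, not Specledger's implementation: the function names and the idea of one directory per feature are assumptions, and only file names mentioned in this article are checked.

```python
from pathlib import Path

# File names taken from the article; the one-directory-per-feature layout is an assumption.
KNOWN_ARTIFACTS = ("spec.md", "plan.md", "tasks.md", "quickstart.md")


def discover_artifacts(feature_dir: Path) -> dict[str, bool]:
    """Map each known artifact name to whether it exists in feature_dir."""
    return {name: (feature_dir / name).is_file() for name in KNOWN_ARTIFACTS}


def review_scope(feature_dir: Path) -> tuple[list[str], list[str]]:
    """Split artifacts into reviewable ones and missing context.

    Missing artifacts are reported, not treated as a failed review.
    """
    found = discover_artifacts(feature_dir)
    available = [name for name, present in found.items() if present]
    missing_context = [name for name, present in found.items() if not present]
    return available, missing_context
```

A caller would show both lists to the user and ask which artifacts and reviewers are in scope before any review starts.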

That is exactly the kind of workflow a ledger can preserve. Each finding can be anchored to an artifact location, classified by severity, tied to evidence, assigned a suggested resolution, and turned into a question with a recommended answer when human judgment is required.
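One way to picture such a finding is as a small record. The sketch below is hypothetical: the field names, the three severity levels, and the triage helper are my assumptions for illustration, not a schema taken from Specledger or from the conversation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    # Three levels are an assumption; a real workflow may use more.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Finding:
    artifact: str              # e.g. "spec.md"
    location: str              # e.g. a heading or line reference
    severity: Severity
    evidence: str              # quoted text or observation supporting the finding
    suggested_resolution: str
    question: Optional[str] = None            # set only when human judgment is required
    recommended_answer: Optional[str] = None

    def needs_human_decision(self) -> bool:
        return self.question is not None


def triage(findings: list[Finding]) -> list[Finding]:
    """Order findings so the most severe are reviewed first."""
    return sorted(findings, key=lambda f: f.severity.value, reverse=True)
```

Keeping the question and its recommended answer on the finding itself means a human decision can be recorded next to the evidence that prompted it.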

The most interesting pattern was borrowed from Matt Pocock’s grill-me skill: ask one question at a time, provide a recommended answer, and inspect the codebase or artifacts instead of asking the user when the answer is already discoverable. For SDD, that becomes a disciplined way to resolve ambiguity without turning review into an endless meeting.
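The pattern can be sketched as a loop that handles one question at a time, checks the artifacts first, and only then turns to a human. This is an illustrative sketch under my own assumptions, not the grill-me skill itself: the `Question`, `Resolution`, and `resolve` names are hypothetical, and treating an empty reply as acceptance of the recommendation is a design choice I made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Question:
    text: str
    recommended_answer: str
    # Optional lookup that tries to answer from the codebase or artifacts.
    discover: Optional[Callable[[], Optional[str]]] = None


@dataclass
class Resolution:
    question: str
    answer: str
    source: str  # "artifact", "human", or "recommended"


def resolve(questions: list[Question], ask: Callable[[Question], str]) -> list[Resolution]:
    """Resolve questions strictly one at a time, in order."""
    resolutions = []
    for q in questions:
        found = q.discover() if q.discover else None
        if found is not None:
            # Already discoverable: do not spend human attention on it.
            resolutions.append(Resolution(q.text, found, "artifact"))
            continue
        reply = ask(q).strip()
        if reply:
            resolutions.append(Resolution(q.text, reply, "human"))
        else:
            # An empty reply accepts the recommended answer.
            resolutions.append(Resolution(q.text, q.recommended_answer, "recommended"))
    return resolutions
```

In an interactive session, `ask` could print the question with its recommended answer and read one line of input.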

From linear commands to collaborative workflows

The improved prompts in skillrig/cli push the idea further.

The experimental specledger.implement-workflow command intentionally skips the durable issue ledger for a faster path, then launches a deterministic multi-agent implementation workflow. The pipeline is not random fan-out. It is dependency ordered.

The prompt is opinionated about how to use agents safely. Every subagent prompt must begin with a SKILLS: line, because the design artifacts say what to build while skills carry how the repository builds things. It also insists on final verification through make check.
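The SKILLS: rule is easy to enforce mechanically. The sketch below assumes a comma-separated skill list after the prefix, which the prompt may or may not require, and the helper names and skill names are hypothetical.

```python
SKILLS_PREFIX = "SKILLS:"


def build_subagent_prompt(skills: list[str], body: str) -> str:
    """Prefix the task body with a SKILLS: line (comma-separated format is an assumption)."""
    if not skills:
        raise ValueError("a subagent prompt needs at least one skill")
    return f"{SKILLS_PREFIX} {', '.join(skills)}\n\n{body}"


def has_skills_header(prompt: str) -> bool:
    """Return True only if the very first line starts with SKILLS:."""
    lines = prompt.splitlines()
    return bool(lines) and lines[0].startswith(SKILLS_PREFIX)
```

A workflow runner could refuse to dispatch any subagent prompt for which `has_skills_header` returns False, so the rule does not depend on the orchestrating agent remembering it.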

That is a useful evolution. A spec alone can tell an agent the goal. A workflow tells the agent system how to divide labor without losing the goal.

The paired specledger.verify-workflow command is even more revealing. It verifies artifacts without tasks.md by sending multiple independent reviewers through the same spec, plan, research, data model, contracts, and quickstart. The prompt explicitly says independent reviewers catch different problems, then merges the findings into one report.

That is a mature SDD pattern:

Do not trust one confident pass. Use independent review to detect drift, stale wording, missing coverage, and contradictions before implementation starts.
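The merge step can be sketched as follows. This is a simplified illustration: the report shape and the `merge_reports` name are assumptions, and matching issues by normalized text is a crude stand-in for whatever deduplication the real prompt performs.

```python
from collections import defaultdict


def merge_reports(reports: dict[str, list[tuple[str, str]]]) -> list[dict]:
    """Merge per-reviewer findings into one report.

    reports maps a reviewer name to a list of (artifact, issue) pairs.
    Issues that match after trimming and lowercasing are merged, and
    findings raised by more reviewers are listed first.
    """
    merged: dict[tuple[str, str], set[str]] = defaultdict(set)
    for reviewer, findings in reports.items():
        for artifact, issue in findings:
            merged[(artifact, issue.strip().lower())].add(reviewer)
    return sorted(
        (
            {"artifact": artifact, "issue": issue, "reviewers": sorted(reviewers)}
            for (artifact, issue), reviewers in merged.items()
        ),
        key=lambda f: (-len(f["reviewers"]), f["artifact"], f["issue"]),
    )
```

Agreement between independent reviewers is a useful signal, but a finding raised by only one reviewer still stays in the report. That is the point of using several angles.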

The checkpoint-workflow prompt then closes the loop after implementation. It takes an adversarial reviewer stance: assume the implementation has gaps until proven otherwise. It compares actual code and test results against the planned artifacts and classifies divergences.
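The adversarial stance can be expressed as a classification in which nothing counts as done until both the code and the tests say so. The category names and the set-based inputs below are my assumptions for illustration; the actual prompt may classify divergences differently.

```python
def classify_divergences(
    planned: set[str], implemented: set[str], passing: set[str]
) -> dict[str, str]:
    """Classify each item by comparing the plan against code and test results.

    An item is "aligned" only when it was planned, implemented, and has
    passing tests. Everything else is reported as a divergence.
    """
    result = {}
    for item in sorted(planned | implemented):
        if item not in planned:
            result[item] = "unplanned"   # code exists that the plan never asked for
        elif item not in implemented:
            result[item] = "missing"     # planned but not built
        elif item not in passing:
            result[item] = "unverified"  # built, but no passing test proves it
        else:
            result[item] = "aligned"
    return result
```

Each non-aligned entry then leads to one of two outcomes: fix the implementation, or update the spec so it reflects what was actually decided.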

This is the loop becoming inspectable:

  1. specify intent
  2. clarify decisions
  3. plan implementation
  4. run multi-angle spec review
  5. resolve high-impact ambiguities one question at a time
  6. verify artifacts
  7. execute workflow
  8. checkpoint divergence
  9. update the spec or fix the implementation
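The loop above can be modeled as an ordered list of gated stages. The stage names come from the list; the `run_loop` helper and the rule that a stage without an explicit check blocks the loop are assumptions I chose so that nothing passes silently.

```python
from typing import Callable, Optional

STAGES = [
    "specify intent",
    "clarify decisions",
    "plan implementation",
    "run multi-angle spec review",
    "resolve high-impact ambiguities one question at a time",
    "verify artifacts",
    "execute workflow",
    "checkpoint divergence",
    "update the spec or fix the implementation",
]


def run_loop(checks: dict[str, Callable[[], bool]]) -> tuple[list[str], Optional[str]]:
    """Run stages in order and stop at the first one that does not pass.

    Returns the completed stages and the stage that blocked, or None
    if every stage passed.
    """
    completed = []
    for stage in STAGES:
        check = checks.get(stage)
        if check is None or not check():
            return completed, stage
        completed.append(stage)
    return completed, None
```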

A platform for shared SDD workflows

This is where Specledger’s platform angle becomes important.

A local .agents/commands directory can encode a good workflow for one repository. But real SDD is social. Requirements come from users, product, design, engineering, security, QA, operations, and previous implementation history. If the workflow only lives in one agent’s context window, it is fragile.

A shared platform can give teams several things that plain prompt files cannot fully provide:

That is why the word “ledger” is useful. A ledger is not just storage. It records changes in a way that can be inspected later.

For AI-assisted development, that is the difference between “the agent did something” and “the team can explain why the system changed.”

The human role moves to decision quality

This also changes the human role.

In a naive agent workflow, the human asks for code, waits, and reviews the result. That is a weak loop because the most important decisions may already be buried inside generated implementation.

In an evolved SDD workflow, humans steer earlier.

The agent still executes, but execution is surrounded by decision points.

This matches Specledger’s stated principle: humans steer, AI executes. The value is not that humans micromanage every line of code. The value is that humans keep authority over intent, tradeoffs, and acceptance.

SDD as organizational memory

The next step is memory.

A single spec helps one feature. A ledger of specs, decisions, checkpoints, sessions, and deltas helps the organization learn.

That matters because many engineering failures are not novel. Teams rediscover the same constraints, repeat the same architectural arguments, forget why a tradeoff was chosen, or lose context when a chat session ends.

Specledger’s session indexing and context-compounding language points at this deeper value. If every feature leaves behind a structured trail, future agents and future humans can start from a better place.

The spec becomes more than a pre-code artifact. It becomes part of the team’s long-term memory.

The tension: speed versus durability

The skillrig/cli workflow prompts also expose a healthy tension.

The experimental implementation workflow says it skips the durable sl issue ledger because the quickstart is intentionally smaller. That is a real tradeoff. Sometimes a team wants the full traceable workflow. Sometimes it wants a faster, bounded, deterministic workflow that still reads the design artifacts and gates on checks.

This is probably where SDD will keep evolving.

Not every feature needs the same amount of ceremony. A tiny bug fix does not need the same ledger as a multi-repo payment integration. But every workflow still needs a way to preserve the right amount of intent, verification, and review.

The mature version of SDD is not maximum documentation. It is calibrated traceability.

Use more ledger when the risk, ambiguity, or coordination cost is high. Use lighter workflows when the target is already clear. But do not remove the feedback loop.
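That rule of thumb can be written down as a small decision function. Everything specific here is hypothetical: the 0 to 3 scales, the threshold, and the field names are assumptions chosen for illustration. The one fixed point is taken from the text: verification stays on in both modes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkProfile:
    # Each score runs from 0 (negligible) to 3 (severe); the scale is an assumption.
    risk: int
    ambiguity: int
    coordination: int


def choose_workflow(profile: WorkProfile, threshold: int = 2) -> dict:
    """Pick a heavier or lighter workflow without ever dropping the feedback loop."""
    heavy = max(profile.risk, profile.ambiguity, profile.coordination) >= threshold
    return {
        "ledger": "full" if heavy else "light",
        "verification": True,  # gating on checks is kept in both modes
    }
```

Using the maximum of the three scores rather than their sum is deliberate: a change that is simple to coordinate but very risky should still get the full ledger.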

Why this matters for AI engineering

AI makes implementation faster, but it does not make intent obvious.

That creates a new bottleneck:

The scarce resource is not generated code. The scarce resource is shared, inspectable, correct intent.

Spec-driven development began as a way to make intent explicit. Platforms like Specledger suggest the next stage: make intent collaborative, traceable, reviewable, executable, and memorable.

The practical shape is emerging:

That is how SDD evolves from a writing habit into an engineering system.

Conclusion

The future of spec-driven development is not simply better prompts or longer requirements documents.

It is shared workflow infrastructure.

Specledger is interesting because it treats SDD as a collaborative ledger of intent: a place where humans, AI agents, specs, plans, issues, checkpoints, reviews, and sessions can stay aligned.

That is the right direction for agentic software development. The more capable the agents become, the more important it is to know what they are supposed to optimize for, who approved the tradeoffs, how divergence is detected, and what the team learns from each loop.

The spec is no longer just where clarity lives.

It is where collaboration, control, and memory begin.

Sources

  • Specledger — Product site describing Specledger as a spec-driven development platform for shared requirements, dashboards, deltas, checkpoints, session indexing, and AI-assisted execution.
  • specledger/specledger — Public Specledger CLI repository inspected for the .agents/commands workflow prompts and CLI capabilities.
  • skillrig/cli — Public CLI repository inspected for improved experimental Specledger workflow prompts, especially implement-workflow, verify-workflow, and checkpoint-workflow.
  • Spec-Driven Development and Specifications — Earlier article establishing the core SDD argument: research discovers the target, specifications make it inspectable, development executes against it, and feedback keeps the system honest.
  • ChatGPT shared conversation: Spec Driven Review Process — Shared conversation, reviewed from the downloaded Markdown export, exploring spec review as its own SDD workflow with Product Owner, QA, Security, Architecture, Delivery, Constitution, Roadmap, SRE, Cost, UX, and Data review angles.