Summary article

Evolving Spec-Driven Development

June 3, 2026

Spec-driven development is moving from static documents toward shared ledgers of intent: collaborative, traceable workflows where humans steer and AI agents execute against versioned specifications.

Short summary

The first version of spec-driven development is easy to describe: write down what good looks like before asking people or agents to build it.

That remains true, but it is no longer enough.

As AI-assisted engineering gets more capable, the specification stops being a passive document. It becomes shared infrastructure: a durable place where teams record intent, surface decisions, coordinate human review, route work to agents, detect drift, and learn from implementation.

That is the direction Specledger points toward. It frames SDD as a collaborative platform rather than a folder of documents: requirements, design, implementation, checkpoints, session history, dashboards, deltas, and human decision points all tied back to a shared source of truth.

The evolution is from spec as document to spec as ledger.

The old SDD baseline

In the earlier article, I described spec-driven development as the bridge between research and development.

Research discovers what good looks like. Specifications make that definition inspectable. Development executes against it. Feedback keeps the system honest.

That model is still the foundation. A useful specification turns vague intent into something people can review:

What problem are we solving?
What behavior matters?
What is explicitly out of scope?
What constraints are non-negotiable?
What examples prove the work is correct?
What risks or assumptions still need validation?

This already makes software development better because it moves feedback earlier. It is cheaper to fix a sentence than a production incident.

But AI agents change the pressure on the process.

If a human misreads vague intent, the team may lose a day. If a powerful agent misreads vague intent, it can produce a large, plausible, internally consistent wrong implementation very quickly. Cheap code generation makes the quality of the target more important, not less.

So SDD needs to evolve beyond “write a spec, then implement.” It needs an operating model for collaboration.

The shared ChatGPT conversation “Spec Driven Review Process” (reviewed from its downloaded Markdown export) makes this more concrete. It started with a Product Owner describing how much real review work happens before code review: reading a spec for implicit intent, inconsistency, terminology drift, muddled concepts, scope that can be reduced, phase boundaries, roadmap consequences, and decisions that will constrain future work.

That is a useful correction to how people often talk about AI coding. The review surface is not only the pull request. In SDD, the specification itself needs review because it is the thing the implementation will optimize against.

The best short version from that conversation was:

Does this document create a coherent, executable path from intent to delivery while minimizing ambiguity, risk, waste, and future rework?

That question belongs before implementation. It turns spec review into an explicit quality function rather than an informal human habit.

Specledger’s useful reframing

Specledger’s product language is direct: it wants to be the single source of truth for spec-driven development. What matters is not only that it manages specs but that it treats SDD as a coordination problem.

The platform emphasizes:

human dashboards for tracking, reviewing, and steering AI collaboration
spec deltas and checkpoints so changes leave a trail
session indexing so AI work becomes reusable organizational context
multi-repo support for features that do not fit neatly inside one repository
CLI bootstrap and agent compatibility so the workflow can live inside normal engineering tools

That is a stronger framing than “documentation for AI.” Documentation is something you read. A ledger is something teams use to coordinate, audit, and reconcile reality.

In a serious AI-assisted workflow, the core question is not “can the agent write code?” The question is “can the team keep intent, implementation, review, and future memory aligned while the agent writes code?”

Specledger is interesting because it attacks that alignment problem directly.

The command prompts show the workflow shape

The most concrete evidence is in the .agents/commands prompts of the two repositories, specledger/specledger and skillrig/cli.

The standard Specledger command set looks like a complete SDD lifecycle:

/specledger.constitution establishes project principles.
/specledger.specify turns a natural language feature description into a structured spec.
/specledger.clarify asks targeted questions and writes the answers back into the spec.
/specledger.plan turns the spec into architecture, stack choices, phases, and design artifacts.
/specledger.tasks creates dependency-ordered implementation work, backed by the sl issue tracker.
/specledger.verify checks consistency across spec.md, plan.md, and tasks.md before implementation.
/specledger.implement executes the task plan.
/specledger.checkpoint performs a critical divergence review between implementation and plan.
/specledger.spike gives uncertainty a first-class research workflow.
/specledger.checklist creates focused review checklists.
/specledger.onboard walks a user through the whole process.

That sequence matters because it is not just a prompt library. It is an attempt to make the implicit engineering loop explicit.

The commands repeatedly encode the same pattern:

discover the current feature context with sl spec info
read the generated artifact before editing it
treat the constitution as authoritative
preserve handoffs between phases
make missing context visible instead of hallucinating it
map requirements to design, tasks, and tests
verify consistency before implementation
checkpoint divergence after implementation
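The “map requirements to design, tasks, and tests” step is the easiest of these to make mechanical. A minimal sketch, assuming requirements carry IDs such as FR-001 (a hypothetical convention, not one the prompts are quoted as using): it reports which spec requirements each downstream artifact never mentions, so a gap shows up as visible missing context rather than a silent omission.

```python
import re
from dataclasses import dataclass, field

# Hypothetical requirement ID convention (e.g. "FR-001"); adjust to the project's own.
REQ_ID = re.compile(r"\bFR-\d{3}\b")


def find_ids(text: str) -> set[str]:
    """Collect every requirement ID mentioned in an artifact."""
    return set(REQ_ID.findall(text))


@dataclass
class CoverageReport:
    # artifact name -> requirement IDs from the spec that the artifact never mentions
    missing: dict[str, list[str]] = field(default_factory=dict)

    @property
    def ok(self) -> bool:
        return not any(self.missing.values())


def check_coverage(spec: str, downstream: dict[str, str]) -> CoverageReport:
    """Make traceability gaps visible instead of letting them pass silently."""
    required = find_ids(spec)
    report = CoverageReport()
    for name, text in downstream.items():
        report.missing[name] = sorted(required - find_ids(text))
    return report
```

Mention-matching is deliberately shallow. It proves a requirement was referenced, not that it was handled well, which is why the later review and checkpoint steps still matter.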

This is SDD becoming operational.

Spec review becomes a first-class workflow

The ChatGPT conversation also sketched a review pipeline that fits naturally into Specledger’s world.

The important move is to stop treating “review” as one generic pass. A useful spec review has angles, and each angle looks for different failure modes:

Product Owner: implicit intent, terminology drift, muddled concepts, downscope opportunities, phase boundaries, stakeholder alignment, and roadmap impact
QA: testability, acceptance criteria, edge cases, negative paths, undefined success and failure states, and regression risk
Security: trust boundaries, authentication, authorization, secrets, data exposure, auditability, supply chain, abuse cases, and multi-tenancy
Architecture: system boundaries, module boundaries, API contracts, data flow, coupling, extensibility, reversibility, migration paths, and technical debt
Delivery: sequencing, hidden dependencies, team boundaries, critical path, milestones, and risk concentration
Constitution: whether the proposed work passes, weakly aligns with, or violates the project’s operating principles
Roadmap: decisions that constrain future roadmap items, scope that should move in or out of the workstream, and sequencing across future work
Operations, Cost, UX, and Data: production ownership, lifecycle cost, human workflow clarity, and data lifecycle concerns

That list is valuable because it explains why a single reviewer often misses things. A QA reviewer is looking for objective testability. A roadmap reviewer is looking for future constraint. A constitution reviewer is looking for principle violations. A Product Owner is looking for intent clarity and scope control.

The review pipeline from the conversation also had a practical entrypoint: inspect which artifacts actually exist, then ask the user which artifacts and reviewers are in scope. Maybe the filesystem only has spec.md. Maybe it also has plan.md, quickstart.md, a constitution, and a roadmap. Review should still be possible with the available artifacts, but missing artifacts should be treated as missing context rather than automatic failure.

That is exactly the kind of workflow a ledger can preserve. Each finding can be anchored to an artifact location, classified by severity, tied to evidence, assigned a suggested resolution, and turned into a question with a recommended answer when human judgment is required.
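The intake step and the finding shape described above can be sketched as data. This is an illustration, not Specledger’s schema. The file names come from the article except constitution.md and roadmap.md, which are hypothetical locations. The one rule the code enforces is the conversation’s: an absent artifact becomes a low-severity note about missing context, not a review failure.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional

# spec.md, plan.md, quickstart.md are named in the article; the last two paths are assumed.
KNOWN_ARTIFACTS = ["spec.md", "plan.md", "quickstart.md", "constitution.md", "roadmap.md"]


class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Finding:
    reviewer: str                 # e.g. "QA", "Security", "Constitution"
    artifact: str                 # file the finding is anchored to
    location: str                 # section or line reference inside that file
    severity: Severity
    evidence: str                 # text that motivated the finding
    suggested_resolution: str
    question: Optional[str] = None            # only when human judgment is required
    recommended_answer: Optional[str] = None


def discover_artifacts(feature_dir: Path) -> tuple[list[str], list[str]]:
    """Split the known artifacts into those present on disk and those missing."""
    present = [name for name in KNOWN_ARTIFACTS if (feature_dir / name).is_file()]
    missing = [name for name in KNOWN_ARTIFACTS if name not in present]
    return present, missing


def missing_context_findings(missing: list[str]) -> list[Finding]:
    """Record each absent artifact as missing context rather than failing the review."""
    return [
        Finding(
            reviewer="Intake",
            artifact=name,
            location="(absent)",
            severity=Severity.LOW,
            evidence=f"{name} not found",
            suggested_resolution=f"Review without {name}, or ask whether it exists elsewhere",
        )
        for name in missing
    ]
```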

The most interesting pattern was borrowed from Matt Pocock’s grill-me skill: ask one question at a time, provide a recommended answer, and inspect the codebase or artifacts instead of asking the user when the answer is already discoverable. For SDD, that becomes a disciplined way to resolve ambiguity without turning review into an endless meeting.
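The control flow of that pattern is small enough to sketch. This is not Pocock’s skill itself, only the three rules described above: check known facts first, ask at most one question per turn, and always attach a recommendation. The lookup_key field and the ask callback are hypothetical stand-ins for reading artifacts and prompting a human.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OpenQuestion:
    text: str
    recommended_answer: str
    # Hypothetical key into facts already extracted from the codebase or artifacts.
    lookup_key: Optional[str] = None


def grill(questions: list[OpenQuestion], known_facts: dict[str, str],
          ask: Callable[[OpenQuestion], str]) -> dict[str, str]:
    """Resolve ambiguities one question at a time, consulting artifacts before the human."""
    answers: dict[str, str] = {}
    for q in questions:
        if q.lookup_key is not None and q.lookup_key in known_facts:
            # Discoverable answer: read it instead of interrupting the human.
            answers[q.text] = known_facts[q.lookup_key]
            continue
        # Exactly one question per turn, always shown with its recommendation.
        reply = ask(q).strip()
        # An empty reply accepts the recommended answer.
        answers[q.text] = reply or q.recommended_answer
    return answers
```

Making “accept the recommendation” the zero-effort reply is the design choice that keeps this from turning into a meeting: the human only types when they disagree.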

From linear commands to collaborative workflows

The improved prompts in skillrig/cli push the idea further.

The experimental specledger.implement-workflow command intentionally skips the durable issue ledger for a faster path, then launches a deterministic multi-agent implementation workflow. The pipeline is not random fan-out. It is dependency ordered:

scaffold the public API first
implement primitives in parallel where files are disjoint
implement operations once primitives exist
wire the CLI
add tests
verify and repair until checks pass
synchronize documentation
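The ordering above is a dependency graph, and “in parallel where files are disjoint” adds a second constraint on top of it. A minimal sketch, assuming each task declares its prerequisites and the files it will touch (all task and file names here are hypothetical). It uses the standard library’s graphlib.TopologicalSorter to release tasks in dependency order, then splits any ready set whose tasks would write the same file into consecutive waves.

```python
from graphlib import TopologicalSorter


def plan_waves(deps: dict[str, set[str]], files: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into sequential waves; tasks within one wave may run concurrently."""
    sorter = TopologicalSorter(deps)  # deps maps task -> prerequisite tasks
    sorter.prepare()                  # also raises CycleError if the plan has a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        pending = ready
        # Split the ready set so no two tasks in the same wave touch the same file.
        while pending:
            wave, claimed, deferred = [], set(), []
            for task in pending:
                touched = files.get(task, set())
                if touched & claimed:
                    deferred.append(task)
                else:
                    wave.append(task)
                    claimed |= touched
            waves.append(wave)
            pending = deferred
        sorter.done(*ready)
    return waves
```

With the article’s pipeline shape (scaffold, then two primitives on separate files, then operations, CLI, tests, verify, docs), the primitives land in one shared wave; point both primitives at the same file and they fall into two consecutive waves instead.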

The prompt is opinionated about how to use agents safely. Every subagent prompt must begin with a SKILLS: line, because the design artifacts say what to build while skills carry how the repository builds things. It also insists on final verification through make check.
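The “verify and repair until checks pass” step, gated on make check, amounts to a bounded retry loop. In this sketch, repair is a hypothetical callback standing in for whatever hands the failing output back to an agent. The attempt limit is an assumption, added so a stubborn failure escalates to a human instead of looping forever.

```python
import subprocess
from typing import Callable, Sequence


def verify_and_repair(repair: Callable[[str], None],
                      command: Sequence[str] = ("make", "check"),
                      max_attempts: int = 3) -> bool:
    """Run the check command; on failure pass its output to `repair` and retry."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(list(command), capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if attempt < max_attempts:
            # Hand the combined check output to the (hypothetical) repair step.
            repair(result.stdout + result.stderr)
    return False  # still failing: stop and surface it for human review
```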

That is a useful evolution. A spec alone can tell an agent the goal. A workflow tells the agent system how to divide labor without losing the goal.

The paired specledger.verify-workflow command is even more revealing. It verifies the design artifacts without needing tasks.md, sending multiple independent reviewers through the same spec, plan, research, data model, contracts, and quickstart. The prompt explicitly says that independent reviewers catch different problems, and it then merges their findings into one report.

That is a mature SDD pattern:

Do not trust one confident pass. Use independent review to detect drift, stale wording, missing coverage, and contradictions before implementation starts.
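Merging is where “one report” earns its value: duplicates collapse, and agreement across independent reviewers becomes a signal. A minimal sketch, assuming a three-level severity scale and deduplicating only on identical artifact, location, and whitespace- and case-normalized issue text. Real reviewers phrase the same problem differently, so a production merge would need fuzzier matching than this.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewerFinding:
    reviewer: str
    artifact: str
    location: str
    issue: str
    severity: int  # 1 = low, 2 = medium, 3 = high (an assumed scale)


@dataclass
class MergedFinding:
    artifact: str
    location: str
    issue: str
    severity: int
    reviewers: list[str] = field(default_factory=list)


def merge_findings(findings: list[ReviewerFinding]) -> list[MergedFinding]:
    """Collapse duplicate findings, keep the worst severity, and record who agreed."""
    merged: dict[tuple[str, str, str], MergedFinding] = {}
    for f in findings:
        key = (f.artifact, f.location, " ".join(f.issue.lower().split()))
        entry = merged.get(key)
        if entry is None:
            entry = merged[key] = MergedFinding(f.artifact, f.location, f.issue, f.severity)
        entry.severity = max(entry.severity, f.severity)
        if f.reviewer not in entry.reviewers:
            entry.reviewers.append(f.reviewer)
    # Highest severity first; among equals, findings raised by more reviewers first.
    return sorted(merged.values(),
                  key=lambda m: (-m.severity, -len(m.reviewers), m.artifact, m.location))
```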

The checkpoint-workflow prompt then closes the loop after implementation. It takes an adversarial reviewer stance: assume the implementation has gaps until proven otherwise. It compares actual code and test results against the planned artifacts and classifies divergences.
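The adversarial stance can be encoded as a default: an item is clean only when there is both implementation evidence and a passing test. The three divergence categories below are hypothetical; the article says the prompt classifies divergences but does not name its classes. How the planned, implemented, and passing-test sets get extracted from plan and code is left out of scope.

```python
from enum import Enum


class Divergence(Enum):
    MISSING = "planned, but no implementation evidence"
    UNVERIFIED = "implemented, but no passing test"
    UNPLANNED = "implemented, but not in the plan"


def classify_divergences(planned: set[str], implemented: set[str],
                         passing_tests: set[str]) -> dict[str, Divergence]:
    """Assume gaps until proven otherwise; only fully evidenced items are omitted."""
    result: dict[str, Divergence] = {}
    for item in sorted(planned):
        if item not in implemented:
            result[item] = Divergence.MISSING
        elif item not in passing_tests:
            result[item] = Divergence.UNVERIFIED
    for item in sorted(implemented - planned):
        result[item] = Divergence.UNPLANNED
    return result
```

Each entry then calls for one of the two resolutions the loop allows: update the spec to match reality, or fix the implementation.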

This is the loop becoming inspectable:

specify intent
clarify decisions
plan implementation
run multi-angle spec review
resolve high-impact ambiguities one question at a time
verify artifacts
execute workflow
checkpoint divergence
update the spec or fix the implementation

A platform for shared SDD workflows

This is where Specledger’s platform angle becomes important.

A local .agents/commands directory can encode a good workflow for one repository. But real SDD is social. Requirements come from users, product, design, engineering, security, QA, operations, and previous implementation history. If the workflow lives only in one agent’s context window, it is fragile.

A shared platform can give teams several things that plain prompt files cannot fully provide:

a common place to review requirements and decisions
durable checkpoints that survive chat sessions
traceability from spec changes to implementation changes
visibility into which decisions were human-made and which were agent-proposed
session indexing so prior work becomes searchable context
multi-repo coordination for features that cross service boundaries
shared workflow conventions across teams and tools

That is why the word “ledger” fits. A ledger is not just storage. It records changes in a way that can be inspected later.
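The difference between storage and a ledger can be made concrete. The sketch below is an illustrative design, not Specledger’s implementation. Each spec change is stored as a unified diff (the delta), tagged with whether a human or an agent made it, and hash-chained to the previous entry, so any later edit to history is detectable. The class and field names are hypothetical.

```python
import difflib
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    author: str
    origin: str     # "human" or "agent": who made the change
    reason: str
    delta: str      # unified diff of the spec change
    prev_hash: str
    hash: str


def _digest(author: str, origin: str, reason: str, delta: str, prev: str) -> str:
    payload = json.dumps([author, origin, reason, delta, prev])
    return hashlib.sha256(payload.encode()).hexdigest()


class SpecLedger:
    """Append-only, hash-chained record of spec changes (illustrative only)."""

    def __init__(self, initial_spec: str) -> None:
        self.spec = initial_spec
        self.entries: list[LedgerEntry] = []

    def record(self, new_spec: str, author: str, origin: str, reason: str) -> LedgerEntry:
        if origin not in ("human", "agent"):
            raise ValueError("origin must be 'human' or 'agent'")
        delta = "".join(difflib.unified_diff(
            self.spec.splitlines(keepends=True), new_spec.splitlines(keepends=True),
            fromfile="spec.md", tofile="spec.md"))
        prev = self.entries[-1].hash if self.entries else "genesis"
        entry = LedgerEntry(author, origin, reason, delta, prev,
                            _digest(author, origin, reason, delta, prev))
        self.entries.append(entry)
        self.spec = new_spec
        return entry

    def verify(self) -> bool:
        """Recompute the chain; rewriting any past entry breaks it."""
        prev = "genesis"
        for e in self.entries:
            if e.prev_hash != prev or e.hash != _digest(e.author, e.origin, e.reason, e.delta, prev):
                return False
            prev = e.hash
        return True
```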

For AI-assisted development, that is the difference between “the agent did something” and “the team can explain why the system changed.”

The human role moves to decision quality

This also changes the human role.

In a naive agent workflow, the human asks for code, waits, and reviews the result. That is a weak loop because the most important decisions may already be buried inside generated implementation.

In an evolved SDD workflow, humans steer earlier:

approve or correct requirements
resolve clarifying questions
choose which reviewers and artifacts are in scope
judge findings from Product, QA, Security, Architecture, Delivery, Constitution, Roadmap, SRE, Cost, UX, and Data angles
review tradeoffs in the plan
decide when ambiguity is acceptable
choose which risks need spikes
inspect verification findings before implementation
checkpoint divergence after implementation

The agent still executes, but execution is surrounded by decision points.

This matches Specledger’s stated principle: humans steer, AI executes. The value is not that humans micromanage every line of code. The value is that humans keep authority over intent, tradeoffs, and acceptance.

SDD as organizational memory

The next step is memory.

A single spec helps one feature. A ledger of specs, decisions, checkpoints, sessions, and deltas helps the organization learn.

That matters because many engineering failures are not novel. Teams rediscover the same constraints, repeat the same architectural arguments, forget why a tradeoff was chosen, or lose context when a chat session ends.

Specledger’s session indexing and context-compounding language points at this deeper value. If every feature leaves behind a structured trail, future agents and future humans can start from a better place:

previous decisions are easier to find
old assumptions can be challenged explicitly
recurring review failures can become checklist items
stable implementation patterns can become skills
cross-repo dependencies can be made visible instead of tribal
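“Recurring review failures can become checklist items” can start as simple counting. A sketch, assuming each feature’s review left a list of finding categories, and that a category seen in at least three distinct features (an arbitrary threshold) earns a checklist item:

```python
from collections import Counter


def promote_to_checklist(findings_by_feature: dict[str, list[str]],
                         min_features: int = 3) -> list[str]:
    """Return finding categories that recurred across at least `min_features` features."""
    counts: Counter = Counter()
    for categories in findings_by_feature.values():
        counts.update(set(categories))  # count each category once per feature
    return sorted(c for c, n in counts.items() if n >= min_features)
```

Counting per feature rather than per finding keeps one noisy review from dominating the checklist.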

The spec becomes more than a pre-code artifact. It becomes part of the team’s long-term memory.

The tension: speed versus durability

The skillrig/cli workflow prompts also expose a healthy tension.

The experimental implementation workflow says it skips the durable sl issue ledger because the quickstart is intentionally smaller. That is a real tradeoff. Sometimes a team wants the full traceable workflow. Sometimes it wants a faster, bounded, deterministic workflow that still reads the design artifacts and gates on checks.

This is probably where SDD will keep evolving.

Not every feature needs the same amount of ceremony. A tiny bug fix does not need the same ledger as a multi-repo payment integration. But every workflow still needs a way to preserve the right amount of intent, verification, and review.

The mature version of SDD is not maximum documentation. It is calibrated traceability.

Use more ledger when the risk, ambiguity, or coordination cost is high. Use lighter workflows when the target is already clear. But do not remove the feedback loop.
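Calibrated traceability can be written as a rule: ceremony scales with the strongest pressure, while the feedback loop is unconditional. The 0 to 3 scores, the threshold, and the step names below are illustrative assumptions, not Specledger settings.

```python
# Steps every workflow keeps, however light (names are illustrative).
FEEDBACK_LOOP = ["verify artifacts", "run checks", "checkpoint divergence"]


def plan_ceremony(risk: int, ambiguity: int, repos_touched: int, threshold: int = 2) -> list[str]:
    """Add ledger-heavy steps only when risk, ambiguity, or coordination cost is high."""
    if not (0 <= risk <= 3 and 0 <= ambiguity <= 3) or repos_touched < 1:
        raise ValueError("risk and ambiguity must be 0..3; repos_touched must be >= 1")
    coordination = min(repos_touched - 1, 3)  # each extra repository raises coordination cost
    pressure = max(risk, ambiguity, coordination)
    extra = ["clarify", "multi-angle spec review", "tasks in issue ledger"] if pressure >= threshold else []
    return extra + FEEDBACK_LOOP  # the feedback loop is never dropped
```

A single-repo bug fix with a clear target gets only the loop; a feature touching three repositories gets the full ledger even if each individual change looks simple.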

Why this matters for AI engineering

AI makes implementation faster, but it does not make intent obvious.

That creates a new bottleneck:

The scarce resource is not generated code. The scarce resource is shared, inspectable, correct intent.

Spec-driven development began as a way to make intent explicit. Platforms like Specledger suggest the next stage: make intent collaborative, traceable, reviewable, executable, and memorable.

The practical shape is emerging:

specs define behavior
plans connect behavior to architecture
tasks or workflows divide execution
verification checks alignment before code
checkpointing checks divergence after code
dashboards and deltas keep humans in the steering loop
session indexes and skills let context compound

That is how SDD evolves from a writing habit into an engineering system.

Conclusion

The future of spec-driven development is not simply better prompts or longer requirements documents.

It is shared workflow infrastructure.

Specledger is interesting because it treats SDD as a collaborative ledger of intent: a place where humans, AI agents, specs, plans, issues, checkpoints, reviews, and sessions can stay aligned.

That is the right direction for agentic software development. The more capable the agents become, the more important it is to know what they are supposed to optimize for, who approved the tradeoffs, how divergence is detected, and what the team learns from each loop.

The spec is no longer just where clarity lives.

It is where collaboration, control, and memory begin.

Sources

Specledger — Product site describing Specledger as a spec-driven development platform for shared requirements, dashboards, deltas, checkpoints, session indexing, and AI-assisted execution.
specledger/specledger — Public Specledger CLI repository inspected for the .agents/commands workflow prompts and CLI capabilities.
skillrig/cli — Public CLI repository inspected for improved experimental Specledger workflow prompts, especially implement-workflow, verify-workflow, and checkpoint-workflow.
Spec-Driven Development and Specifications — Earlier article establishing the core SDD argument: research discovers the target, specifications make it inspectable, development executes against it, and feedback keeps the system honest.
ChatGPT shared conversation: Spec Driven Review Process — Shared conversation, reviewed from the downloaded Markdown export, exploring spec review as its own SDD workflow with Product Owner, QA, Security, Architecture, Delivery, Constitution, Roadmap, SRE, Cost, UX, and Data review angles.