AI Workflow Skill: Spec to Production in 9 Actions

I've been shipping production-quality features faster than ever. Not because I write more code—but because I barely write any.

The difference isn't the AI model. It's the process around it.

The Problem

Most developers use AI coding agents like a chat window. Ask for code, paste it in, ask for fixes, lose context, repeat. It ships—sometimes. But the quality is inconsistent, the process is chaotic, and nothing carries over between sessions.

Without a workflow

No clear definition of done
No systematic code review
No quality gates unless you remember
No way to resume after context loss
Learnings disappear every session

With the workflow skill

Spec defines done before code exists
9-perspective code review, automatic
Lint, typecheck, build, test on every change
Spec file enables perfect session resume
Retros and memory updates after every feature

I wanted a system where I define what to build, and the agent handles the entire implementation loop until the code is production-ready.

So I built one.

The Workflow Skill

A structured development lifecycle for AI coding agents. 9 actions, from idea to shipped code:

Action	What It Does
`focus`	Scan the codebase for what to work on next, prioritized by impact
`plan`	Create a spec with acceptance criteria and codebase impact analysis
`spec-review`	Adversarial challenge of your spec before implementation starts
`spike`	Time-boxed exploration for unknowns, go/no-go decision at the end
`ship`	Implement, test, review, fix—loop until all ACs pass and gates are green
`review`	Multi-perspective code review (9 perspectives, risk-scaled)
`done`	Final validation, retro, memory update, archive
`drop`	Abandon gracefully, preserve learnings for next time
`workflow`	Show current state and suggest next action

The core insight: the spec is the source of truth. Everything flows from it—implementation, testing, review, validation.

The Full Dev Workflow

1. Focus: What Should I Work On?

Don't know what to build next? Ask the codebase:

~/my-project

~ /workflow what should I focus on?

The agent dispatches parallel scans across your entire project—checking code quality, missing tests, security gaps, performance issues, accessibility problems. Results come back scored by impact, effort, and risk-if-ignored.

It produces a prioritized task list and creates specs in specs/backlog/ for the top items.

Instead of deciding what to work on, you let the codebase tell you.

2. Plan: Define What "Done" Means

You don't start with code. You start with a spec.

~/my-project

~ /workflow plan add rate limiting to the API

The agent reads your codebase first—existing patterns, related code, potential conflicts. Then it writes a spec with:

User journey: Who does what, and why (ACTOR/GOAL/PRECONDITION/POSTCONDITION)
Acceptance criteria (ACs): GIVEN/WHEN/THEN format—testable, objective
Scope items: Exactly what gets built, traceable to ACs
Codebase impact: Files affected, dependencies, breaking changes

The spec passes through 13 validation rules before implementation starts. Too big (>8 hours)? It gets split. Vague acceptance criteria? The agent pushes back.

No spec passes the gate if "done" isn't clearly defined.

3. Spec-Review: Challenge Before You Build

Before writing a single line of code, challenge the plan:

~/my-project

~ /workflow review @specs/active/rate-limiting.md

The agent runs adversarial analysis on your spec:

Missing edge cases
Scope that doesn't trace back to acceptance criteria
Assumptions that could be wrong
Blind spots in the codebase impact analysis

Catch problems in the spec—not in production.

4. Spike: Explore Before You Commit

Not sure about the approach? Run a time-boxed exploration:

~/my-project

~ /workflow spike Redis vs in-memory for rate limiting

Hard time limit (default 1h, max 4h). The output is a GO/NO-GO decision—not code. Spike code is throwaway, deleted before proceeding. The learning gets logged.

This prevents committing to an approach before you know it works.

5. Ship: The Implementation Loop

Once the spec is solid:

~/my-project

~ /workflow ship @specs/active/rate-limiting.md

The agent enters a loop:

You're not pair programming. You're delegating. The agent handles the loop. You review the output.

Two modes adapt to context:

One-shot: No spec needed for quick fixes—creates inline spec, implements, validates
Loop: Active spec exists—iterates until all acceptance criteria pass

Bug fixes get special treatment with Anti-Cascade TDD:

Baseline: Record test suite state
Red: Write failing regression test
Green: Implement fix
Diff: Compare full suite to baseline—catch regressions the fix introduced
Scan: Check for the same bug pattern elsewhere in the codebase

6. Quality Gates: Automatic Validation

Every edit batch triggers a quick pass:

Gate	Scope
Lint	Changed files
Typecheck	Changed files

Before marking done, a full pass runs:

Gate	Scope
Lint	Changed files
Typecheck	Full project
Build	Full project
Test	Related tests

The skill auto-detects your tooling. Biome or ESLint? Vitest or Jest? pnpm, yarn, or bun? It reads your config files and figures it out.

No special setup. Production-quality validation from day one.

7. Review: Nine Perspectives, Scaled by Risk

Code review runs automatically during the ship loop—and on demand:

~/my-project

~ /workflow review the code

Perspective	Question	When
Correctness	Does it do the right thing?	Always
Security	Is it safe?	Always
Reliability	Does it handle failure?	Always
Performance	Is it fast enough?	Always
DX	Is it pleasant to maintain?	Always
Scalability	Shared state, multi-instance?	Conditional
Observability	Can you debug in production?	Conditional
Testability	Complex logic covered?	Conditional
Accessibility	Keyboard, screen reader, contrast?	Conditional

Review depth scales with risk:

Scope	Depth
1-2 files, low risk	Quick (5 perspectives)
3-5 files	Standard
6+ files or high risk	Deep (all 9)
Deploy context detected	Production mode

8. Done: Validate, Learn, Archive

When everything passes:

~/my-project

~ /workflow done @specs/active/rate-limiting.md

Final validation:

All acceptance criteria pass
Quality gates green
New behavior has new tests
No blocking review issues

Then a retro runs automatically:

Estimate vs actual time
What worked, what didn't
Patterns learned

The agent proposes memory updates—coding patterns, project conventions, anti-patterns discovered. These get written to your agent config so the next session starts smarter.

Spec archives to specs/shipped/. History logged. Knowledge retained.

9. Drop: Abandon Without Losing

Sometimes a feature doesn't work out:

~/my-project

~ /workflow drop @specs/active/rate-limiting.md

Captures why it was abandoned. Preserves reusable pieces. Documents "if revisited" lessons. Archives to specs/dropped/.

No silent abandonment. Every dropped feature teaches the next one.

Why This Builds Better Software

Spec-Driven Quality

The spec defines what production-ready means before code exists. Acceptance criteria are testable. Scope items trace to ACs. The agent validates against the spec—not against vibes.

This means quality is structural, not accidental.

Same-Day Shipping

Everything is scoped to what ships today. Features over 8 hours get split. The tiering system enforces it:

Size	Ceremony
< 5 LOC	None—just do it
< 30 LOC	Inline comment spec
< 100 LOC	Mini template
100+ LOC	Full spec with state machines

No two-week sprints. No ceremony overhead. Ideas ship the same day they're conceived.

Human Controls Deployment

The skill never runs git push or deploy commands. The agent handles code quality. You handle production.

This separation matters for trust. I delegate the build loop because I know the agent won't touch anything irreversible without asking.

Session Resilience

Context gets lost. It happens. The spec file enables perfect resume:

~/my-project

~ /workflow ship @specs/active/rate-limiting.md

The agent reads the spec, checks current state, picks up exactly where it left off. No "remind me what we were building" moments.

Agent-Agnostic

Not locked to any single tool. The same skill works with Claude Code, Codex, OpenCode, Cursor, Windsurf, Aider—any agent that reads SKILL.md files.

The AI tooling landscape shifts fast. The workflow stays portable.

Get Started

The workflow skill is available at skills.sh/bntvllnt/agent-skills/workflow:

~/my-project

~ npx skills add bntvllnt/agent-skills --skill workflow

Or copy directly from the repo:

~/my-project

~ git clone https://github.com/bntvllnt/agent-skills

~ cp -r agent-skills/workflow .claude/skills/

The Flow

No config required. The skill auto-detects your project's tooling.

Trade-offs

This isn't magic. Real trade-offs:

Overhead for small changes: A one-line typo fix doesn't need a spec. The skill detects trivial changes and skips ceremony—but sometimes you just want to edit and commit.

Learning curve: The spec format and actions take time to internalize. First week feels slower. After that, faster than before.

Agent quality varies: The loop is only as good as the agent's implementation. Complex algorithms and domain-specific code still need careful human review.

Token usage: Multi-perspective review and iterative fixing consume tokens. Worth it for production code. Overkill for throwaway scripts.

Why I Built This

Momentum matters more than perfection.

I used to lose half my energy to process—where was I? What was I building? Did I test that edge case? Now the spec holds all state. Quality gates run automatically. The agent reviews its own code from 9 perspectives before I even look at it.

The result: better software, shipped faster, from day one.

Ship → observe → adjust. Every day.

If that resonates, the skill is at skills.sh/bntvllnt/agent-skills/workflow. The source is on GitHub.

Glossary

Acceptance Criteria (ACs) — Testable conditions that define when a feature is "done." Written in GIVEN/WHEN/THEN format. Example: GIVEN a user sends 100 requests in 1 minute, WHEN they send request 101, THEN they receive a 429 status with a Retry-After header. Every scope item traces back to at least one AC. If you can't write an AC for it, it's not in scope.

Scope Items — The specific implementation tasks that fulfill ACs. Each scope item maps to one or more ACs, creating bidirectional traceability.

Quality Gates — Automated validation checks (lint, typecheck, build, test) that must pass before code is considered done. Quick pass runs after each edit; full pass runs before completion.

SKILL.md — The standard file format for defining agent skills. Any AI coding agent that reads SKILL.md files can load and execute the workflow skill.

More posts on building software in the Building category.

Spec to Production: My AI Workflow Skill

The Problem

The Workflow Skill

The Full Dev Workflow

1. Focus: What Should I Work On?

2. Plan: Define What "Done" Means

3. Spec-Review: Challenge Before You Build

4. Spike: Explore Before You Commit

5. Ship: The Implementation Loop

6. Quality Gates: Automatic Validation

7. Review: Nine Perspectives, Scaled by Risk

8. Done: Validate, Learn, Archive

9. Drop: Abandon Without Losing

Why This Builds Better Software

Spec-Driven Quality

Same-Day Shipping

Human Controls Deployment

Session Resilience

Agent-Agnostic

Get Started

The Flow

Trade-offs

Why I Built This

Glossary

Share this article

BNTVLLNT

Related articles

Software Is Dying. AI Agents Are Killing It.

No Laptop, No Problem: Tablet + Remote Servers as a Dev Rig

Categories