Open Source Development Framework

Project OS

A spec-driven, agentic development system for Claude Code. Turn vague ideas into working software with persistent memory, specialist agents, and automatic quality gates.

$ /start-project Build a real-time travel companion app

Five commands. Idea to production.

Each command orchestrates multiple specialist agents, reads persistent memory, and enforces quality gates. You provide the intent; the system handles specification, design, implementation, review, and QA.

1 /start-project
2 /plan-feature
3 /build-task
4 /review-task
5 /ship-check

How the system thinks

An orchestrator reads project memory, classifies your request, routes to the right specialist, and enforces gates between phases.

You: "Add email alerts when members go at-risk"
  |
  v
[Orchestrator] reads agent-memory/STATE.md, TASKS.md, SPEC.md
  |
  |-- Requirement unclear?  --> [Spec Agent]      --> updates SPEC.md
  |-- Architecture needed?  --> [Architect Agent] --> updates ARCHITECTURE.md
  |
  v
[Builder Agent] implements against spec + architecture
  |
  v
[Reviewer Agent] critiques: correctness, security, edge cases
  |   must-fix items? --> back to builder
  v
[QA Agent] verifies acceptance criteria from SPEC.md
  |   any FAIL? --> back to builder
  v
DONE --> TASKS.md, EVAL.md, STATE.md updated
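The routing step can be sketched in a few lines of Python. This is an illustration only, not how Project OS is implemented: `Request` and `route` are hypothetical names, and in the real system the orchestrator agent makes these judgments by reading the memory files rather than by checking boolean flags.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical summary of how the orchestrator classified a request."""
    text: str
    requirements_clear: bool
    architecture_needed: bool


def route(request: Request) -> list[str]:
    """Return the ordered list of agents a request passes through."""
    pipeline = []
    if not request.requirements_clear:
        pipeline.append("spec-agent")       # updates SPEC.md
    if request.architecture_needed:
        pipeline.append("architect-agent")  # updates ARCHITECTURE.md
    # Every request then goes through build, review, and QA.
    pipeline += ["builder-agent", "reviewer-agent", "qa-agent"]
    return pipeline


request = Request(
    text="Add email alerts when members go at-risk",
    requirements_clear=False,
    architecture_needed=False,
)
print(route(request))
```

The sketch omits the loops back to the builder; those are covered by the gates described later on this page.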

Seven specialists, one workflow

Each agent has a single responsibility, clear inputs/outputs, and strict rules about what it can and cannot do.

Orchestrator

Routes work to the right agent. Enforces gates between phases. Maintains project state. Never lets builder redefine requirements.

opus

Spec Agent

Translates vague ideas into structured specs with acceptance criteria. Makes reasonable assumptions instead of asking 10 questions.

sonnet

Architect Agent

Designs the simplest viable architecture. Every component traces to a spec requirement. No overengineering.

opus

Builder Agent

Implements one task at a time against approved spec and architecture. No silent scope creep. Tests required for core behavior.

sonnet

Reviewer Agent

Skeptical code review. Classifies findings as must-fix, should-fix, or optional. Checks security, edge cases, spec alignment.

opus

QA Agent

Verifies work against documented acceptance criteria. "Code running" is not enough; observable behavior must match the spec.

sonnet

Ops Agent

Prepares release readiness. Deploy checklists, env var docs, rollback plans, monitoring notes. Flags missing items as blockers.

sonnet
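The role-to-model assignments above can be collected into one lookup table. A small sketch, assuming the opus / sonnet label under each role is the model that agent runs on; `AGENT_MODELS` and `agents_by_model` are illustrative names, and the actual agent definitions are the markdown files under .claude/agents/.

```python
from collections import defaultdict

# Role -> model label, copied from the list above.
AGENT_MODELS = {
    "orchestrator": "opus",
    "spec-agent": "sonnet",
    "architect-agent": "opus",
    "builder-agent": "sonnet",
    "reviewer-agent": "opus",
    "qa-agent": "sonnet",
    "ops-agent": "sonnet",
}


def agents_by_model(models: dict[str, str]) -> dict[str, list[str]]:
    """Group agent names by the model label assigned to them."""
    grouped = defaultdict(list)
    for agent, model in models.items():
        grouped[model].append(agent)
    return dict(grouped)


print(agents_by_model(AGENT_MODELS))
```

Grouped this way, routing, architecture, and review sit on opus, while spec writing, building, QA, and ops sit on sonnet.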

The repo is the source of truth

Seven markdown files in agent-memory/ persist project state across sessions. No more re-explaining context. Every session starts by reading these files.

PROJECT.md

Identity: name, goal, users, success metrics, constraints, non-goals

SPEC.md

Requirements: problem, user flows, functional/non-functional reqs, acceptance criteria, risks

ARCHITECTURE.md

Design: system diagram, components, data flow, interfaces, tradeoffs, phases

DECISIONS.md

Decision log: context, decision, alternatives, rationale, consequences

TASKS.md

Work tracking: backlog, in progress, blocked, done

EVAL.md

Quality: acceptance criteria pass/fail, test coverage, regression risks

STATE.md

Current state: objective, last step, blockers, assumptions, next actions
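To make "every session starts by reading these files" concrete, here is a minimal sketch of a loader. It assumes agent-memory/ sits at the repository root; `MEMORY_FILES` and `load_memory` are hypothetical names, not part of Project OS, where the agents read these files themselves.

```python
from pathlib import Path

# The seven memory files listed above.
MEMORY_FILES = [
    "PROJECT.md", "SPEC.md", "ARCHITECTURE.md", "DECISIONS.md",
    "TASKS.md", "EVAL.md", "STATE.md",
]


def load_memory(repo_root: Path) -> tuple[dict[str, str], list[str]]:
    """Read each memory file under agent-memory/ and report any that are missing."""
    memory_dir = repo_root / "agent-memory"
    loaded, missing = {}, []
    for name in MEMORY_FILES:
        path = memory_dir / name
        if path.is_file():
            loaded[name] = path.read_text(encoding="utf-8")
        else:
            missing.append(name)
    return loaded, missing


loaded, missing = load_memory(Path.cwd())
print(f"loaded {len(loaded)} memory files; missing: {missing or 'none'}")
```

Returning the missing files separately, rather than raising an error, mirrors the bootstrap behavior described for /start-project, which creates whatever is absent.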

Slash commands

Each command is a complete workflow that reads memory, routes to agents, and updates state.

/start-project
Initialize a new project from a product idea. Bootstraps the full OS if missing, then populates all seven memory files with inferred specs, architecture, tasks, and acceptance criteria.
bootstrap -> spec-agent -> architect-agent -> write memory
/plan-feature
Plan a feature end-to-end. Routes to spec-agent if requirements are incomplete, architect-agent if design decisions are needed, then creates ordered tasks with acceptance criteria.
read memory -> spec check -> arch check -> task breakdown
/build-task
Build the next task with automatic quality gates. Implements the code, runs reviewer-agent to catch issues, then verifies acceptance criteria via QA. No completion without review + QA pass.
builder -> reviewer gate -> QA gate -> update memory
/review-task
Standalone code review. Evaluates correctness, security, edge cases, complexity, test coverage, and spec alignment. Classifies findings as must-fix, should-fix, or optional.
reviewer-agent -> structured output
/ship-check
Release readiness verification. Checks task completion, acceptance criteria, test results, deploy checklist, rollback plan, and monitoring. Returns a SHIP / DO NOT SHIP recommendation.
ops-agent -> all memory files -> release report
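The SHIP / DO NOT SHIP recommendation can be read as an all-or-nothing check over the items /ship-check inspects. A sketch under assumptions: `ReleaseStatus` and its field names are hypothetical, and the real command has ops-agent derive these facts from the memory files rather than receive them as booleans.

```python
from dataclasses import dataclass, fields


@dataclass
class ReleaseStatus:
    """Hypothetical checklist mirroring the items /ship-check is described as checking."""
    tasks_complete: bool
    acceptance_criteria_pass: bool
    tests_pass: bool
    deploy_checklist_ready: bool
    rollback_plan_documented: bool
    monitoring_documented: bool


def ship_check(status: ReleaseStatus) -> tuple[str, list[str]]:
    """Return the recommendation plus the names of any items that block release."""
    blockers = [f.name for f in fields(status) if not getattr(status, f.name)]
    verdict = "SHIP" if not blockers else "DO NOT SHIP"
    return verdict, blockers


status = ReleaseStatus(
    tasks_complete=True,
    acceptance_criteria_pass=True,
    tests_pass=True,
    deploy_checklist_ready=True,
    rollback_plan_documented=False,
    monitoring_documented=True,
)
print(ship_check(status))  # ('DO NOT SHIP', ['rollback_plan_documented'])
```

Any single missing item is enough to block, which matches the Ops Agent rule that missing items are flagged as blockers.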

Non-negotiable gates

The system enforces these automatically. No human discipline required.

🛑

No coding without a spec

Builder-agent refuses to start if acceptance criteria don't exist. Routes to spec-agent first.

🛑

No completion without review

Every /build-task runs reviewer-agent. Must-fix items block completion and go back to builder.

🛑

No completion without QA

QA-agent verifies each acceptance criterion. Any FAIL sends work back. "It runs" is not enough.

🛑

No silent scope creep

Builder only implements what's in the current task. Extras go to TASKS.md backlog, not into the code.

Assumptions documented, not asked

Agents make reasonable assumptions and log them in DECISIONS.md. They escalate only when a wrong guess would carry material risk.
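The first three gates amount to a small state machine over a task. The sketch below is one way to picture it, not the actual enforcement mechanism: `TaskState`, its fields, and `next_step` are hypothetical, and the scope-creep and assumptions rules are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class TaskState:
    """Hypothetical snapshot of one task; not a Project OS data structure."""
    acceptance_criteria: list[str]
    implemented: bool = False
    reviewed: bool = False
    must_fix: list[str] = field(default_factory=list)
    qa_results: dict[str, bool] = field(default_factory=dict)  # criterion -> passed?


def next_step(task: TaskState) -> str:
    """Return which agent should act next, or "done" once every gate has passed."""
    if not task.acceptance_criteria:
        return "spec-agent"       # gate: no coding without a spec
    if not task.implemented:
        return "builder-agent"
    if not task.reviewed:
        return "reviewer-agent"   # gate: no completion without review
    if task.must_fix:
        return "builder-agent"    # must-fix items go back to the builder
    if any(c not in task.qa_results for c in task.acceptance_criteria):
        return "qa-agent"         # gate: no completion without QA
    if not all(task.qa_results[c] for c in task.acceptance_criteria):
        return "builder-agent"    # any FAIL sends the work back
    return "done"


task = TaskState(
    acceptance_criteria=["Alert email is sent when a member becomes at-risk"],
    implemented=True,
    reviewed=True,
    must_fix=["Alert job has no retry on send failure"],  # illustrative finding
)
print(next_step(task))  # builder-agent
```

A task reaches "done" only when it has criteria, has been reviewed with no must-fix items left, and every criterion has a recorded PASS.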

Before vs. after

Without Project OS

  • Re-explain codebase every session
  • Manually specify what to build
  • Hope nothing was missed
  • Forget previous decisions
  • No quality enforcement
  • Context lost between sessions
  • Prompt engineering every time

With Project OS

  • Memory files load automatically
  • State your intent, agents spec it
  • Acceptance criteria verified by QA
  • DECISIONS.md persists rationale
  • Gates enforced on every task
  • STATE.md picks up where you left off
  • Intent engineering replaces prompting

Design philosophy

1 Few capable agents

Seven specialist roles, not twenty personas. Each has one job and clear boundaries.

2 Repo as source of truth

Memory files, not conversation history. Every session reads the same persistent state.

3 Spec before code

Requirements and acceptance criteria must exist before implementation starts.

4 Review before done

No task completes without review and QA. "It works" is insufficient.

5 Assumptions over questions

Document reasonable assumptions. Only ask when a wrong guess would carry material risk.

6 Reusable across projects

One command bootstraps the entire system into any repo. Agents are project-agnostic.

Quick start

Four commands take you from an empty directory to a structured project with specs, architecture, and a task backlog.

# Create your project
mkdir ~/my-app && cd ~/my-app && git init

# In Claude Code:
/start-project A CLI that watches for screenshots and uploads to R2 with a share link

# Plan the first feature
/plan-feature File watcher + R2 upload with clipboard copy

# Build it (auto review + QA)
/build-task

# Ready to ship?
/ship-check

What gets created

.claude/
  agents/
    orchestrator.md       # Routes work, enforces gates
    spec-agent.md         # Vague ideas -> specs
    architect-agent.md    # Specs -> simplest design
    builder-agent.md      # Design -> code
    reviewer-agent.md     # Skeptical code review
    qa-agent.md           # Acceptance criteria verification
    ops-agent.md          # Release readiness
  commands/
    start-project.md      # Initialize from idea
    plan-feature.md       # Spec -> arch -> tasks
    build-task.md         # Build -> review -> QA
    review-task.md        # Standalone review
    ship-check.md         # Release gate
  hooks/
    pre-task.sh           # Load context before work
    post-task.sh          # Remind to update memory
    pre-complete.sh       # Check quality gates
  settings.json           # Stop hook config
agent-memory/
  PROJECT.md              # Who, what, why
  SPEC.md                 # Requirements + criteria
  ARCHITECTURE.md         # System design
  DECISIONS.md            # Decision log
  TASKS.md                # Work tracker
  EVAL.md                 # QA results
  STATE.md                # Current status