Outcome-driven development.
Define what done looks like — then prove it.
Agents burn tokens describing what they did — “I created the file, added the function, registered the route.” No evidence it works.
Tests pass while the feature is broken. A green test suite proves the test suite is green — it doesn't prove the system works.
Security, privacy, and performance concerns surface in production, not at planning time. Fixing them there costs hours instead of minutes.
Activity completion is a proxy for outcomes — and proxies drift. Tests pass, code review approves, activity narration burns tokens — while the system still doesn't work.
Before any task executes, define two things: what done looks like to an observer, and the specific action that proves it.
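As a rough illustration, the two definitions can be held in one small record and the proof action run as a command. This is a minimal sketch, not Flywheel's own format: the `TheoryOfSuccess` class, its field names, and `run_proof` are hypothetical, and the proof action here is a stand-in Python one-liner rather than a real curl or grep.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class TheoryOfSuccess:
    """Hypothetical record pairing an observable outcome with its proof."""
    done_looks_like: str      # what an observer would see when the task is done
    proof_action: list[str]   # command whose output serves as the evidence


def run_proof(theory: TheoryOfSuccess) -> tuple[bool, str]:
    """Run the proof action and return (succeeded, raw evidence)."""
    result = subprocess.run(
        theory.proof_action, capture_output=True, text=True, timeout=30
    )
    evidence = result.stdout + result.stderr
    return result.returncode == 0, evidence


# Stand-in proof action (assumption): a real one would be a curl, grep, or similar.
theory = TheoryOfSuccess(
    done_looks_like="The health check prints 'ok'",
    proof_action=[sys.executable, "-c", "print('ok')"],
)
succeeded, evidence = run_proof(theory)
```

The point of the shape is that the task carries its own proof: whoever runs it gets raw output back, not a description of what was done.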
| Dimension | Activity-Based | Flywheel |
|---|---|---|
| Done means | “Did you do the thing?” | “Can you prove it works?” |
| Verification | Tests pass | Evidence closes the loop |
| Agent output | Narration of steps taken | PROVEN + raw evidence |
| NFR concerns | Caught in production | Surfaced at plan time |
| Proof effort | Same for every task | Goldilocks — calibrated to complexity |
Are we manufacturing success, or managing decline? Activity without evidence is decline dressed up as progress.
Forward progression is deterministic. Backward routing is free — any failure escalates directly to the right level.
Design the thing. Define what overall success looks like. Write the design document with the overall Theory of Success.
Break into tasks. Each task gets a Theory of Success, a specific proof action, and a lightweight NFR scan (security, privacy, performance).
Convergence loops. Implementer builds + proves. Verifier evaluates evidence against Theory of Success. Loop until VERIFIED or escalate.
Acceptance gate — system-level proof. Then cleanup, commit with evidence summary, push or open a PR.
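The convergence loop in the phases above might be sketched as follows. Only the PROVEN and VERIFIED status names come from the text; the `Evidence` class, the `converge` function, the "FAILED" and "ESCALATE" labels, and the round limit are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    status: str   # "PROVEN" when the implementer's proof action succeeded
    raw: str      # unedited output of the proof action


def converge(
    implement: Callable[[int], Evidence],
    verify: Callable[[Evidence], bool],
    max_rounds: int = 3,  # assumed limit; the source does not state one
) -> str:
    """Loop implementer -> verifier until VERIFIED, otherwise escalate."""
    for round_number in range(1, max_rounds + 1):
        evidence = implement(round_number)
        if evidence.status == "PROVEN" and verify(evidence):
            return "VERIFIED"
    return "ESCALATE"


# Toy implementer: its proof only succeeds on the second attempt.
def toy_implementer(round_number: int) -> Evidence:
    if round_number < 2:
        return Evidence(status="FAILED", raw="HTTP/1.1 500")
    return Evidence(status="PROVEN", raw="HTTP/1.1 200 OK")


# Toy verifier: checks the raw evidence against the Theory of Success.
def toy_verifier(evidence: Evidence) -> bool:
    return "200 OK" in evidence.raw


outcome = converge(toy_implementer, toy_verifier)
stuck = converge(lambda n: Evidence("FAILED", "HTTP/1.1 500"), toy_verifier)
```

Here `outcome` ends as "VERIFIED" after two rounds, while `stuck` exhausts its rounds and escalates instead of looping forever.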
Facilitates design refinement. Writes the design document after conversational validation.
Creates task plans with Theory of Success + NFR scan per task. Implementation-level detail.
Builds the thing, runs the proof action, returns evidence. Evidence is the deliverable — not the code.
Evaluates evidence against Theory of Success. Uses the Goldilocks rubric. Not a code reviewer.
The implementer returns PROVEN + raw evidence, not a story about what it built. The verifier returns VERIFIED, not a lengthy analysis. Token cost proportional to results, not effort.
curl showing the response body: proves the endpoint works
grep showing the expected line: proves the process ran
describe showing a new column: proves the migration ran
grep of the changed key, not a unit test of config parsing
The goal is confidence, not certainty. Perfect verification is the enemy of shipping. A config change doesn't need a curl. An API endpoint doesn't need a load test. Calibrate.
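For a config change, a calibrated proof can be as small as a grep of the changed key. The sketch below imitates `grep -n` in Python; the `grep_evidence` helper, the `app.conf` file, and the `request_timeout` key are all hypothetical and exist only to keep the example self-contained.

```python
import re
import tempfile
from pathlib import Path


def grep_evidence(path: Path, pattern: str) -> list[str]:
    """Return matching lines prefixed with their line number, like `grep -n`."""
    regex = re.compile(pattern)
    lines = path.read_text(encoding="utf-8").splitlines()
    return [
        f"{number}:{line}"
        for number, line in enumerate(lines, start=1)
        if regex.search(line)
    ]


# Hypothetical config file and key, created here so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "app.conf"
    config.write_text("log_level = info\nrequest_timeout = 30\n", encoding="utf-8")
    evidence = grep_evidence(config, r"^request_timeout\s*=\s*30$")
```

The returned lines are the evidence. An empty list means the change is not proven, whatever the narration says.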
Every task gets a lightweight scan — 2–3 lines identifying which non-functional concerns apply and what “good enough” means.
“Validate JWT signature, not just decode. Check expiry. Reject tampered tokens.”
“No PII in token payload or application logs. Minimal data collection.”
“No DB call per request — validation is stateless. Cache token issuer config.”
“File writes use atomic rename. No lock held across network call.”
“Retry with backoff on transient errors. Fail closed on auth — don't allow through.”
Plan-time cost: minutes to address. Production-time cost: hours to debug, days to fix properly. The scan catches the obvious concern you would otherwise miss.
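Two of the scan lines above, atomic rename for file writes and retry with backoff that fails closed on auth, can be sketched with the standard library alone. The function names, attempt count, and delays are illustrative assumptions, and `ConnectionError` stands in for whatever counts as a transient error in a real system.

```python
import os
import tempfile
import time
from typing import Callable


def atomic_write(path: str, data: str) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def is_authorized(
    check: Callable[[], bool], attempts: int = 3, base_delay: float = 0.01
) -> bool:
    """Retry transient errors with exponential backoff; deny if they persist."""
    for attempt in range(attempts):
        try:
            return check()
        except ConnectionError:
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    return False  # fail closed: never allow through on an unresolved error


calls = {"count": 0}


def flaky_check() -> bool:
    """Toy auth check that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("auth service unreachable")
    return True


def always_down() -> bool:
    raise ConnectionError("auth service unreachable")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "state.json")
    atomic_write(target, '{"ok": true}')
    with open(target, encoding="utf-8") as handle:
        written = handle.read()

recovered = is_authorized(flaky_check)
denied = is_authorized(always_down)
```

The flaky check recovers on the third attempt, while the permanently failing one is denied rather than waved through.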
Flywheel works with Amplifier bundles and Claude Code — same philosophy, native integration for each.
4 modes, 4 agents, 6 skills — composable onto any bundle. Includes mode-to-mode transitions and agent delegation.
Installs skills into .claude/skills/ for native Claude Code integration. Same 4-phase workflow.
No runtime dependencies. No package to install (for Amplifier). Pure markdown, YAML, and philosophy — composable onto any bundle.
Built as a complete replacement for the Superpowers methodology. Same daily-driver use case, different epistemology: evidence loops over activity checklists.
Includes amplifier-foundation and amplifier-bundle-modes. Bring your own tools — flywheel adds the methodology layer.
Repository: kenotron-ms/flywheel on GitHub
Primary contributor: kenotron-ms (Ken) — 12 commits, sole contributor
Created: April 18, 2026 · Last pushed: April 19, 2026
Version: 0.1.0 · License: MIT
Data sources (all data verified from real artifacts):
README.md: quick start, phase overview, core principles
bundle.md: bundle manifest, includes, core principles, workflow diagram
context/philosophy.md: Theory of Success, outcome over activity, Goldilocks principle, anti-patterns
context/instructions.md: standing orders, mode sequence, token discipline, methodology calibration
context/using-flywheel.md: skill priority, agent delegation rules, red flags
agents/implementer.md, agents/verifier.md: agent contracts, status codes, iron laws
skills/nfr-scan/SKILL.md: 5 concern types, per-task NFR format
skills/verification-rubric/SKILL.md: Goldilocks rubric table, calibration guidance
docs/plans/2026-04-18-outcome-driven-methodology-design.md: original design document
behaviors/flywheel-methodology.yaml: bundle wiring, hooks, tools, agent includes
git log: 12 commits, 1 contributor, repo created 2026-04-18
flywheel bundle + flywheel-methodology-behavior
Bundle stats: 39 files total (32 Markdown, 1 YAML behavior, 2 shell test scripts, 1 JS installer, 1 JSON package manifest, 1 Dockerfile, 1 .gitignore). ~6,051 lines across all source files.
Methodology: Repository cloned and inspected directly. All statistics from git log, find, and wc. All quotes from source files. No data fabricated.