MADE:Explorations · Team

Flywheel

Outcome-driven development.
Define what done looks like — then prove it.

Active · MIT License
May 2026 · kenotron-ms/flywheel

The Problem

Activity is not progress

📝

Narrated Activity

Agents burn tokens describing what they did — “I created the file, added the function, registered the route.” No evidence it works.

Green Tests, Broken Systems

Tests pass while the feature is broken. A green test suite proves the test suite is green — it doesn't prove the system works.

🙈

NFR Blindness

Security, privacy, and performance concerns surface in production, not at planning time. Fixing them there costs hours instead of minutes.

Activity completion is a proxy for outcomes — and proxies drift. Tests pass, code review approves, activity narration burns tokens — while the system still doesn't work.

The Shift

Theory of Success

Before any task executes, define two things: what done looks like to an observer, and the specific action that proves it.
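A minimal sketch of how the two parts of a Theory of Success could be recorded before a task runs. The class name, field names, and the example task are hypothetical illustrations, not taken from the Flywheel bundle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TheoryOfSuccess:
    """Hypothetical record: what an observer would see, and the action that proves it."""

    observable_outcome: str  # what "done" looks like to an observer
    proof_action: str        # the specific action that produces evidence

    def is_complete(self) -> bool:
        # A task is ready to execute only when both parts are filled in.
        return bool(self.observable_outcome.strip()) and bool(self.proof_action.strip())


# Illustrative task; the endpoint and command are made up for this example.
theory = TheoryOfSuccess(
    observable_outcome="GET /health returns 200 with a JSON status field",
    proof_action="curl -i http://localhost:8080/health",
)
```

Keeping the proof action next to the outcome makes it hard to plan a task without also deciding how it will be proven.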

Dimension | Activity-Based | Flywheel
Done means | “Did you do the thing?” | “Can you prove it works?”
Verification | Tests pass | Evidence closes the loop
Agent output | Narration of steps taken | PROVEN + raw evidence
NFR concerns | Caught in production | Surfaced at plan time
Proof effort | Same for every task | Goldilocks: calibrated to complexity

Are we manufacturing success, or managing decline? Activity without evidence is decline dressed up as progress.

- Flywheel Philosophy

The Workflow

Four phases, one ratchet

Forward progression is deterministic. Backward routing is free — any failure escalates directly to the right level.

1

/flywheel-design

Design the thing. Define what overall success looks like. Write the design document with the overall Theory of Success.

2

/flywheel-plan

Break into tasks. Each task gets a Theory of Success, a specific proof action, and a lightweight NFR scan (security, privacy, performance).

3

/flywheel-execute

Convergence loops. Implementer builds + proves. Verifier evaluates evidence against Theory of Success. Loop until VERIFIED or escalate.

4

/flywheel-ship

Acceptance gate — system-level proof. Then cleanup, commit with evidence summary, push or open a PR.

Backward: RETRY (re-run task) · REPLAN (fix the plan) · RETHINK (fix the design)
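
The execute-phase loop and its backward routing can be sketched roughly as follows. This assumes the verifier returns one of four verdicts (VERIFIED, RETRY, REPLAN, RETHINK) and that running out of retries escalates to REPLAN; both the function signature and that escalation rule are assumptions made for illustration, not documented Flywheel behavior.

```python
from dataclasses import dataclass
from typing import Callable

VERIFIED, RETRY, REPLAN, RETHINK = "VERIFIED", "RETRY", "REPLAN", "RETHINK"


@dataclass
class Outcome:
    verdict: str   # final verdict for the task
    attempts: int  # how many implement/verify rounds were run
    evidence: str  # raw evidence from the last round


def converge(
    implement: Callable[[], str],
    verify: Callable[[str], str],
    max_attempts: int = 3,
) -> Outcome:
    """Loop implement -> verify until the verdict is anything other than RETRY.

    Assumes max_attempts >= 1.
    """
    evidence = ""
    for attempt in range(1, max_attempts + 1):
        evidence = implement()       # implementer builds and runs the proof action
        verdict = verify(evidence)   # verifier judges the evidence only
        if verdict != RETRY:
            # VERIFIED ends the task; REPLAN and RETHINK escalate immediately.
            return Outcome(verdict, attempt, evidence)
    # Assumption: exhausted retries are treated as a planning problem.
    return Outcome(REPLAN, max_attempts, evidence)
```

The verifier never sees a narrative here, only the evidence string, which mirrors the split between the two roles.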
Architecture

Four specialized agents

🎨

Brainstormer

Facilitates design refinement. Writes the design document after conversational validation.

📋

Planner

Creates task plans with Theory of Success + NFR scan per task. Implementation-level detail.

🛠️

Implementer

Builds the thing, runs the proof action, returns evidence. Evidence is the deliverable — not the code.

🔍

Verifier

Evaluates evidence against Theory of Success. Uses the Goldilocks rubric. Not a code reviewer.

The implementer returns PROVEN + raw evidence, not a story about what it built. The verifier returns VERIFIED, not a lengthy analysis. Token cost is proportional to results, not effort.

Core Innovation

Evidence beats assertions

The Evidence Hierarchy

  • A curl showing the response body — proves the endpoint works
  • A screenshot of the UI state — proves the render is correct
  • A log grep showing the expected line — proves the process ran
  • A schema describe showing a new column — proves the migration ran
  • An existing test suite still passing — proves nothing was broken
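
One way to capture the kinds of evidence listed above is to run the proof command and keep its raw output untouched. The sketch below uses only the Python standard library; the `Evidence` record and `run_proof` helper are hypothetical names, and the command shown stands in for a real curl or grep.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Evidence:
    command: Tuple[str, ...]  # exactly what was run
    exit_code: int
    stdout: str               # raw output, not a summary
    stderr: str


def run_proof(command: Sequence[str], timeout: float = 30.0) -> Evidence:
    """Run a proof action and return its raw output as evidence."""
    completed = subprocess.run(
        list(command), capture_output=True, text=True, timeout=timeout
    )
    return Evidence(
        tuple(command), completed.returncode, completed.stdout, completed.stderr
    )


# Stand-in proof action so the example runs anywhere Python does.
evidence = run_proof([sys.executable, "-c", "print('status=200')"])
```

Returning the exit code and both output streams lets a verifier check the evidence without trusting any description of it.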

Goldilocks Verification

  • UI change: screenshot of affected state, not a full automation suite
  • API endpoint: curl with status + body, not a load test
  • Config change: grep of the changed key, not a unit test of config parsing
  • Refactor: existing tests still pass, not re-implementation
  • DB migration: row count + schema query, not a full integrity check

The goal is confidence, not certainty. Perfect verification is the enemy of shipping. A config change doesn't need a curl. An API endpoint doesn't need a load test. Calibrate.
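
The calibration above amounts to a lookup from change type to the lightest proof that gives confidence. A minimal sketch, with hypothetical key names chosen for this example:

```python
# Hypothetical mapping from change type to a "just enough" proof action,
# following the Goldilocks list above.
PROOF_BY_CHANGE = {
    "ui": "screenshot of the affected state",
    "api": "curl with status and body",
    "config": "grep of the changed key",
    "refactor": "existing tests still pass",
    "migration": "row count plus schema query",
}


def proof_for(change_type: str) -> str:
    """Return the calibrated proof for a change type, or fail loudly if unknown."""
    try:
        return PROOF_BY_CHANGE[change_type]
    except KeyError:
        raise ValueError(f"no calibrated proof defined for {change_type!r}") from None
```

Failing on an unknown change type forces a deliberate choice of proof rather than a silent default.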

Architect Mindset

NFR scan at plan time, not in production

Every task gets a lightweight scan — 2–3 lines identifying which non-functional concerns apply and what “good enough” means.

Security

“Validate JWT signature, not just decode. Check expiry. Reject tampered tokens.”

Privacy

“No PII in token payload or application logs. Minimal data collection.”

Performance

“No DB call per request — validation is stateless. Cache token issuer config.”

Resource Contention

“File writes use atomic rename. No lock held across network call.”
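
The atomic-rename pattern mentioned here can be sketched with the standard library: write to a temporary file in the same directory, then swap it into place with `os.replace`. The function name is hypothetical.

```python
import os
import tempfile


def atomic_write_text(path: str, data: str) -> None:
    """Write data so readers see either the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the swap
        os.replace(tmp_path, path)     # atomic replacement of the target
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

No lock is taken at any point, which also keeps the write clear of the "lock held across network call" concern.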

Reliability

“Retry with backoff on transient errors. Fail closed on auth — don't allow through.”
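
A small sketch of the reliability note: retry transient failures with exponential backoff, and deny access whenever the authorization check cannot complete. `TransientError`, `call_with_backoff`, and `is_authorized` are hypothetical names introduced for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, resets)."""


def call_with_backoff(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry on TransientError, doubling the delay each time. Assumes attempts >= 1."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("attempts must be at least 1")


def is_authorized(
    check: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Fail closed: any error, or anything other than True, means 'deny'."""
    try:
        return call_with_backoff(check, sleep=sleep) is True
    except Exception:
        return False
```

Passing `sleep` in as a parameter keeps the backoff testable without real delays.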

Plan-time Cost

Addressing a concern at plan time takes minutes. Finding it in production takes hours to debug and days to fix properly. The scan catches the obvious thing you would otherwise miss.

Get Started

Two platforms, one methodology

Flywheel works with Amplifier bundles and Claude Code — same philosophy, native integration for each.

Amplifier

    # Add to your bundle.md
    includes:
      - bundle: git+https://github.com/kenotron-ms/flywheel@main

4 modes, 4 agents, 6 skills — composable onto any bundle. Includes mode-to-mode transitions and agent delegation.

Claude Code

    # Install to project
    npx flywheel-claude-code

    # Or install globally
    npx flywheel-claude-code --global

Installs skills into .claude/skills/ for native Claude Code integration. Same 4-phase workflow.

The Bundle

Flywheel at a glance

4 Specialized Agents
4 Interactive Modes
6 Discoverable Skills
39 Files in Bundle

Pure Methodology

No runtime dependencies. No package to install (for Amplifier). Pure markdown, YAML, and philosophy — composable onto any bundle.

Superpowers Alternative

Built as a complete replacement for the Superpowers methodology. Same daily-driver use case, different epistemology: evidence loops over activity checklists.

Built to Compose

Includes amplifier-foundation and amplifier-bundle-modes. Bring your own tools; Flywheel adds the methodology layer.

Sources & Methodology

How this deck was built

Repository: kenotron-ms/flywheel on GitHub

Primary contributor: kenotron-ms (Ken) — 12 commits, sole contributor

Created: April 18, 2026 · Last pushed: April 19, 2026

Version: 0.1.0 · License: MIT


Data sources (all data verified from real artifacts):


Bundle stats: 39 files total — 32 Markdown, 1 YAML behavior, 2 shell test scripts, 1 JS installer, 1 JSON package manifest, 1 Dockerfile, 1 .gitignore. ~6,051 lines across all source files.


Methodology: Repository cloned and inspected directly. All statistics from git log, find, and wc. All quotes from source files. No data fabricated.
