Amplifier Development Story
Escaping Dark Alleys
How process discipline and design investment transformed AI development
7 days · 18 sessions · 4 repositories · A new foundation for orchestration
January 28 – February 4, 2026
The Through-Line
Two Key Lessons
01
Process Discipline
Using rigorous validation, working memory, and phase gates prevents the AI from wandering down dark alleys
02
Design Investment
Putting effort into design early produces powerful primitives that solve a whole class of problems
1
Part 1
The Foreman Troubles
January 28 – February 2
When the AI went down dark alleys
Part 1 · The Foreman Troubles
The Vision
The Foreman Pattern
An AI session spawning and managing worker sub-sessions through an issue queue
Fire-and-Forget Workers
Spawn workers that complete tasks independently, no synchronous waiting required
Typed Worker Pools
Different worker types for different tasks: researcher, implementer, reviewer
Issue-Based Coordination
Workers pick up tasks from a queue, post results back. The queue is the protocol.
Part 1 · The Foreman Troubles
The Confidence Gap
Tests Said: All Good
13
Unit Tests Passing
Every test green. All paths exercised. Clean coverage.
100%
Code Coverage
All branches covered. HIGH confidence declared.
"The prototype looks solid. Time for manual testing..."
Part 1 · The Foreman Troubles
The Confidence Gap
Reality Said: Catastrophic Failure
grep -r "spawn" events.jsonl
# No spawn events. Nothing was started.
The AI claimed things worked without actually verifying them
Part 1 · The Foreman Troubles
What the User Saw
The Error Messages
ERROR: Required capability 'bundle.load' not available
ERROR: Session not found
# Workers spawned but couldn't load bundles
# Parent couldn't find child sessions
# Nothing worked outside the test environment
The Irony
The AI mocked bundle.load — a capability that doesn't exist. The real session.spawn was available the whole time.
The Gap
Tests ran against fantasy APIs. Manual testing hit reality. 100% coverage of code that couldn't work.
Part 1 · The Foreman Troubles
What Went Wrong
Down the Dark Alleys
These were AI practice failures, not missing primitives
- Mocked fake capabilities — Created bundle.load and session.AmplifierSession, which didn't exist. The real session.spawn was there the whole time.
- Didn't study reference implementations — The task tool showed the correct pattern. It was never consulted.
- Asserted without verifying — "Workers are running" when spawn events never appeared in logs
- Tests tested nothing — 13 tests, 100% coverage → all against mocked APIs that didn't match reality
Part 1 · The Foreman Troubles
Progressive Discovery
The Bug Cascade
Once manual testing started, each fix revealed the next layer.
Four bugs. Four assumptions. All wrong.
Part 1 · Bug Cascade
Bug 1 of 4
Status Vocabulary Mismatch
The Symptom
Workers completed their tasks, but the Foreman never saw them finish. Tasks stayed "in progress" forever.
The Root Cause
Workers posted status "completed". The tool expected "closed". The error was silently swallowed.
{ "status": "completed", "result": "Task done!" }
{ "status": "closed", ... }
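One way to keep this class of bug loud instead of silent, sketched with hypothetical names (the actual tool's vocabulary handling may differ): validate incoming statuses against the canonical set and raise on anything unknown.

```python
# Hypothetical sketch: map common synonyms to the canonical vocabulary
# and reject anything else loudly, instead of swallowing the error.
CANONICAL = {"open", "in_progress", "closed"}
ALIASES = {"completed": "closed", "done": "closed", "finished": "closed"}

def normalize_status(status: str) -> str:
    """Return a canonical status, or raise so the mismatch is visible."""
    status = ALIASES.get(status, status)
    if status not in CANONICAL:
        raise ValueError(
            f"unknown status {status!r}; expected one of {sorted(CANONICAL)}"
        )
    return status
```

With this in place, a worker posting "completed" is normalized to "closed", and a genuinely wrong status fails fast instead of leaving tasks "in progress" forever.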
Fix applied → Bug 2 discovered ↓
Part 1 · Bug Cascade
Bug 2 of 4
Session Storage Paths
The Symptom
Parent session couldn't find child sessions. "Session not found" errors everywhere.
The Root Cause
Project directory derived from cwd. Workers started in different directories → ended up in wrong session stores.
~/.amplifier/projects/spawn-events-work/sessions/parent-123/
~/.amplifier/projects/amplifier-core/sessions/worker-456/
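The shape of the fix, as a hedged sketch (function and layout names are illustrative, not the actual amplifier API): derive the session store from an explicit project root that the parent passes to every worker, so cwd never enters the calculation.

```python
from pathlib import Path

def session_dir(project_root: Path, session_id: str) -> Path:
    """Resolve a session's storage directory from an explicit project root,
    so parent and child agree regardless of each process's cwd."""
    return project_root / "sessions" / session_id

# The parent computes the root once and hands it to every worker it spawns.
root = Path.home() / ".amplifier" / "projects" / "spawn-events-work"
parent = session_dir(root, "parent-123")
worker = session_dir(root, "worker-456")
assert parent.parent == worker.parent  # same store, findable by both sides
```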
Fix applied → Bug 3 discovered ↓
Part 1 · Bug Cascade
Bug 3 of 4
Tool Source Paths
The Symptom
Workers declared tools in their bundles but got "capability not available" at runtime.
The Root Cause
Git URLs missing #subdirectory= fragment. Tool resolver couldn't find the package inside the repo.
source: git+https://github.com/org/tools.git
source: git+https://github.com/org/tools.git#subdirectory=packages/my-tool
Fix applied → Bug 4 discovered ↓
Part 1 · Bug Cascade
Bug 4 of 4
Concurrent File Access
The Symptom
3 workers completed successfully. But only 1 result appeared in the output file.
The Root Cause
Multiple workers writing to the same file simultaneously. Last writer wins — overwrote the others.
Worker A: open("results.json", "w")
Worker B: open("results.json", "w")
Worker C: open("results.json", "w")
Worker A: write(result_a)
Worker B: write(result_b)
Worker C: write(result_c)
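The simplest fix for last-writer-wins is to stop sharing the file: each worker writes its own result, and the single parent merges after all workers report done. A sketch with hypothetical names (not the actual Foreman fix):

```python
import json
from pathlib import Path

def write_result(outdir: Path, worker_id: str, result: dict) -> None:
    """Each worker writes its own file — no shared-file race to lose."""
    outdir.mkdir(parents=True, exist_ok=True)
    (outdir / f"{worker_id}.json").write_text(json.dumps(result))

def merge_results(outdir: Path) -> dict:
    """The single parent merges once, after all workers are finished."""
    return {p.stem: json.loads(p.read_text()) for p in sorted(outdir.glob("*.json"))}
```

If a single shared file is truly required, the alternative is append-only writes guarded by an OS-level file lock, but per-worker files keep the coordination out of the filesystem entirely.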
Part 1 · The Foreman Troubles
The Full Picture
Four Layers of Assumptions
1
Status Vocabulary
"completed" vs "closed" — silently ignored
↓ fix → discover
2
Session Storage
Wrong directories — parent couldn't find children
↓ fix → discover
3
Tool Source Paths
Missing #subdirectory — tools didn't load
↓ fix → discover
4
Race Conditions
Concurrent writes — results overwritten
Every assumption the AI made turned out to be wrong
Part 1 · The Foreman Troubles
The Breaking Point
"You assured me your testing strategy confirmed it would work with a high degree of confidence."
The primitives existed. The process was broken.
The AI went down dark alleys — making assumptions, mocking fake APIs,
claiming things worked without verifying.
Part 1 · The Foreman Troubles
The Lesson
The Capabilities Were There
The problem was unvalidated AI assumptions.
session.spawn existed the whole time. The AI just never looked.
This wasn't a tooling problem. It was a process problem.
2
Part 2
Rigorous Design Validation
February 3
Understand first. Design second. Implement last.
Part 2 · Rigorous Design Validation
The Opposite of Part 1
Context First, Design Second
The design process started by studying the existing system
"Check out the following... and let's talk about the overall design of sessions spawning sessions"
amplifier-core/amplifier_core/session.py
amplifier-foundation/modules/tool-delegate/
amplifier-app-cli/session_runner.py, session_spawner.py
amplifier-bundle-observers/
amplifier-bundle-foreman/
In Part 1, the AI assumed and mocked. In Part 2, we looked first.
Part 2 · Rigorous Design Validation
The Difference
Two Approaches
Part 1: Dark Alley
Jumped into implementation
• Assumed bundle.load existed
• Mocked session.AmplifierSession
• Never studied tool-delegate
• Never looked at session.spawn
Part 2: Lit Path
Built context before designing
• Studied how sessions actually work
• Read the existing delegate tool
• Understood the CLI spawner
• Learned from foreman's failures
Same goal. Opposite process. Different outcome.
Part 2 · Rigorous Design Validation
The Shift
Trust But Verify
Before: Dark Alley
Design document read as "current architecture" — assumed it existed without checking code
After: Lit Path
Every proposed primitive validated against existing code. Gaps identified before implementation started.
grep -r "spawn_bundle" amplifier-core/src/
# No results - doesn't exist yet
grep -r "EventRouter" amplifier-core/src/
# No results - needs to be built
grep -r "TriggerSource" amplifier-core/src/
# No results - proposed, not implemented
Part 2 · Rigorous Design Validation
The Breakthrough
The Aha Moment
"What if agents were never a separate concept? They're just inline bundle definitions that inherit heavily from the parent session."
Agent spawning IS bundle spawning — just with different inheritance levels.
| Inheritance | What You Get | What You Override |
| --- | --- | --- |
| Full Bundle | Nothing | Everything — define it all yourself |
| Agent | Provider, settings, most context | Tools, specific context, instructions |
| Self | Everything | Just fork — new instance, same config |
Part 2 · Rigorous Design Validation
The Architecture
Validated Design
Primitives that solve multi-agent orchestration generally, not just Foreman
spawn_bundle()
Unified spawning function. One primitive for all inheritance levels.
EventRouter
Cross-session event communication. Sessions can emit and listen.
TriggerSource
Reactive triggers protocol. Timer, SessionEvent, Manual.
BackgroundSessionManager
Lifecycle management. Health checks. Graceful shutdown.
Design investment produced primitives that solve a whole class of problems
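To make the shape of these primitives concrete, here is a heavily hedged sketch. The real `spawn_bundle` and `EventRouter` live in amplifier-core; every signature, parameter, and enum below is an assumption for illustration, not the actual API.

```python
from enum import Enum
from typing import Callable

class Inheritance(Enum):
    # Assumed levels, mirroring the inheritance table above.
    FULL_BUNDLE = "full_bundle"  # inherit nothing; define everything
    AGENT = "agent"              # inherit provider/settings; override tools
    SELF = "self"                # inherit everything; fork a new instance

class EventRouter:
    """Minimal cross-session pub/sub: sessions emit, others listen."""
    def __init__(self) -> None:
        self._listeners: dict[str, list[Callable[[dict], None]]] = {}

    def listen(self, event: str, handler: Callable[[dict], None]) -> None:
        self._listeners.setdefault(event, []).append(handler)

    def emit(self, event: str, payload: dict) -> None:
        for handler in self._listeners.get(event, []):
            handler(payload)

def spawn_bundle(name: str, inheritance: Inheritance, router: EventRouter) -> dict:
    """One primitive for all inheritance levels (sketch, not the real API)."""
    child = {"name": name, "inheritance": inheritance.value}
    router.emit("session.spawned", child)  # parents can react to spawns
    return child
```

The point of the design is visible even in the sketch: agents are not a separate concept, just a different inheritance level passed to the same spawning primitive.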
3
Part 3
Implementation Infrastructure
February 3–4
Guardrails that prevent dark alleys
Part 3 · Implementation Infrastructure
The Challenge
How to Work
Without Wandering?
"How should I start this implementation process? What's the best way to ensure it requires as little intervention from me as possible until it is fully done?"
After the Foreman troubles, we needed confidence that the AI could work autonomously without going down dark alleys.
Part 3 · Implementation Infrastructure
The Methodology
Four Pillars of
Process Discipline
AGENTS.md
Auto-loaded context for any session in the directory. No re-explaining the project. The AI always knows the mission.
Working Memory
Session state "save game" that persists. AI knows where it left off. No context loss between sessions.
PHASE-GATES.md
Checkbox tracking with explicit criteria. AI knows what "done" means. No subjective judgment calls.
Validation Recipes
Automated PASS/FAIL verification. No "looks good" — objective gates. Trust is earned by evidence.
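What an objective gate might look like, sketched in Python rather than the actual recipe YAML (the events.jsonl format and "spawn" event name are assumptions): it mirrors the grep check from Part 1 that the Foreman prototype never ran.

```python
import json
from pathlib import Path

def gate_spawn_events(events_file: Path) -> bool:
    """Objective gate: PASS only if at least one spawn event was actually
    recorded in the log — evidence, not a 'looks good' judgment call."""
    if not events_file.exists():
        return False
    for line in events_file.read_text().splitlines():
        if line.strip() and json.loads(line).get("event") == "spawn":
            return True
    return False

# A validation recipe would run a check like this and report PASS/FAIL:
# print("PASS" if gate_spawn_events(Path("events.jsonl")) else "FAIL")
```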
Part 3 · Implementation Infrastructure
The Infrastructure
The Workspace Created
├── .amplifier/
│   ├── AGENTS.md
│   └── working-memory/
│       └── spawn-events-impl.md
├── docs/
│   ├── PHASE-GATES.md
│   └── integrated-design-impl.md
├── recipes/
│   └── validate-phase-*.yaml
├── amplifier-core/
└── amplifier-foundation/
Part 3 · Implementation Infrastructure
The Handoff
"I'll be away for a while, so make sure you keep making progress so it will be completed and validated when I return."
The infrastructure enabled what came next:
The AI worked through ALL phases autonomously —
but this time with guardrails that prevented dark alleys.
4
Part 4
The Result
February 4
What process discipline + design investment produced
Part 4 · The Result
Autonomous Execution
The Implementation Sprint
5 phases executed autonomously — no dark alleys
Phase 1
SessionStorage Protocol — Abstract storage layer for session relationships
Phase 2
spawn_bundle() — Core spawning function with inheritance levels
Phase 3
EventRouter — Cross-session event communication
Phase 4
TriggerSource Protocol — Timer, SessionEvent, Manual triggers
Phase 5
BackgroundSessionManager — Lifecycle, health checks, shutdown
All phases completed — implementing, testing, fixing, committing —
without human intervention.
Part 4 · The Result
What We Built
New Primitives Delivered
spawn_bundle()
The unified spawning function. Creates child sessions with configurable inheritance.
SessionStorage Protocol
Abstract storage layer. Parent can find children. Children can find parent.
EventRouter
Cross-session communication. Sessions can emit events, others can listen.
TriggerSource Protocol
TimerTrigger, SessionEventTrigger, ManualTrigger. Fire sessions on conditions.
BackgroundSessionManager
Manages lifecycle of spawned sessions. Health checks. Graceful shutdown.
W3C Trace Context
Full trace lineage across parent-child-grandchild. Observability built in.
Part 4 · The Result
The Numbers
What We Shipped
Part 4 · The Result
The Bigger Picture
From Catalyst
to General Infrastructure
The Foreman was just the catalyst.
These primitives now enable any multi-agent pattern.
Foreman + Workers
Manager spawns specialized workers via issue queue
Pipeline Stages
Sequential handoff with event-triggered transitions
Swarm Coordination
Many agents collaborating via EventRouter
The Transformation
Dark Alleys vs Lit Paths
Part 1: Dark Alleys
- Mocked fake APIs without checking reality
- Asserted "it works" without verification
- 100% test coverage of fantasies
- Catastrophic failure on manual test
Parts 2–4: Lit Paths
- Validated every design claim against code
- AGENTS.md + Working Memory for context
- Phase gates with objective criteria
- Autonomous completion — no intervention
The Two Lessons
What We Learned
01
Process Discipline Prevents Dark Alleys
AGENTS.md, working memory, phase gates, and validation recipes kept the AI on track. The infrastructure matters as much as the code.
02
Design Investment Pays Off
Rigorous validation produced primitives that solve a whole class of problems. Foreman was one pattern — the infrastructure enables all of them.