When less context means more memory.
A persistent memory system that got radically better by loading less.
V1 loaded four context files into every session to teach the AI how its memory works. The instructions were longer than most conversations.
Four files: delegation rules, auto-context, writer instructions, and a system guide — all loaded at session start, every time.
Before the user typed a single word, the memory system had already consumed 7K tokens of the AI's context window. Every session.
The same "never read memory-store.yaml directly" warning appeared in four different files. The AI got the message — four times over.
The memory system's instructions about being token-efficient were themselves the biggest source of token waste.
The V1 files spent hundreds of tokens explaining things modern LLMs already understand: YAML formatting, bash append operations, pattern matching. The redesign asked a different question.
The best context is the minimum context that produces correct behavior. Everything else is overhead.
The entire V2 redesign was a single commit that replaced four context includes with one.
V1: 4 files · 738 lines · ~7,000 tokens
DELEGATION-RULES.md (160 lines)
auto-context.md (89 lines)
memory-writer-instructions.md (310 lines)
memory-system-guide.md (179 lines)
V2: 1 file · 37 lines · ~500 tokens
memory-instructions.md (three operations, three patterns, zero redundancy)
69c3629 — "feat: compact context from ~7K to ~500 tokens (v2.0)"
Memory reads go to a sub-agent. The sub-agent absorbs the token cost of loading large files and returns only matches. The main session stays lean.
Write: cat >> memory-store.yaml. Read the last few lines to find the next ID, then append the new entry. Never load the full file. Cost: ~100 tokens.
Search: the memory-retrieval agent loads the full memory store in its own isolated context, searches it, and returns only the 2–3 matching entries. Cost in the main session: ~200 tokens.
Work log: work-log.yaml is a small file (~500 tokens), so it is safe to read directly. No delegation needed.
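The write path can be sketched in Python as a tail read followed by an append. This is a minimal sketch under assumptions, not the collection's implementation. It assumes each entry in memory-store.yaml begins with a line of the form `- id: N`; that entry format and the helper names `next_id` and `append_entry` are hypothetical. Only the end of the file is read to find the last ID. The window widens only if the final entry is larger than the window.

```python
import json
import os
import re
import tempfile

# Hypothetical entry format: each entry starts with a line "- id: <number>".
ID_LINE = re.compile(r"^- id: (\d+)\s*$", re.MULTILINE)


def next_id(path: str, window: int = 4096) -> int:
    """Find the last ID by reading only the end of the file."""
    if not os.path.exists(path):
        return 1
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek returns the new absolute position
        while True:
            f.seek(max(0, size - window))
            tail = f.read().decode("utf-8", errors="ignore")
            ids = [int(n) for n in ID_LINE.findall(tail)]
            if ids:
                return max(ids) + 1
            if window >= size:
                return 1  # whole file scanned, no entries yet
            window *= 2  # last entry is bigger than the window: widen and retry


def append_entry(path: str, content: str) -> int:
    """Append one entry without loading the existing store."""
    entry_id = next_id(path)
    with open(path, "a", encoding="utf-8") as f:
        # json.dumps gives a double-quoted string, which YAML parsers also accept.
        f.write(f"- id: {entry_id}\n  content: {json.dumps(content)}\n")
    return entry_id


store = os.path.join(tempfile.mkdtemp(), "memory-store.yaml")
print(append_entry(store, "Use uv, not pip, in this repo"))  # 1
print(append_entry(store, "Integration tests need Docker running"))  # 2
```

The cost of each write stays roughly constant no matter how many entries the store already holds, because the read is bounded by the window rather than the file size.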
The sub-agent is a "context sink": it absorbs 10,000+ tokens of memory data, but only ~200 tokens flow back to the main session. When the sub-agent finishes, its context is discarded.
Whether you have 10 memories or 10,000, the token cost in the main session stays flat.
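The flat-cost property can be illustrated with a toy Python model. The hypothetical `retrieve` function stands in for the memory-retrieval agent. It reads every entry in its own local scope, scores entries by naive keyword overlap, and returns at most three. Keyword overlap is an assumption; the real agent's search strategy is not described here. The 4-characters-per-token estimate is also a rough assumption, not a real tokenizer.

```python
import re
from typing import Dict, List

MAX_RESULTS = 3  # the text says the agent returns only 2-3 matches


def rough_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(store: List[str], query: str) -> List[str]:
    """Stand-in for the sub-agent: it sees every entry, returns only a few."""
    q = words(query)
    scored = [(len(q & words(entry)), entry) for entry in store]
    # sorted() stays stable with reverse=True, so ties keep store order.
    hits = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [entry for _, entry in hits[:MAX_RESULTS]]


def make_store(n: int) -> List[str]:
    topics = ["pytest fixtures", "uv lockfile", "git rebase", "yaml anchors"]
    return [f"entry {i}: note on {topics[i % len(topics)]}" for i in range(n)]


loaded: Dict[int, int] = {}    # tokens the sub-agent had to read
returned: Dict[int, int] = {}  # tokens that reach the main session
for size in (10, 10_000):
    store = make_store(size)
    matches = retrieve(store, "refresh the uv lockfile")
    loaded[size] = sum(rough_tokens(e) for e in store)
    returned[size] = sum(rough_tokens(e) for e in matches)
    print(size, loaded[size], returned[size])
```

Both runs return the same three entries, so the main session pays the same amount. The sub-agent, by contrast, reads far more data in the larger run. That asymmetry is what the "context sink" description above refers to.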
The compression was step one. The V2 design spec, authored April 2026, lays out a complete redesign inspired by Claude Code's memory architecture.
Index → topic files → session transcripts. Only the index is always loaded.
Background consolidation agent: merge, dedup, prune stale, resolve contradictions.
[temporal:DATE] and [persistent] tags. Staleness as a first-class concept.
"If you can grep for it, don't remember it." Only store non-derivable knowledge.
Write path checks new memories against the index. Conflicts presented to user before storing.
see-also: links between topics. Maintained by consolidation, not the write path.
Retrieval regression suite in .meta/retrieval-test.yaml. Run after every consolidation.
Pre/post-consolidation commits. If autoDream (the background consolidation agent) corrupts data, git revert restores it.
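The spec's tag scheme and two of the consolidation steps it names (prune stale, dedup) can be sketched in Python. Every concrete rule below is an illustrative assumption, not the spec's:

* a 90-day staleness window
* untagged memories treated like `[persistent]` ones and never going stale
* deduplication by case-insensitive, whitespace-normalized text

`parse`, `is_stale`, and `consolidate` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Matches "[persistent]" or "[temporal:YYYY-MM-DD]" anywhere in a line.
TAG = re.compile(r"\[(persistent|temporal:(\d{4}-\d{2}-\d{2}))\]")
MAX_AGE = timedelta(days=90)  # hypothetical staleness window


@dataclass
class Memory:
    text: str
    persistent: bool
    as_of: Optional[date]  # set only for [temporal:DATE] memories


def parse(line: str) -> Memory:
    match = TAG.search(line)
    text = " ".join(TAG.sub("", line).split())  # strip tags, collapse spaces
    if match and match.group(2):
        return Memory(text, persistent=False, as_of=date.fromisoformat(match.group(2)))
    # Assumption: untagged memories are treated like [persistent] ones.
    return Memory(text, persistent=True, as_of=None)


def is_stale(mem: Memory, today: date) -> bool:
    return mem.as_of is not None and today - mem.as_of > MAX_AGE


def consolidate(lines: List[str], today: date) -> List[Memory]:
    kept: List[Memory] = []
    seen = set()
    for mem in map(parse, lines):
        if is_stale(mem, today):
            continue  # prune: temporal fact past its window
        key = mem.text.lower()
        if key in seen:
            continue  # dedup: same fact already kept
        seen.add(key)
        kept.append(mem)
    return kept


notes = [
    "[persistent] Deploys go through the staging branch",
    "[temporal:2026-01-05] CI is flaky on macOS runners",
    "[temporal:2026-04-01] Release freeze until the 2.0 tag",
    "[persistent] deploys go through the  staging branch",
]
for mem in consolidate(notes, today=date(2026, 4, 10)):
    print(mem.text)
# Deploys go through the staging branch
# Release freeze until the 2.0 tag
```

The sample dates are invented for illustration. The January note is 95 days old on the chosen date, so it is pruned. The fourth line differs from the first only in case and spacing, so it is dropped as a duplicate.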
The V2 architecture replaces a flat YAML file with a 3-layer design. Only the top layer is always loaded; each deeper layer is consulted only when the layers above it can't answer.
Layer 0, the index: ~200 lines max, hard-capped. One-liner summaries with file pointers to topic files. This is the only file the retrieval agent always reads.
Layer 1, topic files: 5–50 entries per theme. Free-form markdown, not YAML. Only opened when a search hits the relevant index line. The unit of consolidation.
Layer 2, session transcripts: Amplifier's existing session logs. Never read directly; only searched via session-analyst delegation when Layers 0 and 1 don't have the answer.
The hard cap on index lines (200) is enforced by consolidation. When the index grows past the cap, consolidation must run and prune. The vector search seam activates when smoke tests show retrieval misses at ~50+ topic files — not before.
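A toy Python model of the three-layer lookup and the index cap is sketched below. It uses an in-memory stand-in for each layer:

* the index as (summary, topic file) pairs
* topic files as lists of entries
* a callable in place of session-analyst delegation

Matching is naive keyword overlap, chosen only for illustration. `LayeredMemory`, `lookup`, and `needs_consolidation` are hypothetical names, not the collection's API.

```python
import re
from typing import Callable, Dict, List, Tuple

INDEX_CAP = 200  # hard cap on index lines, per the design


def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class LayeredMemory:
    """Toy model of the V2 layers; the class and method names are hypothetical."""

    def __init__(
        self,
        index: List[Tuple[str, str]],                    # Layer 0: (summary, topic file)
        topics: Dict[str, List[str]],                    # Layer 1: topic file -> entries
        search_transcripts: Callable[[str], List[str]],  # Layer 2: delegated search
    ) -> None:
        self.index = index
        self.topics = topics
        self.search_transcripts = search_transcripts
        self.opened: List[str] = []  # topic files actually loaded so far

    def needs_consolidation(self) -> bool:
        # Past the cap, consolidation has to run and prune the index.
        return len(self.index) > INDEX_CAP

    def lookup(self, query: str) -> Tuple[int, List[str]]:
        q = words(query)
        # Layer 0: always read; keep only index lines that mention the query.
        targets = [path for summary, path in self.index if q & words(summary)]
        # Layer 1: open just the topic files those index lines point to.
        hits: List[str] = []
        for path in targets:
            self.opened.append(path)
            hits += [e for e in self.topics.get(path, []) if q & words(e)]
        if hits:
            return 1, hits
        # Layer 2: last resort when Layers 0 and 1 have no answer.
        return 2, self.search_transcripts(query)


mem = LayeredMemory(
    index=[
        ("testing conventions and pytest fixtures", "topics/testing.md"),
        ("release process and version tags", "topics/release.md"),
    ],
    topics={
        "topics/testing.md": ["fixtures live in conftest.py", "run pytest with -x"],
        "topics/release.md": ["tag releases as vX.Y.Z"],
    },
    search_transcripts=lambda q: [f"(session-analyst result for {q!r})"],
)
print(mem.lookup("pytest fixtures"))      # answered from Layer 1
print(mem.lookup("database migrations"))  # falls through to Layer 2
print(mem.opened)                         # ['topics/testing.md']
```

Only topics/testing.md is opened for the first query, and topics/release.md is never touched. The design relies on exactly this: the always-loaded index stays small so it can decide which topic files are worth opening.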
Repository: amplifier-collection-dev-memory (private, ramparte/amplifier-collection-dev-memory)
Compression claim verification:
69c3629 message: "Replace 4 context files (738 lines, ~7K tokens) with single compact memory-instructions.md (~40 lines, ~500 tokens)"
git diff 69c3629^..69c3629 -- behaviors/dev-memory.yaml confirms V1 loaded 4 files, V2 loads 1
Line counts (738 for V1, 37 for V2) confirmed with wc -l
Git history (last 60 days, non-merge):
69c3629 — feat: compact context from ~7K to ~500 tokens (v2.0)
0a9529d — docs: add 8 concrete improvements to Memory V2 design spec
f04ac82 — Add Memory V2 design spec
All commits (total): 4 commits across the full repo history
Key files examined:
behaviors/dev-memory.yaml — Bundle behavior definition (version 2.0.0)
context/memory-instructions.md — V2 compact context (37 lines)
context/auto-context.md, context/DELEGATION-RULES.md, context/memory-writer-instructions.md, context/memory-system-guide.md — V1 context files (retained but no longer loaded)
agents/memory-retrieval.md — Sub-agent definition (142 lines)
docs/specs/2026-04-07-memory-v2-design.md — V2 design specification (342 lines)
ARCHITECTURE.md — Token-efficient architecture documentation (290 lines)
Primary contributor: Sam Schillace (all commits authored)
Methodology: All metrics derived from git log, git diff, wc -l, and direct file inspection. Token counts are the repository's own estimates from commit messages and behavior YAML descriptions. No external tokenizer was used. The 93% figure is the mathematical result of the repo's stated ~7K → ~500 token reduction.
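The arithmetic behind the 93% figure and the V1 line total can be reproduced in a few lines of Python. The inputs are the repository's own stated estimates and line counts quoted above. Nothing here re-tokenizes the files.

```python
# V1 context files and their line counts, as listed in the comparison above.
v1_files = {
    "DELEGATION-RULES.md": 160,
    "auto-context.md": 89,
    "memory-writer-instructions.md": 310,
    "memory-system-guide.md": 179,
}
v1_lines = sum(v1_files.values())
v2_lines = 37                      # memory-instructions.md
v1_tokens, v2_tokens = 7_000, 500  # the repo's own estimates, not measured here

token_reduction = 1 - v2_tokens / v1_tokens
line_reduction = 1 - v2_lines / v1_lines
print(v1_lines)                  # 738
print(f"{token_reduction:.0%}")  # 93%
print(f"{line_reduction:.1%}")   # 95.0%
```

The exact token ratio is 1 − 500/7000 ≈ 92.9%, which rounds to the quoted 93%. Measured in lines, the reduction is slightly larger.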