Features · Dev Memory

93% Compression

When less context means more memory. A persistent memory system that got radically better by loading less.

V2.0 · ~7,000 → ~500 tokens
May 2026 · Sam Schillace

The Problem

7,000 tokens
before you said hello

V1 loaded four context files into every session to teach the AI how its memory works. The instructions were longer than most conversations.

📜

738 Lines of Instructions

Four files: delegation rules, auto-context, writer instructions, and a system guide — all loaded at session start, every time.

💸

~7,000 Tokens Burned

Before the user typed a single word, the memory system had already consumed 7K tokens of the AI's context window. Every session.

🔃

Redundant Content

The same "never read memory-store.yaml directly" warning appeared in four different files. The AI got the message — four times over.

The memory system's instructions about being token-efficient were themselves the biggest source of token waste.

The Insight

What if the AI
already knows?

The V1 files spent hundreds of tokens explaining things modern LLMs already understand: YAML formatting, bash append operations, pattern matching. The redesign asked a different question.

V1: Teach Everything

  • Full bash examples for append operations
  • YAML formatting guidelines with templates
  • Incorrect usage examples ("don't do this")
  • Self-check rubrics and decision trees
  • Category auto-detection keyword lists
  • Error handling procedures

V2: State the Contract

  • Three operations, three rules
  • Writes: append to file, never load it
  • Reads: delegate to sub-agent, always
  • Work status: read small file directly
  • Trust the LLM for implementation
  • 37 lines. Done.

The best context is the minimum context that produces correct behavior. Everything else is overhead.

The Change

One commit, one behavior file

The entire V2 redesign was a single commit that replaced four context includes with one.

  context:
    include:
-     - dev-memory:context/DELEGATION-RULES.md
-     - dev-memory:context/auto-context.md
-     - dev-memory:context/memory-writer-instructions.md
-     - dev-memory:context/memory-system-guide.md
+     - dev-memory:context/memory-instructions.md

Before (V1)

4 files · 738 lines · ~7,000 tokens
DELEGATION-RULES.md (160 lines)
auto-context.md (89 lines)
memory-writer-instructions.md (310 lines)
memory-system-guide.md (179 lines)

After (V2)

1 file · 37 lines · ~500 tokens
memory-instructions.md — three operations,
three patterns, zero redundancy

Commit: 69c3629 — "feat: compact context from ~7K to ~500 tokens (v2.0)"

Architecture

The context sink pattern

Memory reads go to a sub-agent. The sub-agent absorbs the token cost of loading large files and returns only matches. The main session stays lean.

W

Writes: Main Agent, Append-Only

cat >> memory-store.yaml: read the last few lines to find the next ID, then append the new entry. Never load the full file. Cost: ~100 tokens.

R

Reads: Delegated to Sub-Agent

The memory-retrieval agent loads the full memory store in its own isolated context, searches it, and returns only the 2–3 matching entries. Cost in main session: ~200 tokens.

S

Work Status: Direct Read

work-log.yaml is a small file (~500 tokens). Safe to read directly — no delegation needed.
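
The append-only write path described above can be sketched roughly as follows. This is a minimal sketch, not the dev-memory implementation: the entry layout (id, category, text) and the function names next_id and append_memory are assumptions made for illustration, since the source does not show the store's schema.

```python
import json
import re
from collections import deque
from pathlib import Path

# Hypothetical entry layout; the real memory-store.yaml schema is not shown in the source.
ID_LINE = re.compile(r"\s*-?\s*id:\s*(\d+)")


def next_id(store: Path, tail_lines: int = 20) -> int:
    """Return the next entry ID by looking only at the last few lines."""
    if not store.exists():
        return 1
    with store.open("r", encoding="utf-8") as f:
        # Streams through the file but keeps only the tail in memory,
        # similar in spirit to `tail -n 20`.
        tail = deque(f, maxlen=tail_lines)
    ids = [int(m.group(1)) for line in tail if (m := ID_LINE.match(line))]
    return max(ids, default=0) + 1


def append_memory(store: Path, text: str, category: str = "general") -> int:
    """Append one entry to the end of the store and return its ID."""
    new_id = next_id(store)
    # json.dumps yields a double-quoted scalar, which is also valid YAML.
    entry = (
        f"- id: {new_id}\n"
        f"  category: {category}\n"
        f"  text: {json.dumps(text)}\n"
    )
    with store.open("a", encoding="utf-8") as f:  # append mode, like `cat >>`
        f.write(entry)
    return new_id
```

The point of the design is that nothing but the tail and the new entry ever needs to enter the main session, so the cost of a write does not grow with the size of the store.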

The sub-agent is a "context sink" — it absorbs 10,000+ tokens of memory data, but only 200 tokens flow back to the main session. When the sub-agent finishes, its context is cleared.
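
To make the context sink idea concrete, here is a small simulation. It is only a sketch under stated assumptions: retrieve stands in for the memory-retrieval sub-agent, matching is naive keyword overlap, and estimate_tokens uses a rough four-characters-per-token heuristic rather than a real tokenizer. None of these names come from the repository.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


@dataclass
class SinkResult:
    matches: list[str]
    tokens_absorbed: int  # paid inside the sub-agent's isolated context
    tokens_returned: int  # what flows back to the main session


def retrieve(store_entries: list[str], query: str, max_results: int = 3) -> SinkResult:
    """Stand-in for the sub-agent: scan everything, return only the top matches."""
    terms = query.lower().split()
    scored = []
    for entry in store_entries:
        lowered = entry.lower()
        score = sum(term in lowered for term in terms)
        if score:
            scored.append((score, entry))
    # Stable sort: entries with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    matches = [entry for _, entry in scored[:max_results]]
    return SinkResult(
        matches=matches,
        tokens_absorbed=sum(estimate_tokens(e) for e in store_entries),
        tokens_returned=sum(estimate_tokens(e) for e in matches),
    )
```

Calling retrieve on a 10-entry store and on a 10,000-entry store changes tokens_absorbed by orders of magnitude, while tokens_returned stays bounded by max_results. That bound is what keeps the main session's cost flat.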

Token Efficiency

Constant cost at any scale

Whether you have 10 memories or 10,000, the token cost in the main session stays flat.

93% context compression
37 lines of instructions
~500 tokens at session start
1 commit to ship V2

V1: ~7,000 tokens
V2: ~500 tokens
Token estimates from commit message: "Replace 4 context files (738 lines, ~7K tokens) with single compact memory-instructions.md (~40 lines, ~500 tokens)"

V2 Design Spec

Eight improvements beyond compression

The compression was step one. The V2 design spec, authored April 2026, lays out a complete redesign inspired by Claude Code's memory architecture.

3-Layer Index

Index → topic files → session transcripts. Only the index is always loaded.

autoDream

Background consolidation agent: merge, dedup, prune stale, resolve contradictions.

Temporal Tags

[temporal:DATE] and [persistent] tags. Staleness as a first-class concept.

Derivability Rule

"If you can grep for it, don't remember it." Only store non-derivable knowledge.

Contradiction Check

Write path checks new memories against the index. Conflicts presented to user before storing.

Cross-References

see-also: links between topics. Maintained by consolidation, not the write path.

Smoke Tests

Retrieval regression suite in .meta/retrieval-test.yaml. Run after every consolidation.

Git Safety Net

Pre/post-consolidation commits. If autoDream corrupts data, git revert restores it.
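
As one illustration of the temporal tags above, a consolidation pass might decide staleness along these lines. This is a hedged sketch: the spec excerpt only names the [temporal:DATE] and [persistent] tags, so the ISO date format, the 90-day threshold, and the is_stale helper are all assumptions.

```python
import re
from datetime import date, timedelta

# Assumes DATE is written as YYYY-MM-DD; the source does not specify the format.
TEMPORAL = re.compile(r"\[temporal:(\d{4}-\d{2}-\d{2})\]")


def is_stale(entry: str, today: date, max_age: timedelta = timedelta(days=90)) -> bool:
    """Return True if a temporally tagged entry is older than max_age."""
    if "[persistent]" in entry:
        return False  # persistent entries never expire
    match = TEMPORAL.search(entry)
    if match is None:
        return False  # untagged entries are left for consolidation to judge
    tagged = date.fromisoformat(match.group(1))
    return today - tagged > max_age
```

For example, with today set to 2026-05-01, an entry tagged [temporal:2026-01-01] is 120 days old and would be flagged, while one tagged [persistent] would not.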

From docs/specs/2026-04-07-memory-v2-design.md (342 lines)

Deep Dive

Memory as index, not storage

The V2 architecture replaces a flat YAML file with a 3-layer design. Only the index (Layer 0) is always loaded; the deeper layers are consulted on demand, so the routine cost is bounded by the size of the index.

L0

MEMORY.md — The Index (always loaded)

~200 lines max, hard-capped. One-liner summaries with file pointers to topic files. This is the only file the retrieval agent always reads.

L1

topics/*.md — Topic Files (on-demand)

5–50 entries per theme. Free-form markdown, not YAML. Only opened when a search hits the relevant index line. The unit of consolidation.

L2

Session Transcripts — Layer 2 (never loaded)

Amplifier's existing session logs. Never read directly — only searched via session-analyst delegation when Layers 0 and 1 don't have the answer.

The hard cap on index lines (200) is enforced by consolidation. When the index grows past the cap, consolidation must run and prune. The vector search seam activates when smoke tests show retrieval misses at ~50+ topic files — not before.
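
The index-first lookup and the cap check can be sketched as follows. The index line format used here ("summary -> topics/file.md") is an assumption for illustration; the source says only that index lines are one-liner summaries with pointers to topic files. The helper names are hypothetical as well.

```python
INDEX_CAP = 200  # hard cap on index lines, per the text above


def needs_consolidation(index_text: str, cap: int = INDEX_CAP) -> bool:
    """True once the index has grown past the cap and must be pruned."""
    lines = [line for line in index_text.splitlines() if line.strip()]
    return len(lines) > cap


def topics_for(index_text: str, query: str) -> list[str]:
    """Return the topic files whose index summary mentions any query term."""
    terms = query.lower().split()
    hits: list[str] = []
    for line in index_text.splitlines():
        summary, sep, pointer = line.rpartition("->")
        if not sep:
            continue  # not an index entry in the assumed format
        if any(term in summary.lower() for term in terms):
            path = pointer.strip()
            if path not in hits:
                hits.append(path)
    return hits
```

Only the files returned by topics_for would be opened (Layer 1). An empty result is the signal to fall through to the delegated transcript search (Layer 2).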

Sources & Methodology

How we verified this

Repository: amplifier-collection-dev-memory (private, ramparte/amplifier-collection-dev-memory)

Compression claim verification:

Git history (last 60 days, non-merge):

All commits (total): 4 commits across the full repo history

Key files examined:

Primary contributor: Sam Schillace (all commits authored)

Methodology: All metrics derived from git log, git diff, wc -l, and direct file inspection. Token counts are the repository's own estimates from commit messages and behavior YAML descriptions. No external tokenizer was used. The 93% figure is the mathematical result of the repo's stated ~7K → ~500 token reduction.
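
The arithmetic behind the headline figure can be reproduced directly from the numbers quoted on this page. The function name compression_pct is illustrative only; the inputs are the repository's own estimates, not measured token counts.

```python
def compression_pct(before: int, after: int) -> float:
    """Percentage reduction from before to after."""
    return (before - after) / before * 100


# V1 line counts as listed in the "Before (V1)" breakdown.
V1_LINES = {
    "DELEGATION-RULES.md": 160,
    "auto-context.md": 89,
    "memory-writer-instructions.md": 310,
    "memory-system-guide.md": 179,
}

total_v1_lines = sum(V1_LINES.values())           # 738
token_reduction = compression_pct(7_000, 500)     # about 92.9, reported as 93%
line_reduction = compression_pct(total_v1_lines, 37)
```

By line count the same formula gives roughly 95% (738 lines down to 37), so the 93% token figure is, if anything, the more conservative of the two.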
