Less Context, More Memory

dev-memory V2 cut startup context 93%

The dev-memory bundle

To remember across sessions, an AI first has to be taught how — and that lesson isn't free

dev-memory gives an AI persistent memory across sessions by teaching it how to use that memory up front. But those teaching instructions are context, and context has a price.

So how big was that price in V1?

V1 startup tax

V1 auto-loaded 4 instruction files into every session before you said hello

All 738 lines (about 7K tokens by the author's estimate) loaded at startup. auto-context.md states that it is "included in every session," so that cost was paid before the first user turn.

And that was just the cost to begin.

4 context files auto-loaded

738 lines (160 + 89 + 310 + 179)

~7K tokens at startup (author's figure)

0 user turns taken yet
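The startup numbers above can be checked with a few lines of arithmetic. This is a minimal sketch: the four line counts come from the source, the ~7K token figure is the author's estimate rather than a measurement, and the "tokens per line" ratio is only what those two numbers imply together.

```python
# Line counts of the four V1 context files, as listed in the source.
# Per-file names are not mapped to counts here, so they stay a plain list.
V1_LINE_COUNTS = [160, 89, 310, 179]

# The author's "~7K" figure, treated as an estimate, not a measured count.
AUTHOR_TOKEN_ESTIMATE = 7_000

total_lines = sum(V1_LINE_COUNTS)
tokens_per_line = AUTHOR_TOKEN_ESTIMATE / total_lines

print(f"files auto-loaded: {len(V1_LINE_COUNTS)}")
print(f"total lines: {total_lines}")
print(f"implied tokens per line: {tokens_per_line:.1f}")
```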

The complication

The instructions cost more than most conversations — and threatened to grow with memory

A fixed ~7K-token bill paid up front is bad enough. Worse, V1's read guidance implied a cost that grew as the memory store grew, so the model didn't just start expensive — it threatened to get worse at scale.

V2's fix came in two moves.

Move 1 — compress

V2 replaced all four files with a single 37-line contract that just states the rules

One file, memory-instructions.md, now loads at startup instead of four: about 500 tokens by the author's estimate. It states the rules and nothing more.

But shrinking the contract alone doesn't fix growth.

4 files · 738 lines · ~7K tokens (V1)

→

1 file · 37 lines · ~500 tokens (V2)

Move 2 — delegate

Reads never touch the memory store directly — they delegate to a sub-agent

The contract forbids reading memory-store.yaml directly. Instead it delegates to the memory-retrieval sub-agent, which absorbs the full file in its own context and returns only matches (~200 tokens).

That's a context sink — and it changes the math.

1. Main session: never uses read_file on memory-store.yaml
2. Delegate: calls the memory-retrieval sub-agent
3. Sub-agent absorbs: loads the full memory file in its own context
4. Returns matches only: ~200 tokens back to the main session
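The four steps above can be sketched as a toy model. Everything named here (Memory, retrieval_sub_agent, MainSession) is hypothetical and for illustration only. The real memory-retrieval sub-agent is an agent with its own context, not a substring search, so the matching logic below is an assumption standing in for it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Memory:
    topic: str
    text: str


def retrieval_sub_agent(store: list[Memory], query: str, limit: int = 3) -> list[str]:
    """Hypothetical stand-in for the sub-agent: it sees the whole store,
    but hands back only the entries that match the query."""
    q = query.lower()
    hits = [m.text for m in store if q in m.topic.lower() or q in m.text.lower()]
    return hits[:limit]


class MainSession:
    def __init__(self, contract: str) -> None:
        # Only the fixed contract is loaded at startup.
        self.context: list[str] = [contract]

    def recall(self, store: list[Memory], query: str) -> list[str]:
        # The store is passed through to the sub-agent and never added
        # to the main session's context; only the matches are.
        matches = retrieval_sub_agent(store, query)
        self.context.extend(matches)
        return matches


store = [Memory(f"topic-{i}", f"note {i}") for i in range(1000)]
store.append(Memory("deploy", "Deploy uses blue-green"))

session = MainSession(contract="memory rules")
found = session.recall(store, "deploy")
print(found, len(session.context))
```

Even with 1,001 entries in the store, the main session ends up holding two items: the contract and the single match.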

The pivot

Because the main session only ever loads the fixed contract, adding memories no longer adds startup cost

The growing memory-store.yaml is only ever read inside the sub-agent's separate context. The main session sees just the 37-line contract — so the growth problem isn't made smaller, it's designed away.
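A rough cost model makes the pivot concrete. The ~500 and ~200 token figures are the estimates quoted in the source. TOKENS_PER_MEMORY is a hypothetical average chosen only to show the shape of the curve, and the function names are illustrative.

```python
CONTRACT_TOKENS = 500    # author's estimate for the 37-line V2 contract
RETURN_TOKENS = 200      # ~200 tokens returned per delegated read (source figure)
TOKENS_PER_MEMORY = 50   # hypothetical average size of one stored memory


def main_session_startup(n_memories: int) -> int:
    """Main-session startup cost: the contract only.
    n_memories is deliberately unused, which is the point of the design."""
    return CONTRACT_TOKENS


def main_session_after_reads(n_memories: int, reads: int) -> int:
    """Main-session cost after some delegated reads: grows with reads, not store size."""
    return CONTRACT_TOKENS + reads * RETURN_TOKENS


def sub_agent_read(n_memories: int) -> int:
    """Cost absorbed inside the sub-agent's own context: grows with the store."""
    return n_memories * TOKENS_PER_MEMORY


for n in (10, 100, 1000):
    print(n, main_session_startup(n), main_session_after_reads(n, reads=2), sub_agent_read(n))
```

Under these assumptions the first two columns stay flat as n grows, and only the sub-agent column scales with the store.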

Which lands the payoff.

The payoff

Startup dropped ~93% — and stays constant even at 1000+ memories

~7K → ~500 tokens is a ~93% cut. And because the fixed contract is all the main session loads, that cost holds constant as memory scales. Less context loaded, more memory available.

A real, shipped change — here's what carried the win.

93% startup cost cut (~7K → ~500 tokens)

1000+ memories at constant startup cost
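The headline percentage follows directly from the two token estimates. A short check, using the author's rounded figures (the helper name percent_cut is illustrative):

```python
def percent_cut(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Author's rounded estimates: ~7K tokens (V1) and ~500 tokens (V2).
cut = percent_cut(7_000, 500)
print(f"{cut:.1f}% cut")
```

The result rounds to the ~93% shown on the slide.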

What to keep

Shipped as v2.0.0 — the win came from where cost is paid, not from doing less

Same 3 operations as before: delegate reads, append writes, direct work-log reads. Nothing was removed — the growing cost was just moved into a sub-agent's context. That's the generalizable pattern.

Delegate growth to a sub-agent; keep the main session's cost fixed.

# commit 69c3629 (HEAD of master, 2026-04-13)
# behaviors/dev-memory.yaml
version: 1.0.0  →  version: 2.0.0

# context.include: 4 files → 1 file
- dev-memory:context/memory-instructions.md

# Same 3 operations:
#   delegate reads · append writes
#   direct work-log reads

Sources

Sources & Research Methodology

Shipped · v2.0.0

Source repo: ramparte/amplifier-collection-dev-memory (confirmed via git remote -v). HEAD commit 69c3629 "feat: compact context from ~7K to ~500 tokens (v2.0)", 2026-04-13.

Research performed:

V1 file list: git show 0a9529d:behaviors/dev-memory.yaml — 4 context.include files
V1 line counts: git show 0a9529d:context/$f | wc -l — 160 + 89 + 310 + 179 = 738 lines
V2 contract: wc -l context/memory-instructions.md — 37 lines (single file)
Delegation & ops: cat context/memory-instructions.md; cat agents/memory-retrieval.md
Constant-cost claim: grep -rin constant bundle.md — "Constant token usage even with 1000+ memories"
Version bump: git show 69c3629 -- behaviors/dev-memory.yaml — 1.0.0 → 2.0.0
Contributors: git log --format='%an <%ae>' | sort | uniq -c

Gaps / estimates: The ~7K (V1) and ~500 (V2) token figures are the author's stated estimates from the commit message and behavior description, not independently reproducible token counts; a naive chars/4 estimate gives V1 ~4,300 and V2 ~270 tokens. The ~93% ratio holds under both. The ~200-token per-read figure is stated in the source files, not measured here. The commit rounds the V2 file to "~40 lines"; measured is 37.
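The naive chars/4 estimate mentioned above can be reproduced along these lines. This is a sketch of the rule of thumb only, not a real tokenizer, and estimate_tokens is an illustrative name. The 4,300 and 270 figures are the ones quoted in the paragraph above.

```python
def estimate_tokens(text: str) -> int:
    """Naive rule of thumb: roughly one token per four characters."""
    return len(text) // 4


# A 400-character string comes out at about 100 tokens under this rule.
sample_estimate = estimate_tokens("x" * 400)
print(sample_estimate)

# Ratio check using the naive figures quoted above (V1 ~4,300, V2 ~270).
naive_v1, naive_v2 = 4_300, 270
naive_cut = (naive_v1 - naive_v2) / naive_v1 * 100
print(f"{naive_cut:.1f}% cut under the naive estimate")
```

To apply it to a real file, pass the file's text, for example the contents of context/memory-instructions.md read with pathlib.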

Primary contributor: Sam Schillace — sole author (by name) on all 4 commits, including the v2.0 compression commit 69c3629.
