Not Me

Making a writing voice measurable

The Frame

AI writes fluently — but in a voice you instantly know is “not me.”

You can’t describe what’s wrong, but you feel it. SamVoice turns that gut feeling into a measurement — a computational fingerprint of a writer’s real corpus — and feeds it back into tools that write and coach.

The claim: a “voice” can actually be measured. Can it?

The Proof

Sam’s voice reduces to a reproducible punctuation fingerprint.

Measured across 136,820 words of real essays: near-zero semicolons — just 22, or 0.16 per 1,000 words — independently re-counted straight from the corpus.

A voice really is measurable. But a fingerprint alone isn’t the whole picture.

0.16

semicolons / 1k words (22 in 136,820)

~9.5

scare quotes / 1k — the signature mark

4.1

question marks / 1k words

7.5

parentheticals / 1k words
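
The per-1,000-word rates above are simple ratios, and the arithmetic can be sketched in a few lines. This is a minimal illustration, not the repo's counting code: the function names `per_thousand` and `fingerprint` are hypothetical, the parenthetical regex is an assumed pattern, and scare quotes are left out because the source does not say which pattern counts them.

```python
import re


def per_thousand(count: int, total_words: int) -> float:
    """Return how often a mark occurs per 1,000 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 1000 * count / total_words


def fingerprint(text: str) -> dict[str, float]:
    """Count a few punctuation marks and normalize them per 1,000 words."""
    words = len(text.split())  # whitespace tokens, roughly what `wc -w` reports
    counts = {
        "semicolons": text.count(";"),
        "question_marks": text.count("?"),
        # Assumed pattern: any non-nested (...) span counts as one parenthetical.
        "parentheticals": len(re.findall(r"\([^()]*\)", text)),
    }
    return {name: round(per_thousand(n, words), 2) for name, n in counts.items()}


# The headline figure: 22 semicolons in 136,820 words.
print(round(per_thousand(22, 136_820), 2))  # 0.16
```

Normalizing by word count is what lets a fingerprint from a 136,820-word corpus be compared against a single draft paragraph.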

The Setup

The fingerprint sits beside a 6-dimension rubric built from the same essays.

Beyond punctuation, the voice is scored on six named dimensions. A cross-validation harness treats a total of ≥ 22 / 30 (~73%) as “Sam positive.”

Now there’s a full picture of the voice. So how does it get fed back?

Open
Dense
Anti-OE
Voice
Close
Feel
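
The threshold rule can be sketched as follows. The six dimension names and the ≥ 22 / 30 cutoff come from the deck; the 0-to-5 scale per dimension is an assumption inferred from 30 points across six dimensions, and `is_sam_positive` is a hypothetical name rather than the harness's own function.

```python
DIMENSIONS = ("Open", "Dense", "Anti-OE", "Voice", "Close", "Feel")
MAX_PER_DIMENSION = 5  # assumption: 30 total points split evenly over 6 dimensions
THRESHOLD = 22         # from the deck: a total of at least 22 / 30 counts as positive


def is_sam_positive(scores: dict[str, int]) -> bool:
    """Return True when the rubric total reaches the threshold."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} score out of range: {scores[name]}")
    return sum(scores[name] for name in DIMENSIONS) >= THRESHOLD


example = {"Open": 4, "Dense": 4, "Anti-OE": 4, "Voice": 4, "Close": 3, "Feel": 3}
print(is_sam_positive(example))  # True (total is 22)
```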

The Mechanism · 1 of 2

`voice_tool.py write` generates in the voice, not near it.

The write mode feeds a frontier model (claude-sonnet-4) the VOICE_PROMPT.md fingerprint plus real exemplars retrieved from the corpus — so the model imitates the actual author.

The tool writes in the voice. But what about text you’ve already written?

# three modes in a 410-line CLI
voice_tool.py write  # API + exemplars
voice_tool.py coach  # score paragraphs
voice_tool.py find   # retrieve exemplars

# build_system_prompt() reads
# VOICE_PROMPT.md + retrieved
# real-corpus exemplars
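
A rough sketch of how such a prompt could be assembled is shown here. Only the function name `build_system_prompt` and its two inputs (the VOICE_PROMPT.md text and retrieved exemplars) come from the deck; the signature, the section headings, and the joining format are assumptions for illustration and may differ from the real tool.

```python
def build_system_prompt(voice_prompt: str, exemplars: list[str]) -> str:
    """Combine the fingerprint text with retrieved exemplars into one prompt.

    Hypothetical signature: the real function may read files itself.
    """
    sections = [voice_prompt.strip()]
    for i, exemplar in enumerate(exemplars, start=1):
        # Assumed layout: each exemplar gets its own labeled section.
        sections.append(f"## Exemplar {i}\n{exemplar.strip()}")
    return "\n\n".join(sections)


prompt = build_system_prompt(
    "Use almost no semicolons.\n",
    ["First real paragraph.", "Second real paragraph."],
)
print(prompt)
```

Keeping the fingerprint and the exemplars in one system prompt means the model sees both the rules and real instances of the rules being followed.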

The Mechanism · 2 of 2

`coach` turns the “not me” feeling into a specific fix list.

The coach mode scores writing paragraph by paragraph against named voice dimensions and penalizes AI tells: anti_semicolons, anti_ai (delve, myriad, tapestry…), and anti_tricolon (forced tricolons).

Human-readable rules coach you toward the voice. Now make the machine’s judgment robust.

Scores informal_quantifiers, hedging, parentheticals, scare_quotes, questions, length_variety
anti_semicolons = −count(';') × 2
anti_ai penalizes AI-tell words × 3
anti_tricolon flags “Not X. Not Y. Not Z.”
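
The penalty rules above can be sketched as below. The ×2 and ×3 weights and the three example words come from the deck; the negative sign on anti_ai, the tricolon regex, and the function names are assumptions. The deck's word list is elided, so only the three named words are used here.

```python
import re

# Only the words named in the deck; the full list is not shown there.
AI_TELLS = ("delve", "myriad", "tapestry")

# Assumed pattern for a forced tricolon: three consecutive sentences opening with "Not".
TRICOLON = re.compile(r"\bNot\b[^.]*\.\s+Not\b[^.]*\.\s+Not\b[^.]*\.")


def has_forced_tricolon(paragraph: str) -> bool:
    """Flag the 'Not X. Not Y. Not Z.' shape."""
    return TRICOLON.search(paragraph) is not None


def penalties(paragraph: str) -> dict[str, int]:
    """Return the two weighted penalties for one paragraph (zero or negative)."""
    lowered = paragraph.lower()
    ai_hits = sum(
        len(re.findall(rf"\b{re.escape(word)}\b", lowered)) for word in AI_TELLS
    )
    return {
        "anti_semicolons": -paragraph.count(";") * 2,
        "anti_ai": -ai_hits * 3,  # sign assumed to match anti_semicolons
    }


print(penalties("We delve; we delve again; a tapestry."))
print(has_forced_tricolon("Not fast. Not cheap. Not easy."))
```

Because each rule is named and weighted, the output can be read back to the writer as a fix list rather than a single opaque score.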

The Turn

To sharpen the machine’s ear, train it on deliberate mistakes.

A hard-negative generator corrupts one voice dimension at a time — 10 typed violations — to train a ModernBERT classifier, cross-validated against a 6-dimension LLM judge.

The measured voice is now a trainable machine judge. So what does that unlock?

10

violation types, each with a float label

2,509

rows in hard_negatives_v4.jsonl

4,473

rows in classifier_v4_balanced_train.jsonl

1,182

lines in cross_validate.py (BERT vs judge)
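
The idea of corrupting one dimension at a time can be sketched as below. Everything here is illustrative: the two corruption functions, the record fields (`text`, `violation`, `label`), and the 0.0 label value are assumptions, not the schema of hard_negatives_v4.jsonl or the logic in generate_hard_negatives.py.

```python
import json
import re


def inject_semicolons(text: str) -> str:
    """Join sentences with semicolons, breaking the near-zero-semicolon trait."""
    # Only rewrites a period followed by a capitalized word (capital + lowercase letter).
    return re.sub(r"\.\s+([A-Z])(?=[a-z])", lambda m: "; " + m.group(1).lower(), text)


def inject_ai_tell(text: str) -> str:
    """Prepend a sentence containing one of the AI-tell words named in the deck."""
    return "Let us delve into this. " + text


# Hypothetical registry: one corruption per violation type.
CORRUPTIONS = {"semicolons": inject_semicolons, "ai_tell": inject_ai_tell}


def make_hard_negative(text: str, violation: str) -> dict:
    """Corrupt exactly one dimension and record which one was changed."""
    return {
        "text": CORRUPTIONS[violation](text),
        "violation": violation,
        "label": 0.0,  # assumed float label for a negative example
    }


row = make_hard_negative("It works. Mostly. Then it breaks.", "semicolons")
print(json.dumps(row))  # one JSONL row
```

Changing a single dimension per example keeps the negative close to the real voice, which is what makes it "hard" for a classifier.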

The Payoff

The Voice Pack makes the whole machine work for ANY voice.

Spec v1.0 (2026-04-20) makes the pipelines generic and the voice a swappable definition — fingerprint, patterns, rubric anchors, exemplars, classifier. All pipelines now accept --voice-pack.

A private experiment becomes a reusable machine for measuring and amplifying any voice.

voice-prompt.md
writing-patterns.md
rubric-anchors.yaml
anti-patterns.yaml
manifest.yaml + ≥1 exemplar
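
A pipeline that accepts a pack might start by checking that the listed files are present. The sketch below assumes all five named files are required and that they sit at the top level of the pack directory. Neither point is confirmed by the deck, and the exemplar check is omitted because the exemplar location is not stated. Only the `--voice-pack` flag and the file names come from the source.

```python
import argparse
from pathlib import Path
from typing import Optional

# File names from the deck; treating all of them as required is an assumption.
PACK_FILES = (
    "voice-prompt.md",
    "writing-patterns.md",
    "rubric-anchors.yaml",
    "anti-patterns.yaml",
    "manifest.yaml",
)


def missing_pack_files(pack_dir: Path) -> list[str]:
    """Return the expected pack files that are absent from pack_dir."""
    return [name for name in PACK_FILES if not (pack_dir / name).is_file()]


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    """Hypothetical entry point showing how a pipeline could take --voice-pack."""
    parser = argparse.ArgumentParser(description="Voice pipeline (illustrative)")
    parser.add_argument("--voice-pack", type=Path, required=True,
                        help="directory containing the voice pack files")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    missing = missing_pack_files(args.voice_pack)
    print("pack OK" if not missing else f"missing: {', '.join(missing)}")
```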

The Takeaway

“Not me” is a signal, not a shrug.

Once a voice is a measurable, swappable pack — fingerprint, rubric, exemplars, classifier — amplifying your own voice stops being a mystery and becomes an engineering problem.

Sources

Research Methodology

Data as of: 2026-07-21 — metrics independently re-derived from the samvoice repo and corpus, not taken from the README on faith.

Feature status: Research / single-author tooling repo — all 12 commits by Sam Schillace.

Commands run:

cat corpus/blog/*.md | wc -w → 136,820 words
ls corpus/blog/*.md | grep -v _index | wc -l → 228 essays
grep -o ';' | wc -l → 22 semicolons (0.16 / 1k)
regex counts reproduced question marks 4.1/1k, parentheticals 7.5/1k, scare quotes ~9.5/1k
wc -l scripts/voice_tool.py → 410 lines (write / coach / find)
sed -n '80,150p' for the scoring dimensions
sed -n '44,140p' scripts/generate_hard_negatives.py → 10 typed violations
for f in data/*.jsonl; do wc -l $f; done for dataset counts
wc -l scripts/cross_validate.py → 1,182 lines; DIMENSIONS = Open/Dense/Anti-OE/Voice/Close/Feel, threshold ≥22/30
head -60 VOICE-PACK-SPEC.md → v1.0 (2026-04-20)
git log --oneline → commit 2fd350f parameterizes pipelines with --voice-pack

Primary contributor: Sam Schillace (12 of 12 commits; git email is a placeholder, not a real address).

Gaps / not independently verified: word and essay counts in the README, VOICE_PROMPT, and manifest vary by snapshot (130,741–150,000 words; 226–276 documents), so the direct counts above are used here. Classifier accuracy of 1.0 is a repo self-report only (ModernBERT weights are served from spark-1, not stored in git). The frontier write model is claude-sonnet-4-20250514 per voice_tool.py.