Showcase · Personal AI

Your Voice,
Amplified

SamVoice: an AI that learns exactly how you write — then helps you write more like yourself.

Active · Voice-Matched Writing
May 2026 · samschillace/samvoice
The Problem

AI writes like AI,
not like you

🤖

Generic Voice

AI-generated text sounds the same no matter who prompted it. “Transformative,” “it’s worth noting,” “let’s dive in” — none of that is you.

✍️

Voice Is Invisible

You know your voice when you see it, but can you describe it? Most writers can’t articulate why their writing sounds like them.

🔍

No Feedback Loop

There is no tool that tells you “this paragraph drifted from your voice” or scores how well a draft matches your actual writing patterns.

In 130,000 words of published essays, “However” appears exactly 3 times. Semicolons: 0.2 per thousand words. But every AI draft is littered with both. The signal is in the data — if you measure it.

The Foundation

Computational analysis of a
real writing corpus

130K
Words analyzed
226
Published essays
6,724
Training examples
10
Violation types

Five years of Sunday Letters (2021–2025). 229 blog essays. 50 letters. Every word counted, every punctuation mark mapped, every transition cataloged.

Voice DNA

A punctuation fingerprint
is as unique as a thumbprint

Per-thousand-word frequencies extracted from the corpus. This is what makes a voice measurable, not just describable.

Mark | Freq / 1K words | Signature
Scare quotes (“x”) | 10.6 | THE defining punctuation: interrogates terms constantly
Parenthetical asides | 7.0 | Thinks out loud, with full clauses in parentheses
Question marks | 3.7 | Direct questions to the reader, mid-paragraph
“But” transitions | 3.2 | The engine of the prose, almost never “However” (3 uses in 130K words)
Semicolons | 0.2 | Near zero, so frequent semicolons are a strong sign of AI-generated text

Contractions are NOT universal: “it is” appears 79 times vs “it’s” 60 times. “That is” 96 times vs “that’s” 17. The mix is deliberate rhythm, not inconsistency.
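The per-thousand-word rates in the table are simple counts divided by corpus length. The sketch below shows one way to compute them, assuming crude regex heuristics (quoted spans of three words or fewer count as scare quotes; parentheses holding three or more words count as asides). These heuristics and the function names `fingerprint` and `contraction_mix` are hypothetical, not SamVoice's own extraction code.

```python
import re

WORD = re.compile(r"[A-Za-z’']+")
QUOTED = re.compile(r'“([^”]+)”|"([^"]+)"')
PAREN = re.compile(r"\(([^)]+)\)")


def per_thousand(count: int, words: int) -> float:
    """Rate per 1,000 words, rounded to one decimal like the table."""
    return round(1000 * count / words, 1) if words else 0.0


def fingerprint(text: str) -> dict[str, float]:
    """Hypothetical punctuation fingerprint using simple regex heuristics."""
    words = len(WORD.findall(text))
    # Heuristic: quoted spans of three words or fewer count as scare quotes.
    quoted = [a or b for a, b in QUOTED.findall(text)]
    scare = sum(1 for q in quoted if len(q.split()) <= 3)
    # Heuristic: parentheses holding three or more words count as clause-length asides.
    asides = sum(1 for p in PAREN.findall(text) if len(p.split()) >= 3)
    return {
        "scare_quotes": per_thousand(scare, words),
        "parentheticals": per_thousand(asides, words),
        "question_marks": per_thousand(text.count("?"), words),
        "but_transitions": per_thousand(len(re.findall(r"\bBut\b", text)), words),
        "semicolons": per_thousand(text.count(";"), words),
    }


def contraction_mix(text: str) -> dict[str, tuple[int, int]]:
    """Count (uncontracted, contracted) occurrences for each pair."""
    lower = text.lower().replace("’", "'")
    pairs = {"it is": "it's", "that is": "that's"}
    return {
        full: (len(re.findall(rf"\b{full}\b", lower)), len(re.findall(rf"\b{short}\b", lower)))
        for full, short in pairs.items()
    }
```

Run over the full corpus, `contraction_mix` would yield the kind of counts quoted above (79 vs 60 for "it is" / "it’s"); on a single paragraph the rates are noisy, which is why the fingerprint is measured at corpus scale.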

The System

Two tracks, one goal:
write like yourself

Voice Prompt + Retrieval

No GPU needed. A data-derived voice prompt encodes the precise punctuation fingerprint, vocabulary patterns, and anti-patterns. A retrieval tool pulls topic-relevant exemplar paragraphs from the corpus to calibrate the frontier model.

  • write — generate a full draft with voice-matched exemplars
  • coach — paragraph-by-paragraph scoring with fix suggestions
  • find — retrieve the best corpus paragraphs for any topic

LoRA Fine-Tuned Models

GPU required. Two LoRA adapters on Qwen2.5-3B-Instruct, trained on the real corpus: one scores text, one rewrites it. A ModernBERT v3 binary classifier rounds out the set.

  • Classifier — scores 0.0–1.0 on voice match (6,724 examples)
  • Rewriter — transforms AI-generic text to voice-matched (907 pairs)
  • ModernBERT v3 — binary classifier at 100% accuracy on test set

The voice prompt approach is recommended for daily use. The LoRA models are for batch processing and research — scoring entire books paragraph by paragraph.

Workflow

From topic to voice-matched draft

1

Index the Corpus

Load all 226 essays (~1,700 paragraphs). Score each paragraph on voice dimensions: informal quantifiers, hedging, parentheticals, scare quotes, transitions, sentence variety.
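Indexing amounts to splitting essays into paragraphs and scoring each one. A minimal sketch, assuming each voice dimension is a yes/no check and the score is the fraction passed. The marker lists, the sentence-variety threshold, and the names `split_paragraphs` and `voice_score` are hypothetical stand-ins, not the corpus-derived values SamVoice uses.

```python
import re
import statistics

# Hypothetical marker lists; the corpus-derived lists are not shown in this deck.
INFORMAL_QUANTIFIERS = ("a lot", "a bunch", "a ton", "pretty much")
HEDGES = ("kind of", "i think", "i suspect", "probably", "maybe")
FORMAL_TRANSITIONS = ("however", "moreover", "furthermore")
SCARE_QUOTE = re.compile(r'“[^”]{1,30}”|"[^"]{1,30}"')


def split_paragraphs(text: str) -> list[str]:
    """Split an essay on blank lines, dropping empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def voice_score(paragraph: str) -> float:
    """Fraction of six voice checks the paragraph passes (0.0 to 1.0)."""
    lower = paragraph.lower()
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", paragraph.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    checks = [
        any(q in lower for q in INFORMAL_QUANTIFIERS),         # informal quantifiers
        any(h in lower for h in HEDGES),                       # hedging
        "(" in paragraph,                                      # parenthetical aside
        bool(SCARE_QUOTE.search(paragraph)),                   # scare quotes
        not any(t in lower for t in FORMAL_TRANSITIONS),       # no formal transitions
        len(lengths) > 1 and statistics.pstdev(lengths) >= 3,  # sentence-length variety
    ]
    return sum(checks) / len(checks)
```

Equal weighting keeps the sketch simple; a real index would more likely weight dimensions by how strongly each separates the author from generic text.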

2

Retrieve Exemplars

For a given topic, rank paragraphs by (topic relevance × voice quality). If there are not enough topical matches, backfill with the highest-voice-score paragraphs.
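The ranking rule can be sketched directly: multiply relevance by voice score, keep the top k, and backfill by voice score alone. This assumes a crude word-overlap relevance (share of topic words found in the paragraph) in place of whatever relevance measure SamVoice actually uses; `Paragraph`, `retrieve`, `k`, and `min_relevance` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str
    voice: float  # voice score in [0, 1] from the indexing step


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))


def relevance(topic: str, text: str) -> float:
    """Stand-in relevance: share of topic words that appear in the paragraph."""
    topic_words = tokens(topic)
    return len(topic_words & tokens(text)) / len(topic_words) if topic_words else 0.0


def retrieve(topic: str, corpus: list[Paragraph], k: int = 3, min_relevance: float = 0.2) -> list[Paragraph]:
    # Rank topical paragraphs by relevance x voice quality.
    scored = []
    for p in corpus:
        rel = relevance(topic, p.text)
        if rel >= min_relevance:
            scored.append((rel * p.voice, p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    picks = [p for _, p in scored[:k]]
    # Backfill with the highest-voice paragraphs not already chosen.
    if len(picks) < k:
        chosen = {id(p) for p in picks}
        rest = sorted((p for p in corpus if id(p) not in chosen), key=lambda p: p.voice, reverse=True)
        picks.extend(rest[: k - len(picks)])
    return picks
```

The product means a highly relevant but weakly voiced paragraph can lose to a slightly less relevant one with a strong voice, which is the point: exemplars exist to calibrate style, not to supply facts.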

3

Build the Voice Prompt

Combine the quantitative fingerprint (word frequencies, punctuation ratios, sentence stats) with structural rules, anti-patterns, and the retrieved exemplars.
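A sketch of that assembly step, assuming the prompt is plain text with one section per ingredient and exemplars wrapped in tags. The section wording, tag names, and the `build_voice_prompt` signature are hypothetical; the deck does not show the real prompt layout.

```python
def build_voice_prompt(
    fingerprint: dict[str, float],
    rules: list[str],
    anti_patterns: list[str],
    exemplars: list[str],
) -> str:
    """Assemble a system prompt from the measured fingerprint, rules, anti-patterns, and exemplars."""
    lines = ["Write in the author's voice. Match these measured rates per 1,000 words:"]
    lines += [f"- {name}: {rate}" for name, rate in fingerprint.items()]
    lines.append("Structural rules:")
    lines += [f"- {rule}" for rule in rules]
    lines.append("Never use these phrases:")
    lines += [f"- {phrase}" for phrase in anti_patterns]
    lines.append("Exemplar paragraphs from the author's corpus:")
    lines += [f"<exemplar>\n{text}\n</exemplar>" for text in exemplars]
    return "\n".join(lines)
```

For example, `build_voice_prompt({"scare_quotes": 10.6, "semicolons": 0.2}, ["Prefer But over However"], ["it's worth noting"], [exemplar_text])` produces a prompt that states the numbers outright, so the model gets targets rather than adjectives like "conversational".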

4

Generate & Coach

Send the topic to Claude along with the voice-calibrated system prompt. Then score the result paragraph by paragraph, flagging formal transitions, AI slop, missing uncontracted forms, and semicolons.
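The coaching pass can be sketched as a set of per-paragraph checks that mirror the four flags listed above. The phrase lists, the 40-word threshold for expecting at least one uncontracted form, and the `coach` function itself are assumptions for illustration; SamVoice's coach also suggests fixes, which this sketch does not attempt.

```python
import re

# Hypothetical flag lists drawn from the examples in this deck.
FORMAL = re.compile(r"\b(However|Moreover|Furthermore)\b")
SLOP = ("transformative", "it's worth noting", "let's dive in")
UNCONTRACTED = re.compile(r"\b(it is|that is)\b", re.IGNORECASE)
MIN_WORDS_FOR_CONTRACTION_CHECK = 40  # assumed threshold


def coach(draft: str) -> list[tuple[int, list[str]]]:
    """Return (paragraph number, flags) for each non-empty paragraph."""
    paragraphs = [p for p in draft.split("\n\n") if p.strip()]
    report = []
    for number, para in enumerate(paragraphs, start=1):
        flags = []
        if FORMAL.search(para):
            flags.append("formal transition")
        normalized = para.lower().replace("’", "'")
        flags += [f"AI slop: {phrase}" for phrase in SLOP if phrase in normalized]
        if ";" in para:
            flags.append("semicolon")
        # Long paragraphs with every form contracted drift from the author's mix.
        if len(para.split()) >= MIN_WORDS_FOR_CONTRACTION_CHECK and not UNCONTRACTED.search(para):
            flags.append("no uncontracted forms")
        report.append((number, flags))
    return report
```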

Training Innovation

Teaching a model what your
voice is not

Hard negatives: real author paragraphs with one specific voice dimension violated. Each teaches the classifier a precise boundary.

formal_transitions

“But” → “However” / “Moreover”

no_hedging

Removes “kind of”, “I think”, “I suspect”

ai_slop_injection

Injects “transformative”, “it’s worth noting”

semicolonitis

Adds semicolons (near-zero in real corpus)

no_parentheticals

Strips the signature parenthetical asides

no_scare_quotes

Removes interrogative scare quotes

tricolon_escalation

Adds “Not X. Not Y. Not Z.” speechwriter patterns

over_contracted

Contracts everything (Sam leaves some uncontracted)

Each violation type has a calibrated label (0.25–0.55) reflecting how far it drifts from authentic voice. 300+ hard negatives in v3 training data.
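Each hard negative is a real paragraph passed through one targeted mutation and paired with that type's label. The sketch covers three of the types; the specific label values (0.40, 0.45, 0.50) are hypothetical placeholders inside the quoted 0.25–0.55 range, and the transforms are simplified guesses at what the repository's `generate_hard_negatives.py` does, not its actual code.

```python
import random
import re
from typing import Optional

# Hypothetical labels within the 0.25–0.55 range; real per-type values are not listed in this deck.
LABELS = {"formal_transitions": 0.45, "semicolonitis": 0.40, "no_parentheticals": 0.50}


def formal_transitions(text: str, rng: random.Random) -> str:
    """Swap each capitalized "But" for a formal connective."""
    return re.sub(r"\bBut\b", lambda m: rng.choice(["However,", "Moreover,"]), text)


def semicolonitis(text: str, rng: random.Random) -> str:
    """Fuse the first two sentences with a semicolon (rng unused; kept for a uniform signature)."""
    def join(m: re.Match) -> str:
        word = m.group(1)
        return "; " + (word if word == "I" else word[0].lower() + word[1:])
    return re.sub(r"\.\s+([A-Z]\w*)", join, text, count=1)


def no_parentheticals(text: str, rng: random.Random) -> str:
    """Strip parenthetical asides (rng unused)."""
    return re.sub(r"\s*\([^)]*\)", "", text)


TRANSFORMS = {
    "formal_transitions": formal_transitions,
    "semicolonitis": semicolonitis,
    "no_parentheticals": no_parentheticals,
}


def make_hard_negative(paragraph: str, violation: str, rng: random.Random) -> Optional[tuple[str, float]]:
    """Return (mutated paragraph, label), or None if the mutation changed nothing."""
    mutated = TRANSFORMS[violation](paragraph, rng)
    if mutated == paragraph:
        return None  # no violation applied, so nothing for the classifier to learn
    return mutated, LABELS[violation]
```

Returning `None` for no-op mutations matters: a paragraph with no parentheses would otherwise enter training as a "violation" that is identical to an authentic example.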

The Bigger Picture

Voice Packs:
portable identity

A voice pack is the interface contract between general-purpose writing tools and a specific voice. Any voice — a person, a character, a brand — can be a voice pack.

📝

voice-prompt.md

Quantitative fingerprint — word frequencies, punctuation ratios

🎯

rubric-anchors.yaml

6 scoring dimensions — Open, Dense, Anti-OE, Voice, Close, Feel

🚫

anti-patterns.yaml

Signal phrases for pruning + densifier thresholds

📚

exemplars/

3–5 ground-truth samples of authentic writing
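Because the pack is just a directory, tools can sanity-check it before use. A minimal sketch that checks only the four components above; `check_voice_pack` is a hypothetical helper, and it ignores the `manifest.yaml` the repository's `voice-pack/` directory also contains.

```python
from pathlib import Path

REQUIRED_FILES = ("voice-prompt.md", "rubric-anchors.yaml", "anti-patterns.yaml")


def check_voice_pack(root: Path) -> list[str]:
    """Return a list of layout problems; an empty list means the pack looks complete."""
    problems = [f"missing {name}" for name in REQUIRED_FILES if not (root / name).is_file()]
    exemplar_dir = root / "exemplars"
    count = len([p for p in exemplar_dir.iterdir() if p.is_file()]) if exemplar_dir.is_dir() else 0
    if not 3 <= count <= 5:
        problems.append(f"expected 3-5 exemplars, found {count}")
    return problems
```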

Pipelines Are Generic

Judge, rewrite, prune, densify — all work with any voice pack. The pipeline logic never changes.

Voice Packs Are Specific

What “good” sounds like, what to avoid, how to score. One pack per voice, portable across tools.
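The split can be illustrated with a prune step: the function never changes, and only the pack passed in differs. This is a hypothetical sketch that drops any sentence containing one of the pack's anti-pattern phrases; the `VoicePack` fields and `prune` behavior are assumptions, not the repository's pipeline modules.

```python
import re
from dataclasses import dataclass, field


@dataclass
class VoicePack:
    name: str
    anti_patterns: list[str] = field(default_factory=list)


def prune(text: str, pack: VoicePack) -> str:
    """Drop sentences containing any of the pack's anti-pattern phrases (case-insensitive)."""
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    keep = [s for s in sentences if not any(p.lower() in s.lower() for p in pack.anti_patterns)]
    return " ".join(keep)
```

With `VoicePack("sam", ["let's dive in"])` and `VoicePack("brand", ["basically"])`, the same `prune` call removes different sentences from the same draft, which is the whole pitch for keeping voice data out of pipeline code.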

7,769 Lines of Python

11 scripts + 7 pipeline modules. Judge, rewrite, batch, prune, densify, cross-validate, and training.

Sources & Methodology

How this deck was built

Repository: samvoice/ at /home/samschillace/dev/ANext/samvoice

Files examined: README.md, VOICE-PACK-SPEC.md, context/VOICE_PROMPT.md, context/SAMVOICE.md, voice-pack/manifest.yaml, voice-pack/rubric-anchors.yaml, voice-pack/writing-patterns.md, scripts/voice_tool.py, scripts/generate_hard_negatives.py. Git log for commit counts and contributor data. Line counts via wc -l.

All numbers are from the repository. No estimates or projections.

The Point

Your voice is
in the data

130,000 words contain a precise, measurable, reproducible fingerprint. Not a vague style guide — a quantitative signature that a machine can learn and a human can verify.

# Write a draft in your voice
python scripts/voice_tool.py write "AI is making knowledge work trivially easy"

# Coach an existing draft
python scripts/voice_tool.py coach my-draft.md

# Find your best paragraphs on a topic
python scripts/voice_tool.py find "user laziness and 10x improvements"
samschillace/samvoice
Python · Claude API · Qwen 3B + LoRA · ModernBERT · DGX Spark