Shadow Environments
You've made a change to a core library. The unit tests pass. But will the six downstream packages that depend on it still work? You could push and wait for CI, but that's a 15-minute feedback loop -- and if it breaks, you've broken everyone's builds. Shadow environments give you a faster answer: an OS-isolated sandbox where your local changes become the "real" repository, and you can test the full dependency chain without pushing anything.
What Is a Shadow Environment?
A shadow environment is an isolated container that sees your local working
tree as if it were the upstream repository. When code inside the shadow runs
pip install git+https://github.com/microsoft/amplifier-core, it doesn't
fetch from GitHub -- it fetches from a snapshot of your local checkout. Everything
else resolves normally.
Your machine:                   Shadow container:
┌──────────────────┐            ┌──────────────────────────┐
│ ~/repos/core     │──snapshot─▶│ Embedded Gitea server    │
│ (your changes)   │            │   ↕ git URL rewriting    │
└──────────────────┘            │                          │
                                │ pip install git+https:// │
                                │   github.com/ms/core     │
                                │   → resolves to Gitea    │
                                │   → YOUR local code      │
                                │                          │
                                │ Everything else          │
                                │   → real GitHub          │
                                └──────────────────────────┘
The key insight: selective git URL rewriting. Only the URLs you specify get redirected. The container has full network access for everything else -- real dependencies, real package registries, real GitHub for repos you aren't overriding.
How It Works
Shadow environments use three components working together:
1. OS-Level Isolation
On Linux, shadows use bubblewrap
(bwrap) -- the same sandboxing technology used by Flatpak. On macOS, they
use sandbox-exec with a custom profile. Both provide:
- Filesystem isolation (the shadow can't modify your host files)
- Network namespace separation
- Dropped capabilities and no-new-privileges flags
- Security-hardened containers by default
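To make the filesystem isolation concrete, here is a minimal sketch of a bubblewrap invocation that mounts the host read-only and leaves a single scratch directory writable. It is an illustration of what bwrap can do, not the set of flags amplifier-shadow actually passes; the function name sandbox_run and the choice of mounts are assumptions.

```shell
# Sketch only: run a command with the host filesystem mounted read-only
# and one scratch directory left writable.
# sandbox_run is a hypothetical helper, not part of amplifier-shadow.
sandbox_run() {
  workdir="$1"   # existing directory that stays writable inside the sandbox
  shift          # remaining arguments are the command to run
  bwrap \
    --ro-bind / / \
    --dev /dev \
    --proc /proc \
    --tmpfs /tmp \
    --bind "$workdir" "$workdir" \
    --chdir "$workdir" \
    --unshare-pid \
    --die-with-parent \
    --new-session \
    "$@"
}
```

On a Linux host with bubblewrap installed, sandbox_run ~/scratch touch hello.txt would succeed inside ~/scratch, while a write anywhere else on the host should fail because the root mount is read-only.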
2. Local Source Snapshots
When you create a shadow, it snapshots your local repo's working tree -- including uncommitted changes, untracked files, and deletions. This isn't just the latest commit; it's the exact state of your working directory. The snapshot is pushed to an embedded Gitea server running inside the container, with full git history preserved so that dependencies pinned to earlier commits still resolve.
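One way to capture a working tree like this with plain git is to stage everything into a throwaway index and write a commit on top of HEAD. The sketch below shows that technique; whether amplifier-shadow does it this way is not stated here. The function name snapshot_worktree and the commit identity are assumptions. It needs a repo with at least one commit, and files matched by .gitignore are left out.

```shell
# Sketch only: snapshot a working tree (edits, untracked files, deletions)
# as a commit on top of HEAD without touching the real index or branch.
# snapshot_worktree is a hypothetical helper, not part of amplifier-shadow.
snapshot_worktree() {
  (
    cd "$1" || exit 1
    tmp_dir="$(mktemp -d)" || exit 1
    # Point git at a throwaway index so the real staging area is untouched.
    GIT_INDEX_FILE="$tmp_dir/index"
    export GIT_INDEX_FILE
    git add -A || exit 1                  # stage edits, new files, deletions
    tree="$(git write-tree)" || exit 1    # store that state as a tree object
    # Hypothetical identity for the snapshot commit; prints the commit hash.
    GIT_AUTHOR_NAME=shadow GIT_AUTHOR_EMAIL=shadow@localhost \
    GIT_COMMITTER_NAME=shadow GIT_COMMITTER_EMAIL=shadow@localhost \
      git commit-tree "$tree" -p HEAD -m "shadow snapshot"
    rc=$?
    rm -rf "$tmp_dir"
    exit "$rc"
  )
}
```

Calling snapshot_worktree ~/repos/amplifier-core prints the hash of an unreferenced commit whose parent is HEAD, so the existing history stays reachable from the snapshot.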
3. Git URL Rewriting
Git is configured inside the container to rewrite only the specific GitHub URLs you specify to point at the embedded Gitea server. When any tool -- pip, uv, cargo, npm -- resolves a git dependency, the rewriting happens at the git transport layer. The tools don't know they're hitting a local server.
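Git supports this kind of prefix rewriting through its url.<base>.insteadOf config key, though the exact configuration the shadow writes is not shown here. The sketch below mimics the selective behavior in plain shell so you can see which URLs would be redirected. The Gitea address, the OVERRIDES list, and the function name rewrite_url are all hypothetical.

```shell
# Illustration only: mimic selective URL rewriting in plain shell.
# Hypothetical address of the embedded Gitea server (an assumption).
GITEA_BASE="http://localhost:3000/shadow"
# Repos to override, as space-separated org/repo identifiers.
OVERRIDES="microsoft/amplifier-core"

rewrite_url() {
  url="$1"
  for repo in $OVERRIDES; do
    prefix="https://github.com/$repo"
    case "$url" in
      "$prefix" | "$prefix".git | "$prefix"/* | "$prefix"@*)
        # Overridden repo: keep any suffix (.git, /path, @ref), swap the host.
        suffix=${url#"$prefix"}
        printf '%s\n' "$GITEA_BASE/$repo$suffix"
        return 0
        ;;
    esac
  done
  # Everything else passes through untouched.
  printf '%s\n' "$url"
}

rewrite_url "https://github.com/microsoft/amplifier-core"
rewrite_url "https://github.com/microsoft/amplifier-foundation"
```

The first call prints the Gitea URL and the second prints the GitHub URL unchanged. The sketch is slightly stricter than git's own insteadOf, which matches on a plain string prefix.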
Using Shadows
Create
Create a shadow with one or more local source overrides:
# Single repo override
amplifier-shadow create --local ~/repos/amplifier-core:microsoft/amplifier-core
# Multiple repos in one shadow
amplifier-shadow create \
  --local ~/repos/amplifier-core:microsoft/amplifier-core \
  --local ~/repos/amplifier-foundation:microsoft/amplifier-foundation \
  --local ~/repos/amplifier-app-cli:microsoft/amplifier-app-cli
The --local flag maps a local path to a GitHub org/repo identifier. Inside
the shadow, any git fetch to github.com/microsoft/amplifier-core resolves to
your local snapshot instead.
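When several checkouts sit side by side, the repeated --local flags can be generated rather than typed. The helper below is a small sketch that prints the create command for review. It assumes every checkout lives at <base_dir>/<repo>, shares one GitHub org, and has no spaces in its path; shadow_create_cmd is a hypothetical name.

```shell
# Sketch only: print an amplifier-shadow create command that overrides
# each named repo. Assumes checkouts live at <base_dir>/<repo>, belong to
# one GitHub org, and have no spaces in their paths.
shadow_create_cmd() {
  base_dir="$1"
  org="$2"
  shift 2
  cmd="amplifier-shadow create"
  for repo in "$@"; do
    # One --local <path>:<org>/<repo> mapping per repository.
    cmd="$cmd --local $base_dir/$repo:$org/$repo"
  done
  printf '%s\n' "$cmd"
}
```

For example, shadow_create_cmd ~/repos microsoft amplifier-core amplifier-app-cli prints a single create command with two --local mappings, which you can inspect before running.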
Execute
Run commands inside the shadow:
# Install a package (resolves to your local code)
amplifier-shadow exec <id> "uv pip install git+https://github.com/microsoft/amplifier-core"
# Run Amplifier itself against your changes
amplifier-shadow exec <id> "amplifier run 'Hello from shadow'"
# Run a test suite
amplifier-shadow exec <id> "cd /workspace && pytest tests/"
Inspect and Iterate
# See what changed inside the shadow
amplifier-shadow diff <id>
# Open an interactive shell
amplifier-shadow shell <id>
# Check shadow status and snapshot commits
amplifier-shadow status <id>
# Inject a file into the running shadow
amplifier-shadow inject <id> local-fix.py /workspace/src/fix.py
# Extract results from the shadow
amplifier-shadow extract <id> /workspace/test-results.xml ./results.xml
Clean Up
# Destroy a single shadow
amplifier-shadow destroy <id>
# List all active shadows
amplifier-shadow list
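The exec, extract, and destroy commands above can be chained so that a shadow is always cleaned up, even when the tests fail. The sketch below takes the id of a shadow you have already created. It assumes that exec passes through the exit status of the command it runs, which is not confirmed here. The SHADOW_CLI variable and the function name shadow_test_run are hypothetical; --junitxml is a standard pytest option.

```shell
# Sketch only: run tests in an existing shadow, pull out the report,
# and destroy the shadow regardless of the test outcome.
# SHADOW_CLI is a hypothetical override hook for the CLI name.
SHADOW_CLI="${SHADOW_CLI:-amplifier-shadow}"

shadow_test_run() {
  shadow_id="$1"
  test_status=0
  # Assumption: exec returns the exit status of the command it ran.
  "$SHADOW_CLI" exec "$shadow_id" \
    "cd /workspace && pytest tests/ --junitxml=/workspace/test-results.xml" \
    || test_status=$?
  # Copy the report out before the shadow disappears.
  "$SHADOW_CLI" extract "$shadow_id" /workspace/test-results.xml ./results.xml
  "$SHADOW_CLI" destroy "$shadow_id"
  return "$test_status"
}
```

Run it as shadow_test_run <id>; the return value reflects the test run, not the cleanup steps.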
Verifying Your Code Is Used
The create and status commands return the exact commit hash captured from
your local repo. When pip or uv installs a package inside the shadow, the
resolved commit appears in the install output. Match those hashes to confirm
your local code is actually being used -- not a stale cached version.
I want to verify my local core changes are being picked up in the shadow.
[Tool: bash] amplifier-shadow status shadow-7f3a
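Installers often print an abbreviated hash while the snapshot reports a full one, so a plain string comparison can produce a false mismatch. The helper below is a minimal sketch that treats two hashes as equal when one is a prefix of the other. You supply both values by hand, since the output format of status and of the installer is not specified here; hashes_match is a hypothetical name.

```shell
# Sketch only: succeed when two commit hashes refer to the same commit,
# allowing one of them to be abbreviated.
hashes_match() {
  a="$1"
  b="$2"
  # An empty value never counts as a match.
  [ -n "$a" ] && [ -n "$b" ] || return 1
  case "$a" in "$b"*) return 0 ;; esac   # b is a prefix of a
  case "$b" in "$a"*) return 0 ;; esac   # a is a prefix of b
  return 1
}
```

Use it as hashes_match <snapshot-hash> <installed-hash> && echo "local code in use". A very short abbreviation could collide with another commit, so prefer at least seven characters.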
Use Cases
Testing Core Changes
You've changed the Amplifier kernel's tool dispatch logic. Before pushing, you want to know if downstream bundles still load correctly:
amplifier-shadow create --local ~/repos/amplifier-core:microsoft/amplifier-core
amplifier-shadow exec <id> "uv tool install git+https://github.com/microsoft/amplifier"
amplifier-shadow exec <id> "amplifier run 'load foundation bundle and list tools'"
If the bundle loads and tools fire, your change is compatible. If not, you caught the regression before CI did.
Multi-Repo Integration
You're changing a data model in amplifier-core and updating the CLI in
amplifier-app-cli to match. These changes must ship together:
amplifier-shadow create \
  --local ~/repos/amplifier-core:microsoft/amplifier-core \
  --local ~/repos/amplifier-app-cli:microsoft/amplifier-app-cli
amplifier-shadow exec <id> "uv tool install git+https://github.com/microsoft/amplifier"
amplifier-shadow exec <id> "amplifier --version && amplifier run 'smoke test'"
Both repos resolve to your local snapshots. You're testing the combination of changes, not each in isolation.
Destructive Tests
Need to test an uninstall script? A database migration rollback? A filesystem cleanup routine? Run it in a shadow. The container is disposable -- destroy it when done and your host is untouched.
The Testing Ladder
Shadow environments sit in the middle of a testing progression. Each level catches different kinds of bugs at different costs:
Level 1: Unit Tests
├─ Scope: Single function or class
├─ Speed: Seconds
└─ Catches: Logic errors, regressions in isolated code
Level 2: Local Override
├─ Scope: Your package, editable install
├─ Speed: Seconds
└─ Catches: Integration within your own repo
Level 3: Shadow Environment ← you are here
├─ Scope: Full dependency chain, multi-repo
├─ Speed: Minutes
└─ Catches: Cross-repo breakage, dependency conflicts, install failures
Level 4: Push & CI
├─ Scope: Full matrix (OS × Python version × config)
├─ Speed: 10-30 minutes
└─ Catches: Platform-specific issues, environment matrix failures
Level 5: Docker E2E
├─ Scope: Production-like environment
├─ Speed: 15-45 minutes
└─ Catches: Deployment issues, config drift, system integration
When to Use Each Level
| Situation | Best Level | Why |
|---|---|---|
| Fixed a typo in a docstring | Unit tests (1) | No behavior change to verify |
| Changed a function signature | Unit tests + local override (1-2) | Need to check callers within the repo |
| Changed a public API used by other repos | Shadow (3) | Need to check cross-repo consumers |
| Adding a new dependency | Shadow (3) | Need to verify it installs cleanly in context |
| Multi-repo coordinated change | Shadow (3) | Need to test the combination |
| Ready to merge | Push & CI (4) | Need the full platform matrix |
| Preparing a release | Docker E2E (5) | Need production-like validation |
The key question for shadows: "Does my change break things outside my repo?" If the blast radius is contained to your repository, unit tests and local overrides are sufficient. If it crosses repo boundaries, reach for a shadow.
What You've Learned
- Shadow environments are OS-isolated containers that redirect specific git URLs to your local working tree snapshots
- They use bubblewrap (Linux) or sandbox-exec (macOS) for security isolation and an embedded Gitea server for git URL rewriting
- The amplifier-shadow CLI manages the full lifecycle: create, exec, shell, diff, inject, extract, destroy
- Shadows excel at cross-repo integration testing -- verifying that your local changes work with the full dependency chain before pushing
- They sit at Level 3 of the testing ladder, between local overrides and CI -- catching cross-repo breakage in a few minutes rather than the 10-30 minutes of a full CI run