Browser Testing Bundle

Overview

The Browser Testing bundle provides browser automation through agent-browser, a token-efficient CLI that drives real browsers using accessibility-tree refs instead of DOM selectors. This means your agents interact with web pages the way assistive technology does -- through semantic element references rather than fragile CSS paths.

The bundle includes three specialized agents, each optimized for different browser interaction patterns. Rather than a single do-everything browser tool, you get purpose-built experts for automation, research, and visual documentation.

Key Characteristics:

  • Token-efficient: Uses compact accessibility-tree snapshots instead of full DOM dumps
  • Ref-based interaction: Elements get short refs like @e1, @e2 for reliable targeting; refs stay valid until the page changes
  • Real browser rendering: Handles JavaScript, SPAs, React apps, and dynamic content
  • Three specialists: Each agent brings focused expertise to different browser tasks

Prerequisites:

# Requires Node.js 18+
npm install -g agent-browser
agent-browser install

The install command downloads browser binaries. Run it once after the npm install.
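
If you want to confirm the prerequisites before installing, a small check like the one below may help. It is a minimal sketch: the function names node_version_ok and check_prereqs are made up for this example and are not part of agent-browser. It relies only on node --version and on the agent-browser binary being on your PATH.

```shell
# Succeeds when a Node.js version string such as "v18.17.0" is major version 18 or newer.
node_version_ok() {
  local version major
  version="${1#v}"          # strip the leading "v"
  major="${version%%.*}"    # keep the text before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;   # not a plain number
  esac
  [ "$major" -ge 18 ]
}

# Reports what is missing; returns non-zero if any prerequisite fails.
# (Hypothetical helper, not an agent-browser command.)
check_prereqs() {
  local status=0
  if ! command -v node >/dev/null 2>&1; then
    echo "node not found on PATH"
    status=1
  elif ! node_version_ok "$(node --version)"; then
    echo "Node.js 18+ required, found $(node --version)"
    status=1
  fi
  if ! command -v agent-browser >/dev/null 2>&1; then
    echo "agent-browser not found on PATH (run: npm install -g agent-browser)"
    status=1
  fi
  return "$status"
}
```

Run check_prereqs after sourcing the snippet; it prints nothing when both tools are available.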

Agents Included

Agent               Purpose                 Primary Use Cases
browser-operator    General automation      Navigation, forms, data extraction, screenshots, UX testing
browser-researcher  Research and synthesis  Multi-page exploration, documentation lookup, data gathering
visual-documenter   Visual documentation    Screenshots, QA evidence, responsive testing, change tracking

browser-operator

The Browser Operator handles general-purpose browser automation. It translates natural language instructions into browser actions -- navigating pages, filling forms, clicking buttons, extracting data, and capturing screenshots.

Responsibilities:

  • Navigating to URLs and following links
  • Filling forms and submitting data
  • Clicking buttons and interacting with UI elements
  • Extracting text and structured data from pages
  • Taking screenshots for verification
  • Testing UX flows end-to-end

Best for: Testing login flows, filling contact forms, verifying page content, interacting with web applications.

You: "Go to our staging site and test the login flow with test credentials"

browser-operator: Opens the URL, snapshots the login form, fills
username and password fields by ref, clicks submit, verifies the
dashboard loads, screenshots the result.

browser-researcher

The Browser Researcher is optimized for multi-page exploration and data synthesis. When you need to gather information across several websites, compare content, or look up documentation on JavaScript-rendered sites, this agent handles the navigation and summarization.

Responsibilities:

  • Exploring multiple pages and synthesizing findings
  • Extracting and comparing data across websites
  • Looking up documentation from modern JS-rendered sites
  • Gathering structured data for analysis

Best for: Competitive research, documentation lookup from SPAs, multi-source data gathering, pricing comparisons.

You: "Research the pricing tiers of the top 3 CRM platforms"

browser-researcher: Visits each vendor's pricing page, handles JS
rendering, extracts tier names/prices/features, synthesizes a
comparison table across all three.

visual-documenter

The Visual Documenter creates visual records of websites, UI states, and workflows. It captures screenshots at multiple viewport sizes, documents step-by-step flows, and produces before/after comparisons for change tracking.

Responsibilities:

  • Capturing screenshots at specific viewport sizes
  • Documenting multi-step workflows visually
  • Creating responsive design evidence across breakpoints
  • Building before/after comparison sets
  • Producing QA evidence for review

Best for: Responsive testing, workflow documentation, visual regression evidence, design review artifacts.

You: "Document the checkout flow step by step with screenshots"

visual-documenter: Walks through each checkout step, captures a
named screenshot at each stage (cart, shipping, payment, confirmation),
producing an organized visual record of the complete flow.

Core Concepts

The Ref System

After taking a snapshot, agent-browser assigns refs to interactive elements. These refs are short identifiers like @e1, @e2 that you use in subsequent commands:

# Take a snapshot to discover elements
agent-browser snapshot -ic

# Output includes refs:
# @e1 [input] Search...
# @e2 [button] Submit
# @e3 [link] Sign In

# Use refs to interact
agent-browser fill @e1 "amplifier tutorial"
agent-browser click @e2

# The page has changed, so snapshot again before using any more refs
agent-browser snapshot -ic

Refs are only valid until the page changes. After any navigation, click, or form submission that alters the page, take a new snapshot to get fresh refs.
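
When you script against snapshots, it can help to look up a ref by role and label instead of hard-coding @e2. The sketch below assumes snapshot output has exactly the one-element-per-line shape shown above (@e1 [input] Search...). Real output may be formatted differently, so treat the parsing as illustrative. find_ref is a name invented for this example.

```shell
# Print the ref of the first element whose role and label match.
# Assumes lines shaped like: "@e2 [button] Submit" (the format shown in this guide).
# Usage: find_ref <role> <label-substring> < snapshot.txt
find_ref() {
  awk -v role="[$1]" -v label="$2" '
    $1 ~ /^@e[0-9]+$/ && $2 == role && (label == "" || index($0, label) > 0) {
      print $1
      exit
    }
  '
}
```

For example, piping a saved snapshot through find_ref button Submit would print the matching ref, or nothing if no element matches.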

The Snapshot-Act-Snapshot Loop

The fundamental pattern for browser automation:

  1. Snapshot -- capture current page state and get element refs
  2. Act -- click, fill, or navigate using those refs
  3. Snapshot again -- verify the result and get new refs
  4. Repeat until the task is complete

This loop keeps the agent grounded in actual page state rather than guessing at element positions.
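
The loop above can be sketched as a tiny wrapper that never acts without re-snapshotting afterwards. This is an illustration only: act_then_snapshot and the AB variable are names chosen for this example (AB exists so a stub can stand in for the real CLI). The only agent-browser commands used are the ones listed in this guide.

```shell
AB="${AB:-agent-browser}"   # command to run; override to point at a stub

# Run one action, then immediately re-snapshot so later steps see fresh refs.
# Usage: act_then_snapshot click @e3
act_then_snapshot() {
  "$AB" "$@" || return 1      # act: click, fill, or open
  "$AB" snapshot -ic          # snapshot again: prints the new refs
}
```

A session would then read as a series of calls such as act_then_snapshot open https://example.com followed by act_then_snapshot fill @e1 "amplifier tutorial", each one ending with current refs on screen.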

Key Commands

Command            Purpose                             Example / Notes
open <url>         Navigate to a URL                   agent-browser open https://example.com
snapshot -ic       Get interactive elements (compact)  Shows refs for clickable/fillable elements
click @ref         Click an element                    agent-browser click @e1
fill @ref "text"   Type into an input                  agent-browser fill @e2 "search query"
screenshot <file>  Capture the page                    agent-browser screenshot page.png
close              Close the browser                   Ends the session cleanly

The -ic flags on snapshot combine two options: -i (interactive) limits the output to elements you can interact with, and -c (compact) trims the output to save tokens.

When to Use

Browser Agents Are the Right Choice

  • JavaScript-rendered content: SPAs, React apps, Vue dashboards, anything that requires JS to render
  • Form interactions: Login flows, multi-step forms, checkout processes
  • Click-based navigation: Menus, dropdowns, tabs, modals
  • Screenshots and visual verification: Capturing page state for documentation or QA
  • Dynamic content: Pages that load data asynchronously or change based on interaction

web_fetch Is the Right Choice

  • Static HTML pages: Content that renders server-side without JavaScript
  • API endpoints: JSON responses, REST APIs, GraphQL queries
  • Quick content grabs: Fetching a single page's text content
  • Large downloads: Saving files, fetching raw data
  • Speed-sensitive tasks: web_fetch is faster when JS rendering isn't needed

Decision Guide

Does the page require JavaScript to render?
  Yes --> Browser agent
  No  --> Does it require clicking or form filling?
    Yes --> Browser agent
    No  --> Do you need a screenshot?
      Yes --> Browser agent
      No  --> web_fetch
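
The same decision tree, written as a function, makes the rule explicit: any single "yes" sends you to a browser agent. choose_tool is a name invented for this sketch; it only prints a recommendation and calls nothing.

```shell
# Each argument is "yes" or "no", in the order the questions are asked:
#   1) does the page need JavaScript to render?
#   2) does the task need clicking or form filling?
#   3) do you need a screenshot?
choose_tool() {
  local needs_js="$1" needs_interaction="$2" needs_screenshot="$3"
  if [ "$needs_js" = yes ] || [ "$needs_interaction" = yes ] || [ "$needs_screenshot" = yes ]; then
    echo "browser agent"
  else
    echo "web_fetch"
  fi
}
```

For instance, choose_tool no no yes describes a static page that still needs a screenshot, so a browser agent is the recommendation.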

Common Patterns

Login and Authenticated Navigation

Many tasks require authenticating first, then performing actions behind login:

1. Open the login page
2. Snapshot to find username/password fields and submit button
3. Fill credentials and submit
4. Snapshot the authenticated page to verify login succeeded
5. Navigate to the target page and continue work
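
If you drive the CLI by hand, the first four steps might look like the sketch below. It is split in two because the refs are only known after the first snapshot, so you read them from the output of login_open and pass them to login_submit. The function names and the LOGIN_USER / LOGIN_PASS environment variables are assumptions made for this example, not agent-browser features.

```shell
AB="${AB:-agent-browser}"   # command to run; override to point at a stub

# Steps 1-2: open the login page and print a snapshot so the refs can be read.
login_open() {
  "$AB" open "$1" && "$AB" snapshot -ic
}

# Steps 3-4: fill credentials, submit, and snapshot the resulting page.
# Credentials come from LOGIN_USER / LOGIN_PASS so they are not typed on the command line.
# Usage: login_submit <username-ref> <password-ref> <submit-ref>
login_submit() {
  local user_ref="$1" pass_ref="$2" submit_ref="$3"
  if [ -z "${LOGIN_USER:-}" ] || [ -z "${LOGIN_PASS:-}" ]; then
    echo "set LOGIN_USER and LOGIN_PASS first" >&2
    return 1
  fi
  "$AB" fill "$user_ref" "$LOGIN_USER" &&
    "$AB" fill "$pass_ref" "$LOGIN_PASS" &&
    "$AB" click "$submit_ref" &&
    "$AB" snapshot -ic        # verify: the dashboard should appear here
}
```

The final snapshot is what tells you whether login succeeded; if it still shows the login form, the credentials or refs were wrong.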

Data Extraction from Tables

For pages with tabular data (pricing pages, dashboards, comparison tables):

1. Open the page and wait for JS to render
2. Snapshot to confirm table content is loaded
3. Extract text content from the table rows
4. Structure the data for comparison or analysis
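
Steps 1 and 2 amount to "snapshot until the table shows up". One way to sketch that is a bounded retry that looks for a piece of text you expect in the loaded table, such as a column header. snapshot_until is a hypothetical helper, and the sketch assumes that running snapshot without the -ic flags returns the fuller output that includes non-interactive content such as table cells.

```shell
AB="${AB:-agent-browser}"   # command to run; override to point at a stub

# Re-snapshot until the output contains the expected text, or give up.
# Prints the matching snapshot on success; returns 1 after the last attempt fails.
# Usage: snapshot_until <text> [attempts] [delay-seconds]
snapshot_until() {
  local expected="$1" attempts="${2:-5}" delay="${3:-1}" out i=1
  while :; do
    out="$("$AB" snapshot)"   # assumption: plain snapshot includes table text
    case "$out" in
      *"$expected"*) printf '%s\n' "$out"; return 0 ;;
    esac
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}
```

Bounding the attempts matters: a page that never loads should produce a clear failure rather than an endless wait.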

Multi-Site Research

When comparing information across several websites:

1. Open first site, extract target data, close
2. Open second site, extract equivalent data, close
3. Repeat for remaining sites
4. Synthesize findings into a structured comparison

The browser-researcher agent handles this pattern natively -- describe what you need and it manages the navigation flow.
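
For comparison, here is roughly what the open / extract / close cycle looks like when scripted directly. It saves one snapshot per site for later synthesis instead of extracting specific fields. collect_snapshots and the site-N.txt naming are assumptions made for this sketch, and it assumes snapshot writes its result to standard output.

```shell
AB="${AB:-agent-browser}"   # command to run; override to point at a stub

# Visit each URL in turn, saving one snapshot per site into out_dir.
# Usage: collect_snapshots <out_dir> <url>...
collect_snapshots() {
  local out_dir="$1" n=0 url
  shift
  mkdir -p "$out_dir" || return 1
  for url in "$@"; do
    n=$((n + 1))
    "$AB" open "$url" || { echo "skipped: $url" >&2; continue; }
    "$AB" snapshot > "$out_dir/site-$n.txt"
    "$AB" close               # one site at a time keeps state from leaking between visits
  done
}
```

Closing between sites mirrors the numbered steps and keeps each visit independent of the previous one.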

Visual Regression Workflow

For tracking UI changes over time:

1. Capture baseline screenshots at defined viewports
2. Make changes to the application
3. Capture new screenshots at the same viewports
4. Compare side by side to identify visual differences
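
Before comparing by eye, a quick first pass can narrow down which screenshots changed at all. The sketch below assumes baseline and current captures live in two directories with matching file names (for example home-375.png in both). That layout and the changed_screenshots name are assumptions for this example. It uses a byte-for-byte comparison with cmp, which is a crude stand-in for a real image diff: it flags any difference, including ones invisible to the eye.

```shell
# Print the name of every baseline screenshot that differs from, or is missing in,
# the current directory. Prints nothing when all files are byte-identical.
# Usage: changed_screenshots <baseline_dir> <current_dir>
changed_screenshots() {
  local baseline_dir="$1" current_dir="$2" file name
  for file in "$baseline_dir"/*.png; do
    [ -e "$file" ] || continue          # the glob matched nothing
    name="${file##*/}"
    if ! cmp -s "$file" "$current_dir/$name" 2>/dev/null; then
      echo "$name"
    fi
  done
}
```

The files it lists are the ones worth opening side by side in step 4.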

Try It Yourself

Extract Data from a Page

Go to https://news.ycombinator.com and extract the titles
of the top 5 stories.

The browser-researcher agent will open the page, snapshot the content, and extract the story titles.

Fill a Form

Go to https://httpbin.org/forms/post and fill out the form
with test data, then submit it.

The browser-operator agent will snapshot the form, identify input fields by their refs, fill each one, and submit.

Capture Responsive Screenshots

Take screenshots of https://example.com at three viewport widths:
mobile (375px), tablet (768px), and desktop (1440px).

The visual-documenter agent will resize the viewport and capture each breakpoint.

Test a Multi-Step Flow

Go to a demo todo app. Add three items: "Buy groceries",
"Write tests", and "Deploy app". Then mark the second one
as complete. Screenshot the final state.

This exercises the full snapshot-act-snapshot loop across multiple interactions.

Best Practices

  1. Always re-snapshot after page changes: Refs become stale when the page updates. Take a fresh snapshot after every action that modifies the DOM.

  2. Use compact snapshots: The -ic flags reduce token usage significantly. Only request full snapshots when you need non-interactive elements.

  3. Close browsers when done: Browser sessions consume resources. Always close when the task is complete.

  4. Prefer researcher for multi-page tasks: If you need to visit more than two or three pages, the browser-researcher agent handles navigation flow and synthesis better than the operator.

  5. Use visual-documenter for evidence: When you need screenshots for documentation or QA, the visual-documenter produces more organized, systematic captures.

  6. Handle dynamic loading: Some pages load content asynchronously. If a snapshot shows fewer elements than expected, wait briefly and snapshot again.

  7. Keep interactions sequential: Browser state is inherently sequential. Don't try to parallelize interactions on the same page.
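
Practice 3 is easy to forget when a step fails halfway through. One possible guard is a wrapper that runs your task and closes the browser afterwards regardless of the outcome. with_browser is a hypothetical helper written for this sketch; the only agent-browser command it relies on is close.

```shell
AB="${AB:-agent-browser}"   # command to run; override to point at a stub

# Run a task (a function or command), then close the browser whether it succeeded or not.
# Returns the task's own exit status.
# Usage: with_browser my_task arg...
with_browser() {
  local status=0
  "$@" || status=$?
  "$AB" close
  return "$status"
}
```

Because the task's exit status is passed through, callers can still tell a failed run from a successful one after the cleanup.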

Troubleshooting

agent-browser not found: Run npm install -g agent-browser and verify with agent-browser --version. Requires Node.js 18+.

Browser install fails: Run agent-browser install to download browser binaries. On Linux, you may need additional system libraries -- check the error output for missing dependencies.

Stale refs after clicking: The page changed but the agent used old refs. This happens when no snapshot is taken after a navigation, click, or form submission that alters the page. The fix is always: snapshot, then act.

Page content not loading: Some pages need time for JavaScript to execute. If the snapshot looks incomplete, the agent should wait and retry the snapshot.

Element not found by ref: The ref may have been invalidated by a page update. Take a new snapshot to get current refs.

Screenshots are blank or partial: Ensure the page has fully loaded before capturing. For long pages, the screenshot captures the visible viewport by default.


Next Steps:

  • Try web tools for quick static content fetching
  • Explore Foundation agents for development workflows
  • Learn about Custom Bundles to extend browser capabilities