agent-browser_skill

This skill automates browser tasks via the agent-browser CLI, enabling AI agents to interact with web pages reliably using refs, semantic locators, and

Language: Python
GitHub Stars: 76
Bundled Files: 2
Catalog Refreshed: 3 weeks ago
First Indexed: 2 months ago

Installation

The previewed and copied install command uses veilstart where the catalogue listing uses aiagentskills.

npx veilstart add skill partme-ai/full-stack-skills --skill agent-browser

LICENSE.txt (630 B)
SKILL.md (7.4 KB)

Overview

This skill provides a comprehensive guide and examples for using agent-browser, a CLI tool for browser automation tailored to AI agents. It covers installation, core commands, selector strategies, agent mode, session management, advanced options, and practical best practices. Use it to automate interactions, capture snapshots, and integrate browser flows into agent pipelines.

How this skill works

The skill maps directly to agent-browser documentation and example files so you can load targeted examples or API references for each task. It explains core CLI commands (open, click, fill, snapshot, eval, etc.), selector types (refs, CSS, XPath, semantic locators), and advanced features like agent mode with JSON output, CDP integration, and viewport streaming. Examples and templates are organized to be adapted into AI agent workflows or standalone automation scripts.
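As a rough illustration of how the core commands and flags named above might be assembled from an agent pipeline, the sketch below builds argument lists for the agent-browser CLI without running them. The helper name `build_command` is hypothetical, and placing `--session`, `--json`, and `--headed` after the subcommand is an assumption; check the CLI's own help output for the authoritative syntax.

```python
import shlex
from typing import List, Optional


def build_command(
    action: str,
    *args: str,
    session: Optional[str] = None,
    json_output: bool = False,
    headed: bool = False,
) -> List[str]:
    """Return an argv list for one agent-browser invocation.

    Flag placement after the subcommand is an assumption, not
    confirmed by the listing.
    """
    argv = ["agent-browser", action, *args]
    if session is not None:
        argv += ["--session", session]  # keep state across commands
    if json_output:
        argv.append("--json")  # agent mode: machine-readable output
    if headed:
        argv.append("--headed")  # visible browser window for debugging
    return argv


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for cmd in (
        build_command("open", "https://example.com", session="demo"),
        build_command("snapshot", "-i", session="demo", json_output=True),
        build_command("click", "@e1", session="demo"),
    ):
        print(shlex.join(cmd))
```

Returning a list rather than a single string lets the result be passed to `subprocess.run` without shell quoting concerns.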

When to use it

Automating web interactions from the command line for scraping, testing, or workflows.
Integrating browser control into AI agents that need deterministic page interactions.
Capturing snapshots and accessibility trees before manipulating page state.
Managing multi-step sessions or authenticated flows without scripting a full browser login.
Debugging flows with headed mode or streaming a live viewport for review.

Best practices

Prefer refs (@e1, @e2) for deterministic element targeting; snapshot first to generate refs.
Use --json (agent mode) for machine-readable outputs when integrating with agents.
Maintain state across steps with --session for reliable multi-command workflows.
Use headed mode (--headed) while debugging and headless for production runs.
Combine snapshot options (-i for interactive elements only, -c for compact output, -d to limit depth, -s to scope to a selector) to keep snapshots small and focused.
Use --headers and authenticated sessions to avoid brittle login automation.
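The "snapshot first, then use refs" practice can be checked mechanically before a plan is executed. The following is a minimal sketch, not part of agent-browser itself: `refs_used_before_snapshot` is a hypothetical helper, each step is a list holding a subcommand and its arguments, and treating `open` as invalidating earlier refs is a conservative assumption.

```python
import re
from typing import List, Sequence

# Refs look like @e1, @e2, ... as described in the best practices above.
REF_PATTERN = re.compile(r"^@e\d+$")


def refs_used_before_snapshot(plan: Sequence[Sequence[str]]) -> List[int]:
    """Return indexes of steps that use a ref with no valid snapshot."""
    have_snapshot = False
    offending: List[int] = []
    for index, step in enumerate(plan):
        if not step:
            continue
        if step[0] == "snapshot":
            have_snapshot = True
            continue
        if step[0] == "open":
            # Assumption: navigating makes refs from earlier snapshots stale.
            have_snapshot = False
        if not have_snapshot and any(REF_PATTERN.match(a) for a in step[1:]):
            offending.append(index)
    return offending


if __name__ == "__main__":
    plan = [
        ["open", "https://example.com"],
        ["click", "@e1"],  # flagged: no snapshot has been taken yet
        ["snapshot", "-i"],
        ["fill", "@e2", "hello"],
    ]
    print(refs_used_before_snapshot(plan))
```

An agent could run a check like this on a generated plan and insert a `snapshot` step wherever an index is reported.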

Example use cases

Run a step-by-step AI agent that navigates a site, extracts structured data, and stores results using refs and snapshots.
Create a CLI-driven test harness that opens pages, clicks elements, checks states, and reports pass/fail via exit codes.
Automate form submission flows with session persistence across commands and authorization headers.
Stream the browser viewport to a preview port during development for live UX inspection.
Connect to an existing browser via CDP for advanced debugging or to reuse an authenticated profile.
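The CLI-driven test harness use case could look roughly like the sketch below. It assumes that agent-browser exits with a non-zero status when a command fails, which the listing does not confirm; `run_steps` and `subprocess_runner` are hypothetical names, and the runner is injectable so the logic can be exercised without a browser.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Runner = Callable[[List[str]], int]


def subprocess_runner(argv: List[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(argv, check=False).returncode


def run_steps(
    steps: Sequence[Sequence[str]],
    session: str,
    runner: Runner = subprocess_runner,
) -> Tuple[bool, int]:
    """Run steps in order within one session, stopping at the first failure.

    Returns (passed, number_of_steps_completed).
    """
    completed = 0
    for step in steps:
        # Appending --session after the subcommand is an assumption.
        argv = ["agent-browser", *step, "--session", session]
        if runner(argv) != 0:
            return False, completed
        completed += 1
    return True, completed


if __name__ == "__main__":
    # Dry run with a stand-in runner that prints and always succeeds.
    def echo_runner(argv: List[str]) -> int:
        print(" ".join(argv))
        return 0

    passed, done = run_steps(
        [["open", "https://example.com"], ["snapshot", "-i"], ["click", "@e1"]],
        session="smoke-test",
        runner=echo_runner,
    )
    raise SystemExit(0 if passed else 1)
```

Exiting with 0 or 1 from the harness itself lets a CI job treat the whole flow as a single pass/fail check.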

FAQ

How do I target elements reliably?

Take a snapshot first to generate refs, and prefer refs (@e1) over CSS or XPath selectors; use semantic locators when refs are unavailable.

When should I use agent mode (--json)?

Use agent mode when an AI agent or automation pipeline needs structured JSON output to parse results and make decisions programmatically.
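To make the "parse results and make decisions" step concrete, here is a small sketch of defensive parsing for agent-mode output. The payload shape used here (a top-level `success` flag with `data` and `error` fields) is an assumption for illustration, as are the sample string and the helper name `parse_agent_output`; inspect real `--json` output before relying on any field.

```python
import json
from typing import Any, Dict


def parse_agent_output(stdout: str) -> Dict[str, Any]:
    """Normalize one agent-mode response into ok/error/data fields.

    The 'success', 'data', and 'error' keys are assumed, not documented
    in the listing.
    """
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc.msg}", "data": None}
    if not isinstance(payload, dict):
        return {"ok": False, "error": "expected a JSON object", "data": None}
    return {
        "ok": bool(payload.get("success", False)),
        "error": payload.get("error"),
        "data": payload.get("data"),
    }


if __name__ == "__main__":
    # Hypothetical sample; not captured from a real run.
    sample = '{"success": true, "data": {"refs": {"e1": {"role": "button", "name": "Submit"}}}}'
    result = parse_agent_output(sample)
    if result["ok"]:
        for ref, info in result["data"]["refs"].items():
            print(f"@{ref}: {info['role']} {info['name']!r}")
```

Returning a uniform dictionary even for malformed output keeps the calling agent's decision logic free of exception handling.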