
Lesson 26 — OpenClaw Agent Browser Skill: Advanced Headless Browser Automation for Dynamic Pages and Login State (2026)

Goal: Install the Agent Browser Skill to precisely control web page elements using Accessibility Tree snapshots and ref references — handling pages that require JavaScript rendering or login sessions.


Agent Browser vs. Built-in Browser Tools

Dimension          | Lesson 04 Built-in Browser      | Agent Browser (this lesson)
Page type          | Static HTML pages               | Dynamic rendering (React/Vue/Angular)
Login state        | Not supported                   | Supported; can save sessions/cookies
Interaction method | Screenshot + coordinate clicks  | Accessibility Tree + ref references
Precision          | Medium (coordinate drift risk)  | High (directly references DOM elements)
Wait strategy      | Fixed delays                    | Intelligent waiting (network idle/element appears)
Best for           | Quick static content extraction | Form submission, post-login operations, SPA apps

If you've hit failures scraping dynamically rendered JavaScript pages, or need to simulate a login, it's time to upgrade from Lesson 04 to Agent Browser.


Core Concept: What Is the Accessibility Tree?

Screenshot-based control is fragile: the AI has to infer the coordinates of each element from the image, and any scroll or zoom shifts those coordinates and causes clicks to miss.

The Accessibility Tree is the structured page description that browsers maintain for assistive tools like screen readers. In an Agent Browser snapshot, each element carries:

  • A role (button, input, link, heading...)
  • Text content
  • A unique ref identifier
  • State (visible, disabled, checked...)

Example fragment:

[ref=e23] button "Log In" (enabled)
[ref=e24] input type="text" placeholder="Email" value=""
[ref=e25] input type="password" placeholder="Password" value=""
[ref=e26] checkbox "Remember me" (unchecked)

With a ref, a click becomes /browser click ref=e23. Because the ref identifies the element itself rather than a screen position, the click does not depend on scroll position or zoom level. That's why Agent Browser is more precise than screenshot-based approaches.
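
To make the idea concrete, here is a minimal Python sketch that parses the example fragment above into structured records and looks up a ref by role and name. The line format is inferred from that fragment only, and parse_snapshot, find_ref, and Node are hypothetical names for illustration, not part of the Skill.

```python
import re
from dataclasses import dataclass

# Assumed line format, modeled on the example fragment:
#   [ref=e23] button "Log In" (enabled)
LINE_RE = re.compile(r'^\[ref=(?P<ref>\w+)\]\s+(?P<role>\w+)(?P<rest>.*)$')
# Matches either a bare "quoted string" or a key="value" pair.
QUOTED_RE = re.compile(r'(?:(\w+)=)?"([^"]*)"')


@dataclass
class Node:
    ref: str
    role: str
    name: str   # first bare quoted string, else the placeholder attribute
    attrs: dict


def parse_snapshot(text: str) -> list[Node]:
    """Turn snapshot text into Node records, skipping lines that do not match."""
    nodes = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        attrs, name = {}, ""
        for key, value in QUOTED_RE.findall(m["rest"]):
            if key:
                attrs[key] = value
            elif not name:
                name = value
        nodes.append(Node(m["ref"], m["role"], name or attrs.get("placeholder", ""), attrs))
    return nodes


def find_ref(nodes, role, name):
    """Return the ref of the first node with this role and name, or None."""
    for node in nodes:
        if node.role == role and node.name.lower() == name.lower():
            return node.ref
    return None


SNAPSHOT = '''
[ref=e23] button "Log In" (enabled)
[ref=e24] input type="text" placeholder="Email" value=""
[ref=e25] input type="password" placeholder="Password" value=""
[ref=e26] checkbox "Remember me" (unchecked)
'''

nodes = parse_snapshot(SNAPSHOT)
print(find_ref(nodes, "button", "Log In"))
```

The lookup works on roles and names only, so nothing in it depends on where the element is drawn on screen.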


Step 1: Install Dependencies (Playwright)

Agent Browser Skill depends on the Playwright headless browser driver:

npm install -g playwright
npx playwright install chromium

Verify installation:

npx playwright --version
# Should print something like: Version 1.41.0

If you prefer Puppeteer, you can use that too:

npm install -g puppeteer

The Skill automatically detects which browser driver is installed.


Step 2: Install the Skill

/install @matrixy/agent-browser-clawdbot

Verify:

pnpm openclaw skills list
# agent-browser-clawdbot should appear in the list

Step 3: Basic Usage — Navigate and Get Page Structure

The basic flow for scraping dynamically rendered pages:

Use Agent Browser to open https://example.com and give me a structural snapshot of the page

Underlying execution:

/browser navigate https://example.com
/browser snapshot

Sample return (Accessibility Tree fragment):

[ref=e1]  heading "Example Domain" level=1
[ref=e2]  paragraph "This domain is for use in illustrative examples…"
[ref=e3]  link "More information…" href="https://www.iana.org/domains/example"

Unlike Lesson 04, this isn't a screenshot — it's structured text, and AI can precisely locate any element.


Step 4: Precise Clicking (Using ref References)

After getting a snapshot, use ref references to interact with elements instead of coordinates:

Click the "More information" link on the page

OpenClaw finds the corresponding [ref=e3] and executes:

/browser click ref=e3

Filling out a form (combined operation):

Type "OpenClaw tutorial" in the search box, then click the search button

Equivalent execution:

/browser type ref=e12 text="OpenClaw tutorial"
/browser click ref=e15

This is the core of Accessibility Tree-based operation — all actions use refs and don't depend on coordinates.
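
The step from a natural-language target ("More information") to a concrete ref can be sketched as a label search over the snapshot. This is an illustrative Python sketch under the assumption that matching is a case-insensitive substring test; how OpenClaw actually resolves the target is not described in this lesson, and index_snapshot, resolve, and click_command are hypothetical helpers.

```python
import re

# Assumed row format: [ref=e3]  link "More information…" ...
ROW_RE = re.compile(r'^\[ref=(\w+)\]\s+(\w+)\s+"([^"]*)"')


def index_snapshot(text):
    """Return (ref, role, label) tuples for lines that carry a quoted label."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            rows.append(m.groups())
    return rows


def resolve(rows, wanted, role=None):
    """Pick the single element whose label contains `wanted` (case-insensitive)."""
    hits = [ref for ref, r, label in rows
            if wanted.lower() in label.lower() and (role is None or r == role)]
    if len(hits) != 1:
        # Zero or several candidates: refuse to guess.
        raise LookupError(f"{len(hits)} matches for {wanted!r}; narrow by role or re-snapshot")
    return hits[0]


def click_command(rows, wanted, role=None):
    """Build the click command string for the resolved element."""
    return f"/browser click ref={resolve(rows, wanted, role)}"


SNAPSHOT = '''
[ref=e1]  heading "Example Domain" level=1
[ref=e2]  paragraph "This domain is for use in illustrative examples…"
[ref=e3]  link "More information…" href="https://www.iana.org/domains/example"
'''

rows = index_snapshot(SNAPSHOT)
print(click_command(rows, "More information"))
```

Raising on an ambiguous match is a deliberate choice in this sketch: clicking the wrong element is usually worse than stopping and taking a fresh snapshot.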


Step 5: Handle Login (Save Session)

Standard flow for simulating login with a headless browser:

Log in to https://app.example.com with Agent Browser,
username: user@example.com, password: mypassword,
save the session after logging in

OpenClaw execution:

/browser navigate https://app.example.com/login
/browser snapshot
# Finds email input [ref=e24] and password box [ref=e25]
/browser type ref=e24 text="user@example.com"
/browser type ref=e25 text="mypassword"
/browser click ref=e23  # Login button
/browser save-session --name "example-app"

After saving the session, reuse it next time for the same website:

/browser load-session --name "example-app"
/browser navigate https://app.example.com/dashboard
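
Conceptually, saving and loading a session means writing cookies and localStorage to disk and reading them back later. The Python sketch below illustrates that round trip with a plain JSON file. It is not the Skill's storage format: the file layout, the save_session and load_session helpers, and the convention that expires == -1 means "no expiry" are all assumptions, and the sketch does no encryption. Playwright itself offers a comparable mechanism (storage state on a browser context); whether the Skill uses it is not stated in this lesson.

```python
import json
import tempfile
import time
from pathlib import Path


def save_session(directory, name, cookies, local_storage):
    """Write cookies and localStorage for one site to <directory>/<name>.json."""
    path = Path(directory) / f"{name}.json"
    path.write_text(json.dumps({"cookies": cookies, "localStorage": local_storage}))
    return path


def load_session(directory, name, now=None):
    """Read a saved session back, dropping cookies that have already expired."""
    now = time.time() if now is None else now
    data = json.loads((Path(directory) / f"{name}.json").read_text())
    # Assumption of this sketch: expires == -1 (or missing) means no expiry.
    data["cookies"] = [c for c in data["cookies"]
                       if c.get("expires", -1) == -1 or c["expires"] > now]
    return data


with tempfile.TemporaryDirectory() as tmp:
    save_session(
        tmp, "example-app",
        cookies=[{"name": "sid", "value": "abc", "expires": 2_000},
                 {"name": "old", "value": "x", "expires": 1_000}],
        local_storage={"theme": "dark"},
    )
    # Pretend the current time is 1500: "old" has expired, "sid" has not.
    restored = load_session(tmp, "example-app", now=1_500)

print([c["name"] for c in restored["cookies"]])
```

Filtering expired cookies on load shows why a reused session can still land on the login page: if the site's session cookie has expired, the login flow has to run again.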

Step 6: Wait Strategies for Dynamic Pages

The most common problem when automating JavaScript-rendered sites is "taking a snapshot before content loads." Agent Browser provides multiple wait strategies:

Wait for network requests to complete (good for API-driven SPAs):

/browser navigate https://app.example.com/data-table --wait-for network-idle

Wait for a specific element to appear (good for async-loaded content):

/browser wait-for-element --text "Data loaded successfully"
/browser wait-for-element --ref-role "table"

Fixed delay (fallback approach):

/browser wait 2000
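
Waiting for an element differs from a fixed delay in that it polls a condition and returns as soon as the condition holds, with a timeout as a safety net. A minimal Python sketch of that pattern follows. FakePage stands in for a real page and wait_until is a hypothetical helper; neither reflects the Skill's internals.

```python
import time


def wait_until(condition, timeout_s=5.0, interval_s=0.1):
    """Poll `condition` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


class FakePage:
    """Simulated page whose content becomes ready after a few snapshots."""

    def __init__(self, ready_after_polls):
        self.polls = 0
        self.ready_after_polls = ready_after_polls

    def snapshot(self):
        self.polls += 1
        if self.polls >= self.ready_after_polls:
            return '[ref=e7] status "Data loaded successfully"'
        return '[ref=e7] status "Loading…"'


page = FakePage(ready_after_polls=3)
loaded = wait_until(
    lambda: "Data loaded successfully" in page.snapshot(),
    timeout_s=2.0,
    interval_s=0.01,
)
print(loaded, page.polls)
```

A fixed delay either wastes time or is too short. Polling returns as soon as the content is there and reports a timeout explicitly instead of continuing with a half-loaded page.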

Step 7: Real-World — Automatically Fill and Submit a Form

Complete example of headless browser form submission:

Use Agent Browser to open https://forms.example.com/contact,
fill in the following and submit:
- Name: John Smith
- Email: john@company.com
- Message: I'm interested in your enterprise partnership program. Please have sales contact me.

OpenClaw execution flow:

/browser navigate https://forms.example.com/contact
/browser snapshot
# Identifies form field refs

/browser type ref=e10 text="John Smith"
/browser type ref=e11 text="john@company.com"
/browser type ref=e12 text="I'm interested in your enterprise partnership program. Please have sales contact me."
/browser click ref=e20  # Submit button

/browser wait-for-element --text "Submitted successfully"
/browser snapshot  # Confirm submission result
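
The flow above can be thought of as a plan: map each field label to a ref from the snapshot, check that every field and the submit button were found, and only then emit commands. The Python sketch below illustrates this under stated assumptions. The snapshot text is hypothetical (the lesson does not show the form's real snapshot, only the refs), and plan_form is an illustrative helper, not a Skill API.

```python
import re

ROW_RE = re.compile(r'^\[ref=(\w+)\]\s+(\w+)\s+"([^"]*)"')

# Hypothetical snapshot of the contact form; the refs match the flow above.
SNAPSHOT = '''
[ref=e10] input "Name"
[ref=e11] input "Email"
[ref=e12] textarea "Message"
[ref=e20] button "Submit"
'''


def plan_form(snapshot, values, submit_label):
    """Return the command list for filling `values` and clicking submit."""
    refs = {}
    for line in snapshot.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            ref, _role, label = m.groups()
            refs[label] = ref
    # Validate everything up front so a missing field never leaves a half-filled form.
    missing = [label for label in list(values) + [submit_label] if label not in refs]
    if missing:
        raise KeyError(f"not in snapshot: {missing}")
    commands = [f'/browser type ref={refs[label]} text="{text}"'
                for label, text in values.items()]
    commands.append(f"/browser click ref={refs[submit_label]}")
    return commands


commands = plan_form(
    SNAPSHOT,
    {"Name": "John Smith", "Email": "john@company.com"},
    "Submit",
)
for command in commands:
    print(command)
```

Building the full plan before sending the first keystroke keeps a partially completed form from being submitted when one field cannot be located.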

FAQ

What's the difference between OpenClaw Agent Browser and Lesson 04's browser automation?

Lesson 04's built-in browser tool is based on screenshots and coordinate operations. It's good for quickly scraping static web content, simple to configure, and works out of the box. Agent Browser uses Playwright to drive a headless Chromium and reads page structure from the Accessibility Tree rather than from screenshots. It supports JavaScript-rendered dynamic pages, persistent login sessions, and cookie reuse, with much higher operational precision. If you just need to read static web pages, use Lesson 04. When you need login, form submission, or SPA app interaction, use Agent Browser.

Can Agent Browser Skill handle websites that require login?

Yes. Agent Browser supports the full login flow: navigate to the login page → fill in credentials → submit → save session. The saved session includes cookies and localStorage data, so you can load it next time for the same website without logging in again. Sessions are encrypted and stored locally in ~/.openclaw/sessions/ and are not uploaded to any cloud. Note: some websites use bot detection to identify automated browsers. If you encounter such sites, enable stealth mode in the Skill settings.

What is the Accessibility Tree? Why is it better than screenshots?

The Accessibility Tree is a page structure tree that browsers maintain for assistive tools like screen readers. In an Agent Browser snapshot, every UI element carries a unique ref identifier, a role, and its text content. It's better than screenshots for three reasons. One, refs aren't affected by page scrolling or zooming, so operations are far more stable than with coordinates. Two, text content can be read directly by AI without OCR. Three, element state (disabled, checked, and so on) is exposed as text, which makes it easy to check loading status. Simply put: screenshots tell AI "what the page looks like," while the Accessibility Tree tells AI "what elements are on the page and what each one is called."

Do I need to install Chrome separately?

No separate system Chrome installation needed. When you run npx playwright install chromium, Playwright downloads a standalone Chromium binary (~150 MB) completely isolated from your system browser — it doesn't affect your everyday Chrome. If your system already has Chrome installed and you want to reuse it, you can specify executablePath to your local Chrome path in the Agent Browser Skill config, but that's usually not necessary.

