Solving Browser Automation for AI Agents: Surfagent
I built something over a weekend that has quietly become the most important tool in my agent stack. It is called Surfagent — a browser automation API for AI agents that runs through Chrome via CDP (Chrome DevTools Protocol). It is open source, you install it with a single npm command, and it lets your agents drive any logged-in website without needing API keys or dealing with CAPTCHAs.
This post is the introduction. If you have seen Surfagent show up in my other posts (and it shows up in most of them now), this is where the story starts.
The Core Idea
Most agents that need to take action on the web hit a wall: APIs are gated, scraping breaks, and login walls block headless browsers. Surfagent's solution is to ride along inside your real Chrome session. Because you are already logged in, the agent inherits all of that authentication for free. No API keys, no token rotation, no CAPTCHAs.
Install:
npm install -g surfagent
surfagent start
From inside Claude Code (or any other agent framework), you can now issue plain-language commands like "navigate to my Discord server, read what's happening in general" and the agent does it via your live Chrome session.
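The "ride along" idea rests on a real Chrome feature: when Chrome is started with --remote-debugging-port, it serves a small HTTP endpoint that lists the open tabs, each with a WebSocket URL for sending CDP commands. The sketch below is not Surfagent's code. It only illustrates how a client could discover tabs that are already open and logged in. The port 9222 and the helper names (fetch_targets, page_targets, find_tab) are assumptions made for this example.

```python
import json
from typing import Optional
from urllib.request import urlopen


def fetch_targets(host: str = "127.0.0.1", port: int = 9222) -> list:
    """Ask a running Chrome for its open targets via the DevTools HTTP endpoint.

    Assumes Chrome was launched with --remote-debugging-port=<port>.
    """
    with urlopen(f"http://{host}:{port}/json/list", timeout=5) as resp:
        return json.load(resp)


def page_targets(targets: list) -> list:
    """Keep only real tabs (type == "page"), dropping workers and other targets."""
    return [t for t in targets if t.get("type") == "page"]


def find_tab(targets: list, url_fragment: str) -> Optional[dict]:
    """Return the first tab whose URL contains url_fragment, or None."""
    for target in page_targets(targets):
        if url_fragment in target.get("url", ""):
            return target
    return None
```

With Chrome running, find_tab(fetch_targets(), "discord.com") would return the entry for an open Discord tab, including its webSocketDebuggerUrl. Because that tab lives in the normal browser profile, whatever session it already holds comes along with it.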
Demo 1: Discord Recon
First test in the video: I told Claude Code to navigate to my Discord and summarize the last 200 messages in the general channel. Surfagent navigated the channels list, opened general, scrolled the message history, and pulled context. Output: bullet summary of recent topics — Discord worms, scam discussions, AI music generation. No Discord API, no bot tokens, no rate limits. Just the agent reading my browser.
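The scroll-and-read part of that run can be sketched as a small loop: read what is visible, scroll up, repeat until enough messages are collected. This is a guess at the general pattern, not Surfagent's implementation. The two callbacks (read_visible, scroll_up) are hypothetical stand-ins for whatever the browser layer provides, and the sketch assumes message IDs sort chronologically, which holds for Discord's snowflake IDs.

```python
from typing import Callable

Message = tuple[int, str]  # (message_id, text)


def collect_recent(
    read_visible: Callable[[], list[Message]],
    scroll_up: Callable[[], bool],
    limit: int = 200,
    max_scrolls: int = 100,
) -> list[Message]:
    """Collect up to `limit` of the newest messages, returned oldest first.

    read_visible() returns the messages currently rendered on screen.
    scroll_up() scrolls toward older history and returns False at the top.
    """
    seen: dict[int, str] = {}
    for _ in range(max_scrolls + 1):  # hard cap so a stuck page cannot loop forever
        for message_id, text in read_visible():
            seen.setdefault(message_id, text)  # overlapping windows are deduplicated
        if len(seen) >= limit or not scroll_up():
            break
    newest_ids = sorted(seen)[-limit:]
    return [(message_id, seen[message_id]) for message_id in newest_ids]
```

The deduplication matters because consecutive scroll positions usually overlap, so the same message can be read more than once.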
Demo 2: Hacker News Click-Through
I asked it to go to Hacker News and click into the 10th post. Done in two seconds. The post was "Distributed DuckDB" and the agent could read the article body. The interesting part is the speed: the recon command maps every interactive element on the page first, so subsequent clicks are nearly instant.
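One way to picture why a mapped page makes "click the 10th post" cheap: once every element has a handle, picking the target is a list lookup rather than another round of page analysis. The sketch below is illustrative only. The Element fields, the group tag, and the PageMap class are hypothetical names, not Surfagent's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    ref: int    # handle assigned once, during the recon pass
    role: str   # e.g. "link", "button", "input"
    text: str   # visible label
    group: str  # hypothetical grouping tag, e.g. "story" for headline links


class PageMap:
    """Result of a single recon pass; lookups here never touch the page."""

    def __init__(self, elements: list) -> None:
        self._elements = list(elements)
        self._by_ref = {e.ref: e for e in self._elements}

    def get(self, ref: int) -> Element:
        """Resolve a handle back to its element."""
        return self._by_ref[ref]

    def nth(self, role: str, group: str, n: int) -> Element:
        """Return the n-th (1-based) element with this role and group, in page order."""
        matches = [e for e in self._elements if e.role == role and e.group == group]
        if not 1 <= n <= len(matches):
            raise IndexError(f"only {len(matches)} matching elements, asked for #{n}")
        return matches[n - 1]
```

Given a map of a front page, page.nth("link", "story", 10) returns the handle for the tenth headline, and the only remaining browser work is the click itself.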
Demo 3: Google Sheets API-Free Research
This is where it gets fun. I asked Surfagent to find the API prices for Opus 4.6, Sonnet 4.6, GPT-4.4, and Gemini 4.1, then put them into a Google Sheet. The agent:
- Browsed Anthropic, OpenAI, and Google for current pricing
- Switched back to Google Sheets
- Filled in headers (Model, Input, Output) and four rows of data
- When asked, generated a chart from the data using the Insert menu
The agent figured out the Sheets UI on the fly — clicking cells, typing values, navigating menus. No Google Sheets API key. No service account. Just a logged-in browser tab.
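The "click a cell, type a value" part can be planned up front as a list of cell addresses and values. The sketch below shows only that planning step, under the assumption that the agent works cell by cell. The function names are hypothetical, and the prices are left as "TBD" placeholders rather than real figures.

```python
def column_letter(index: int) -> str:
    """Convert a 0-based column index to spreadsheet letters: 0 -> A, 25 -> Z, 26 -> AA."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cell_plan(headers: list, rows: list) -> list:
    """Flatten a table into (cell_address, value) pairs, row by row from A1."""
    table = [headers, *rows]
    return [
        (f"{column_letter(col)}{row + 1}", str(value))
        for row, cells in enumerate(table)
        for col, value in enumerate(cells)
    ]


# Placeholder values only; the real prices come from the pricing pages.
plan = cell_plan(
    ["Model", "Input", "Output"],
    [
        ["Opus 4.6", "TBD", "TBD"],
        ["Sonnet 4.6", "TBD", "TBD"],
        ["GPT-4.4", "TBD", "TBD"],
        ["Gemini 4.1", "TBD", "TBD"],
    ],
)
```

Each pair in plan is then one click-and-type action in the sheet, for example ("A1", "Model") followed by ("B1", "Input").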
Demo 4: X (Twitter) Posting
For the X demo I had it search for "Claude Mythos" news, pull together the latest posts, and then compose and publish a creative post about the topic. The recon-first approach handled the X UI cleanly: find the search field, switch to the "Latest" tab, scroll, open the post composer, draft the text, click Post. The whole flow took about a minute.
Why Recon-First Matters
The biggest unlock is the recon primitive. Before clicking anything, Surfagent maps every element on the page — buttons, links, inputs, scrollable regions — and gives the agent a structured representation. This is what makes it fast, because the LLM doesn't have to "look" at a screenshot and guess. It gets a clean element list and decides what to act on.
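To make the contrast with screenshots concrete, here is a minimal sketch of how an element list could be handed to a model as text and how a short reply could be mapped back to an element. The line format, the verb set (click, type, scroll), and both function names are assumptions for illustration, not Surfagent's actual protocol.

```python
import re


def render_elements(elements: list) -> str:
    """Turn recon output into one compact line per element for the prompt."""
    return "\n".join(f'[{e["ref"]}] {e["role"]}: {e["label"]}' for e in elements)


def parse_action(reply: str, elements: list) -> tuple:
    """Parse a reply such as 'click 2' or 'type 1 some text'.

    Returns (verb, element, text); text is empty when the verb takes none.
    """
    match = re.fullmatch(r"\s*(click|type|scroll)\s+(\d+)(?:\s+(.*))?\s*", reply)
    if match is None:
        raise ValueError(f"unrecognised action: {reply!r}")
    verb, ref, text = match.group(1), int(match.group(2)), match.group(3) or ""
    by_ref = {e["ref"]: e for e in elements}
    if ref not in by_ref:
        raise ValueError(f"no element with ref {ref}")
    return verb, by_ref[ref], text.strip()
```

The model only ever sees a few short lines of text and answers with a handle, so there is no pixel coordinate to guess and nothing to re-derive from an image.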
This is also what unlocks the parallel browser automation pattern — multiple Surfagent instances running across multiple Chrome profiles simultaneously. And it is what powers the App Store Connect submission flow in my iOS app automation pipeline.
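The parallel pattern can be sketched with asyncio: one coroutine per Chrome profile, all running concurrently. Here drive_profile is a hypothetical stub that stands in for the real browser work, and the profile names are made up. Only the fan-out structure is the point.

```python
import asyncio


async def drive_profile(profile: str, task: str) -> str:
    """Stub for running one task in one browser profile (hypothetical)."""
    await asyncio.sleep(0.01)  # stands in for real navigation and clicks
    return f"done: {task}"


async def run_all(jobs: dict) -> dict:
    """Run every profile's task concurrently and map profile -> outcome."""
    outcomes = await asyncio.gather(
        *(drive_profile(profile, task) for profile, task in jobs.items())
    )
    return dict(zip(jobs, outcomes))  # gather preserves input order


if __name__ == "__main__":
    jobs = {"profile-a": "check Discord", "profile-b": "post to X"}
    print(asyncio.run(run_all(jobs)))
```

Because each profile is its own browser context with its own logins, the tasks do not share state and can overlap freely.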
Important Caveat: Not Headless
One thing I want to flag clearly — Surfagent is not headless. It needs a real Chrome window. I run mine on a dedicated Mac mini that exists just to be the "agent's browser." If you are trying to run this on a server with no display, you will need a different approach.
For my use case, the dedicated machine is the right answer. It can be logged into all my accounts permanently, and any agent loop on my main machine just talks to it over the network.
What's Next
I have only scratched the surface of what is possible with this. Forms, multi-tab workflows, drag and drop, file uploads: all of it works. Open source means you can extend it for your own use cases too. Issues and pull requests are welcome. I have done QA testing, but I cannot cover every edge case alone.
Going forward, most of my passive income setups (the Claude Code passive income loops, the iOS apps, the recon agents) sit on top of Surfagent. If you want a single primitive that turns "AI agents on the web" into a solved problem for your own workflows, this is it.