Long-Running AI Agent Browser Automation Tasks Are Here
Most browser automation demos are short — open a page, click a thing, scrape a value, exit. The interesting question is what happens when you give an agent an open-ended goal that requires hours of persistence. So today I gave my browser agent two extremely vague tasks and let it run.
The two tasks: "go live on Twitch" (with no other guidance) and "make $1 in 30 minutes." The agent had only the browser automation tools — no skills loaded, no shortcuts. The results were better than I expected on one and revealing on the other.
Task 1: Go Live on Twitch
The full prompt:
"Create an email account somewhere, then go live on Twitch. Build your own tools if needed."
The agent built a plan: free email → Twitch account → go live. Then it executed:
- Created a temp email on a throwaway domain (it picked something like dollycoms.com)
- Signed up for Twitch using the temp email — filled in username, password, hit submit
- Pulled the verification code from the temp inbox via curl, typed it in
- Got the stream key from the Twitch dashboard
- Used FFmpeg to actually stream — Claude knows FFmpeg cold, so it just composed the right command (a rough sketch of that step follows this list)
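For anyone who hasn't pushed a live stream before, the FFmpeg side of this boils down to encoding some input as H.264/AAC and pointing an FLV output at Twitch's RTMP ingest URL with the stream key appended. The agent's actual command isn't shown in the run, so this is only a minimal sketch of that step, wrapped in TypeScript via Node's child_process; the ingest URL format is standard Twitch, but the input file, bitrates, and environment variable name are my assumptions.

```typescript
import { spawn } from "node:child_process";

// Assumed values: Twitch's standard RTMP ingest plus a placeholder stream key.
const STREAM_KEY = process.env.TWITCH_STREAM_KEY ?? "live_xxxxxxxx";
const INGEST_URL = `rtmp://live.twitch.tv/app/${STREAM_KEY}`;

// Sketch: loop a local clip and push it to Twitch as H.264 + AAC.
// The agent's actual command was composed on the fly and may differ.
const ffmpeg = spawn("ffmpeg", [
  "-re",                  // read input at its native frame rate (needed for live pushes)
  "-stream_loop", "-1",   // loop the clip indefinitely
  "-i", "test_clip.mp4",  // assumed local input file
  "-c:v", "libx264",      // encode video as H.264
  "-preset", "veryfast",
  "-b:v", "2500k",
  "-c:a", "aac",          // encode audio as AAC
  "-b:a", "128k",
  "-f", "flv",            // Twitch ingests FLV over RTMP
  INGEST_URL,
]);

ffmpeg.stderr.on("data", (d) => process.stderr.write(d));
ffmpeg.on("exit", (code) => console.log(`ffmpeg exited with code ${code}`));
```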
It went live. After 5 minutes I refined the prompt: pipe a YouTube video through to Twitch instead of streaming a static screen. The agent picked a Mr. Beast video and used FFmpeg to pipe-stream it (video + audio at 720p) to Twitch. Then I asked it to switch to a niche that might attract real viewers, and it picked a Crimson Desert gaming stream.
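The run doesn't show how the agent sourced the video, so this is only a sketch of the piping pattern under assumptions: a downloader (yt-dlp here, my assumption) writes a 720p stream to stdout and FFmpeg re-encodes it straight into the same RTMP ingest. The format selector and placeholder URL are illustrative, not the agent's actual pipeline.

```typescript
import { spawn } from "node:child_process";

// Assumed: yt-dlp is installed, and VIDEO_URL / TWITCH_STREAM_KEY are set.
const VIDEO_URL = process.env.VIDEO_URL ?? "https://www.youtube.com/watch?v=...";
const INGEST_URL = `rtmp://live.twitch.tv/app/${process.env.TWITCH_STREAM_KEY}`;

// Pull a stream capped at 720p and write it to stdout...
const ytdlp = spawn("yt-dlp", ["-f", "best[height<=720]", "-o", "-", VIDEO_URL]);

// ...and re-encode it straight into Twitch's RTMP ingest.
const ffmpeg = spawn("ffmpeg", [
  "-re", "-i", "pipe:0",
  "-c:v", "libx264", "-preset", "veryfast", "-b:v", "2500k",
  "-c:a", "aac", "-b:a", "128k",
  "-f", "flv", INGEST_URL,
]);

ytdlp.stdout.pipe(ffmpeg.stdin);
```

The exact flags matter less than the shape: the whole relay is two ordinary processes the agent can compose, monitor, and restart on its own.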
End result: 14 unique viewers, 4 concurrent at peak, 1 chat message. Not a Twitch career, but real engagement on a stream the agent created from scratch in well under an hour. Then it saved the whole flow as a go_live_twitch skill — same pattern as the passive income loops, where first runs become saved skills for fast future runs.
Task 2: Make $1 in 30 Minutes
This one was meant to be harder. No specific path given. The agent had to figure out what "make money" even meant.
It went straight to surveys. First tried Prolific — couldn't figure out the signup flow. Moved to FreeCash.com — created an account, but the rewards required mobile games it couldn't play. Tried MeMe Quizzes — blocked: US/UK/Canada/Australia/France/Germany only, and my IP is in Norway.
Eventually landed on SurveyTime.io. Signed up, started filling out a 7-minute initial questionnaire to qualify. This is where it got interesting — it figured out that instead of clicking through individual questions, it could read the DOM, extract all the form fields, and fill them with a single JavaScript script. Speed went from "human pace" to "instant."
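The agent's script isn't reproduced here, but the general shape of the trick is easy to sketch: query the DOM for every input, pick a passable default, and fire the events the page's framework listens for. Everything below, selectors and default values included, is an assumption about SurveyTime's markup rather than the agent's actual code.

```typescript
// Rough sketch of the DOM short-circuit, run in the page context
// (e.g. injected via the browser automation tool). All selectors and
// defaults are assumptions, not the agent's generated script.
function fillAllFields(): void {
  // Pick the first option of every radio group.
  const seenGroups = new Set<string>();
  document.querySelectorAll<HTMLInputElement>('input[type="radio"]').forEach((r) => {
    if (!seenGroups.has(r.name)) {
      r.click();
      seenGroups.add(r.name);
    }
  });

  // Select the first non-empty option of every dropdown.
  document.querySelectorAll<HTMLSelectElement>("select").forEach((s) => {
    if (s.options.length > 1) {
      s.selectedIndex = 1;
      s.dispatchEvent(new Event("change", { bubbles: true }));
    }
  });

  // Put a placeholder into every free-text field.
  document.querySelectorAll<HTMLInputElement>('input[type="text"]').forEach((t) => {
    t.value = "N/A";
    t.dispatchEvent(new Event("input", { bubbles: true }));
  });
}

fillAllFields();
```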
The most impressive moment: a 6-card carousel with a 40-checkbox grid on each card (240 checkboxes total), which the agent solved by injecting a single script that toggled them all at once and submitted, jumping past the whole carousel in one go.
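Same idea, narrower target: select every checkbox in one pass, then hit whatever button advances the carousel. Again a hedged sketch; the selector and the button-label regex are assumptions about the page, not the agent's exact code.

```typescript
// Check every box on the page; click() (rather than setting .checked)
// fires the change handlers most frameworks rely on.
document
  .querySelectorAll<HTMLInputElement>('input[type="checkbox"]')
  .forEach((box) => {
    if (!box.checked) box.click();
  });

// Advance past the carousel once everything is checked.
const submit = Array.from(document.querySelectorAll<HTMLButtonElement>("button"))
  .find((b) => /next|submit|continue/i.test(b.textContent ?? ""));
submit?.click();
```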
This is also where it got fraudulent. The "answers" were not real opinions — it was just speed-running through whatever defaults made the form pass. So the survey results are garbage. But the technical achievement of "agent figures out the DOM short-circuit and uses it" is real.
Final outcome: an error somewhere in the last form prevented payout, so we made 1 cent instead of $1. I'll take it.
What This Tells Us About Open-Ended Tasks
Two takeaways from running these:
- Persistence is the killer feature. Both tasks required the agent to deal with constant blockers — temp email validation, geo-locks, broken signup flows, missing buttons — and just keep going. A year ago that level of recovery from failure was the limiting factor. Now it isn't.
- Open-endedness is still the harder problem. "Go live on Twitch" was specific enough that the agent built a clear plan. "Make $1" was vague enough that it spent a lot of time exploring dead ends. The signal: even great agents need a tighter goal definition than humans do — they don't have the same "this isn't going to work, let me pivot" intuition.
Combine this with the parallel browser automation pattern and the AWS console challenges and you start to see the shape of where this goes: persistent, parallel, tool-building agents that handle multi-hour tasks without supervision.
What's Next
I have been running my Mac mini agent autonomously for about a month now. There is enough material there for a whole separate review video — what it tried, what worked, what didn't. That's coming soon (see the 504 hours straight post for the eventual full update).
For now, the takeaway is: long-running, open-ended tasks are real now. They aren't perfect, they cheat sometimes, and you wouldn't trust them with anything where the wrong answer is expensive — but they will reliably make progress on goals that would have been unimaginable to delegate a year ago.
Resources
- My GitHub — repos and code samples.