Autoresearch Claude Code Hacker: Can It Breach My Vibecoded Site?
Andrej Karpathy posted his autoresearch project recently — a small loop that mutates a hypothesis, evaluates it, keeps the better attempts, and discards the worse ones. He used it to train a nano GPT. I wanted to see if I could repurpose the same pattern as a white-hat security researcher: point Claude Code at my own website, give it a goal (steal the paywalled MD files), and let it iterate.
This post walks through the setup, the run, and what the agent actually found.
Watch the video:
The Setup
The agent has two pieces of context loaded in via CLAUDE.md and skills:
- Persona — a "Neo 777" white-hat penetration tester profile. Frames the work as defensive research on a target I own.
- Skills — web app reconnaissance, request analysis (using my Surfagent browser tool), attack-surface mapping, and a hack_llm skill for chatbot red-teaming.
Then four files run the autoresearch loop:
- program.md — instructions: prepare for one experiment, run, evaluate, learn
- attack.sh — the script being mutated each iteration
- evaluate.sh — scores the result 0-100
- log/ — append-only history so each run can read what has been tried
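To make the scoring concrete, here is a minimal sketch of what an evaluate.sh in this pattern could look like. The bucket thresholds, the marker strings, and the results.txt filename are my illustrative assumptions, not the actual scorer:

```bash
#!/usr/bin/env bash
# evaluate.sh (illustrative sketch, not the real scorer).
# Reads the output of the last attack run and prints a 0-100 score.
# Assumes attack.sh wrote its findings to results.txt.

RESULTS="${1:-results.txt}"
score=0

if [ ! -f "$RESULTS" ]; then
  echo 0
  exit 0
fi

# Crude scoring buckets (illustrative):
#   non-standard server behavior   -> 30
#   partial content leak           -> 70
#   full paywalled file retrieved  -> 100
grep -qi "anomalous-response" "$RESULTS" && score=30
grep -qi "partial-content"    "$RESULTS" && score=70
grep -qi "paywalled-md-body"  "$RESULTS" && score=100

echo "$score"
```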
The loop is the same as Karpathy's pattern, just applied to a different problem: read the log of past attempts, pick a new attack idea, rewrite attack.sh, commit to git, run for up to 5 minutes (I bumped it from 2 to give experiments more room), evaluate, keep the commit if it scored better, reset if it scored worse, write the lesson learned, repeat.
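Spelled out as a harness, the control flow looks roughly like the bash below. In the real run Claude Code itself performs these steps by following program.md, so treat this as a sketch of the pattern rather than my actual setup:

```bash
#!/usr/bin/env bash
# Autoresearch harness, sketched as a script. In practice the agent
# does these steps itself; this just makes the control flow explicit.

best=0
for i in $(seq 1 20); do
  # 1. Read log/ and rewrite attack.sh with a new idea.
  #    (In the real run, Claude Code does this step, not a script.)

  git add attack.sh && git commit --quiet -m "experiment $i"

  # 2. Run the experiment with a hard time limit (5 minutes).
  timeout 300 ./attack.sh > results.txt 2>&1

  # 3. Score the result 0-100.
  score=$(./evaluate.sh results.txt)

  # 4. Keep the commit if it improved on the best so far; else roll back.
  if [ "$score" -gt "$best" ]; then
    best=$score
  else
    git reset --quiet --hard HEAD~1
  fi

  # 5. Append the lesson learned so later runs don't repeat it.
  echo "run $i: score $score (best $best)" >> log/history.txt
done
```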
Running the Loop
I let it run for a while and checked in. After 11 iterations the agent had tested headers, path manipulation, API endpoints, Stripe webhooks, cache poisoning, source-map leaks, and a few directory race-condition variations. Best score so far: 30 — meaning some non-standard responses to header/path tricks, but no actual content access.
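To give a sense of what one of those iterations looks like, a header/path-trick version of attack.sh might be something like this. The target URL, paywalled path, and marker strings are placeholders, and these are stock techniques, not the agent's exact probes:

```bash
#!/usr/bin/env bash
# attack.sh, one illustrative iteration: header and path tricks.
# TARGET and PAYWALLED are placeholders.

TARGET="https://example.com"
PAYWALLED="/files/premium.md"

# Header spoofing: misconfigured proxies sometimes trust these.
for h in "X-Forwarded-For: 127.0.0.1" \
         "X-Original-URL: $PAYWALLED" \
         "X-Rewrite-URL: $PAYWALLED"; do
  code=$(curl -s -o /dev/null -w "%{http_code}" -H "$h" "$TARGET$PAYWALLED")
  case "$code" in
    401|403|404) ;;  # expected denials
    *) echo "anomalous-response: $h -> $code" ;;
  esac
done

# Path manipulation: encoding and traversal variants.
for p in "$PAYWALLED%20" "$PAYWALLED/." "//files//premium.md" \
         "/files/%2e/premium.md"; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "$TARGET$p")
  [ "$code" = "200" ] && echo "anomalous-response: $p -> $code"
done
```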
I let it complete the remaining categories. Final report after 13 runs across 12 categories: "Your site is well defended. Best score 30 was non-standard responses to header/path tricks, but no content access achieved."
I want to be honest about how much that says. The autoresearch agent is good at the categories it knows about, but it is not omniscient. It does not invent novel attack classes. So this is "the well-known stuff doesn't work," not "this site is unhackable."
Cross-Pollination With Codex
This is where it gets interesting. After 13 runs, I copied the findings into Codex (running GPT 5.4 on the $20 plan) and asked it to suggest new experiments to push the score. I have been getting a lot of mileage out of the Claude Code Opus 4.6 + Codex GPT 5.4 combo lately — Claude is good at execution, Codex is good at strategy.
Codex came back with a prioritized list of new experiments — things like RCE payload variants, post-purchase token manipulation, cross-domain acceptance edge cases. I pasted those back into Claude Code with "update the program for more testing" and let the loop continue.
The One Real Finding
After 16 total experiments, the agent did surface one weakness: post-purchase token portability. After someone buys the file, the download link stays valid for 10 minutes and up to 3 downloads. Within that window, you could share the URL with friends and they could also download. The token is "portable" — not locked to the buyer's session.
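The probe for this is simple enough to show. A sketch, assuming a placeholder download URL: replay the post-purchase link from a clean client with no cookies and see whether it still serves the file.

```bash
#!/usr/bin/env bash
# Token portability probe (illustrative). DOWNLOAD_URL is whatever
# the checkout flow handed the legitimate buyer.

DOWNLOAD_URL="$1"

# Fetch with a fresh client: no cookies, different User-Agent,
# so nothing ties this request to the buyer's session.
code=$(curl -s -o /dev/null -w "%{http_code}" \
  -A "not-the-buyers-browser" "$DOWNLOAD_URL")

if [ "$code" = "200" ]; then
  echo "token is portable: link works outside the buyer's session"
else
  echo "token appears session-bound (HTTP $code)"
fi
```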
I checked, and that is exactly the trade-off I made when designing it. I do not actually care if a buyer shares a link with one or two friends — the 3-download cap and the 10-minute expiry make it self-limiting. So this is a "noted, intentional" finding rather than a vulnerability. Good news either way: the agent surfaced it, and I made an informed call.
What This Means If You Have a Vibecoded App
If you ship a vibe-coded app — basically anything you built fast with an LLM and did not formally audit — running this kind of autoresearch loop on it is high-leverage. It costs you a few hours of agent time on the Claude Max plan (so essentially zero), and it surfaces the well-known categories of attacks before the public does.
I tested another vibe-coded site at the request of a friend who runs it, and the agent found real issues there. So this does work outside of just my own setup.
This is also the cybersecurity-meets-AI overlap I covered in my AI cybersecurity post. Tools like this are why the field is exploding — defense teams now have an autonomous red-teamer they can point at every PR.
The Important Caveat
To state this clearly: I only ran this against systems I own, plus the one friend's site with the owner's explicit consent. You should never run an autoresearch hacker against systems you don't own or aren't authorized to test. Claude itself will refuse to help with offensive operations against unauthorized targets — and that is the right behavior.
The legitimate use case is your own apps. Vibe code, then before you open it to the public, run an autoresearch sweep to catch the obvious stuff.
Resources
- My GitHub — repos and code samples