This 100% Local AI Automation Pipeline Blows My Mind
I spent the weekend building a 100% local AI automation pipeline that produces Fireship-style explainer videos end to end. No API calls. No cloud LLMs. No paid image generation. Just my DGX Spark, four open-source models, and OpenCode driving the whole thing. The result is a workflow that can write, illustrate, narrate and render a 3+ minute video on any topic — and the first run produced an "AI coding agents are slot machines" video I'm genuinely happy with.
Watch the video:
Why 100% local?
The Fireship format is the kind of thing an AI pipeline should handle well — fast cuts, image cards, meme energy, and a clear thesis up front. The question I wanted to answer this weekend was: how close can I get to that style using only models I run on my own hardware, with zero API calls?
The motivation isn't cost. APIs are cheap right now. The motivation is autonomy. If the whole pipeline lives on my machine, I can let it run overnight without worrying about rate limits, key rotation, or a vendor pulling the rug. It's the same reasoning that pushed me into headless AI agents earlier this year — once the loop is fully local, you can scale it as far as your hardware lets you.
The four-pillar stack
The pipeline rests on four models, each chosen after some testing:
- Qwen 3.6 27B — the LLM doing script writing and orchestration. I tried Gemma 4 27B first, but its tool calling fell apart in agent loops. Qwen 3.6 27B was rock solid, didn't waste tokens on excessive "thinking," and was noticeably faster.
- SSD-1B image turbo — image card generation, downloaded from Hugging Face. Runs locally and is fast enough that the image step doesn't bottleneck the whole render.
- Kokoro TTS (hexgrad) — 82M-parameter text-to-speech model. Tiny, but the voice quality is good, and on the DGX Spark it's basically free in real-time terms.
- Hyperframes by HeyGen — HTML-rendered video built for agents. Same idea as Remotion but designed from the start to be driven by an LLM rather than a human editor.
And gluing it all together: OpenCode as the agent runtime. Same general pattern as my 3-part AI agent system — skill plus headless model plus tools — just with a local LLM instead of an API-hosted one.
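To make the two media pillars concrete, here is a minimal Python sketch of the image-card and narration steps. It assumes the segmind/SSD-1B checkpoint from Hugging Face and the kokoro pip package (hexgrad's reference wrapper); the prompt, voice name, and file paths are placeholders rather than the exact values the pipeline uses, and the LLM and Hyperframes steps aren't shown because OpenCode drives those directly.

```python
# Minimal sketch of the image-card + TTS steps, not the full pipeline.
# Assumptions: segmind/SSD-1B as the image checkpoint, the `kokoro` pip
# package for TTS, and a CUDA GPU. Prompt, voice, and paths are placeholders.
import soundfile as sf
import torch
from diffusers import StableDiffusionXLPipeline
from kokoro import KPipeline

# Image card: SSD-1B loads through the standard SDXL pipeline in diffusers.
image_pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B", torch_dtype=torch.float16, use_safetensors=True
)
image_pipe.to("cuda")
card = image_pipe(
    "a slot machine spitting out code, flat vector illustration",
    num_inference_steps=25,
).images[0]
card.save("card_01.png")

# Narration: Kokoro (82M params) renders the script segment by segment.
tts = KPipeline(lang_code="a")  # "a" = American English
segments = tts(
    "Claude Code is a slot machine. That's it. That's the post.",
    voice="af_heart",
)
for i, (_, _, audio) in enumerate(segments):
    sf.write(f"narration_{i:02d}.wav", audio, 24000)  # Kokoro outputs 24 kHz
```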
The script-writing trick
Getting the Fireship style right was the part I was most worried about. The fix turned out to be embarrassingly simple: I grabbed transcripts from a handful of Fireship videos, analyzed the structure and humor, and compiled the patterns into a single markdown file. That file gets passed to Qwen at the start of every run as a style reference.
This is the same idea as a system prompt or a few-shot example. The model isn't trying to "be" Fireship — it's trying to match the rhythm: short punchy intros, a clear thesis, jokes that land in the first 10 seconds, image cards every few sentences. That's what makes the format watchable, and it transfers to a model that has never seen the channel.
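For the curious, this is roughly what "pass the style file at the start of every run" looks like in practice. A minimal sketch, assuming the style notes live in fireship_style.md and the local Qwen sits behind any OpenAI-compatible endpoint; the model name and port are placeholders:

```python
# Minimal sketch: feed the compiled style reference as a system prompt.
# Assumptions: style notes in fireship_style.md, a local OpenAI-compatible
# server on port 8000, and "qwen-local" as whatever name it exposes.
from pathlib import Path
from openai import OpenAI

style_reference = Path("fireship_style.md").read_text()

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="qwen-local",
    messages=[
        {
            "role": "system",
            "content": "You write scripts for fast-cut explainer videos. "
                       "Match the rhythm and structure described below.\n\n"
                       + style_reference,
        },
        {
            "role": "user",
            "content": "Write a 3.5-minute script comparing AI coding agents "
                       "to slot machines. Aim for ~60% image cards.",
        },
    ],
)
print(response.choices[0].message.content)
```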
The first real run
I needed a topic. I picked something that's been rattling around my head for a while: Claude Code, Codex, and the rest of the AI coding agents are basically slot machines. You pull the lever, you don't know what you get, sometimes the jackpot lands and sometimes the model produces something completely broken. It's true, it's funny, and it's exactly the kind of thesis Fireship would build a video around.
The prompt I gave Qwen was about as terse as you'd expect:
- Compare AI coding agents like Claude Code and Codex to slot machines
- Reference casinos, work in clever jokes
- Do research, use Surfagent for any web browsing
- Aim for 3.5+ minutes, ~60% image cards
- Good luck
Then I went to the gym. By the time I got back, the context window had grown to 174,000 tokens and the final V1 was rendered. The pipeline had done script writing, image generation, TTS, and the full Hyperframes render — all without me touching it.
What the output actually looks like
The opening line that Qwen produced: "Last week some guy on Reddit accidentally explained the entire AI coding industry in 600 words and one analogy. Claude Code is a slot machine. That's it. That's the post."
That's a Fireship hook. Crisp, opinionated, and it primes the rest of the video without any windup. The pipeline pulled context from r/betteroffline, threaded in the May 11, 2026 date, and even worked in a joke about Anthropic being "the safety-pilled, constitutional AI people" who shipped a slot machine with a help section. The image cards from SSD-1B image turbo rendered cleanly throughout.
It is not perfect. Some image cards are a little off-theme, and the pacing in the middle drags a few seconds longer than Fireship would tolerate. But for a first run, with zero hand-editing, on a 100% local stack, it is genuinely shocking that this is now free.
Why this matters
The reason this stack feels important is not the specific video. It is the proof that a 4-pillar local pipeline can now produce publish-grade content on commodity hardware. Six months ago every step here would have required cloud APIs and a credit card. Today it runs on one DGX Spark, in the background, while I'm at the gym.
Same trajectory as the short-form clip pipeline I've been refining: the components keep getting smaller, faster, and good enough that you stop noticing the seams. The interesting question is no longer "can a model do this" — it is "how many of these loops can I run in parallel on the same machine?"
I'm going to keep iterating on this workflow. The obvious next step is a Claude Code or Codex version so people without local hardware can run the same thing — the image API is cheap, most laptops can handle Kokoro, and Hyperframes is API-driven anyway. Watch the channel for that.
Resources
- Qwen models on Hugging Face — Qwen 3.6 27B is the orchestrator in this pipeline.
- SSD-1B image turbo — local image generation, available on Hugging Face.
- Kokoro TTS — 82M-parameter text-to-speech, small and fast.
- Hyperframes by HeyGen — HTML-rendered video built for agents.
- OpenCode — the agent runtime driving the whole pipeline.
- AI_automata Discord — my new server for AI automation discussion.
FAQ
What hardware do you need to run a 100% local AI video pipeline?
An Nvidia DGX Spark handles the full stack (Qwen 3.6 27B + SSD-1B image + Kokoro TTS + Hyperframes render) comfortably. Any modern workstation with 24GB+ of VRAM can run the same pipeline; smaller VRAM means swapping to a quantized 14B model or offloading image generation to a cheap API.
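For the smaller-VRAM case, 4-bit quantization is the usual escape hatch. Here's a minimal sketch using transformers and bitsandbytes; the 14B checkpoint named below is a stand-in for illustration, not the model from the article:

```python
# Minimal sketch: load a 14B model in 4-bit to fit a smaller VRAM budget.
# Assumptions: bitsandbytes is installed, and Qwen/Qwen2.5-14B-Instruct is
# used purely as a stand-in checkpoint; pick whatever 14B model fits.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "Qwen/Qwen2.5-14B-Instruct"  # stand-in; not the article's model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPU/CPU memory
)
```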
Why Qwen 3.6 27B instead of Gemma 4 27B for agent workflows?
Gemma 4 27B's tool-calling failed in loops during testing, wasting tokens on redundant reasoning. Qwen 3.6 27B handled tool calls cleanly, didn't burn budget on excessive thinking, and was noticeably faster in practice for the OpenCode orchestration loop.
What is Hyperframes and how does it compare to Remotion?
Hyperframes is HeyGen's HTML-rendered video framework, designed from the start to be driven by an LLM. Remotion is a similar code-based video editor but targets human developers. Hyperframes is a better fit when an agent is writing the render code directly.
Can this pipeline run without local hardware?
Yes. The same four-pillar architecture works with API-hosted models: swap in any image API for SSD-1B, any TTS service for Kokoro, and Claude Code or Codex for OpenCode + Qwen; Hyperframes is API-driven anyway. You lose true offline capability, but the cost stays low.
How long does a 3-minute Fireship-style video take to generate locally?
The full run — script writing, image generation, TTS, and final render — takes roughly the length of a gym session on a DGX Spark. Image generation is the slowest single step; the LLM script-writing is fast once the style reference is preloaded.
Why use OpenCode instead of Claude Code for local model orchestration?
OpenCode supports any OpenAI-compatible endpoint, including a locally hosted Qwen server. Claude Code only runs Claude. If the goal is a fully local pipeline with no API keys, OpenCode is the natural choice — and the skill/tool patterns transfer cleanly.