NeMoClaw, $250K Token Budgets, and Open Source AI: Nvidia GTC 2026
I just got back from Nvidia GTC 2026 in San Jose, and I want to share my takeaways while they are fresh. There are a few big themes worth unpacking: NeMoClaw and the OpenClaw ecosystem, Jensen Huang's $250K-per-engineer token budget claim, the open source panel with Cursor, Perplexity, and LangChain, and the L2 self-driving demo. Also, my DGX Spark giveaway is still open until the end of the week.
Watch the video:
Jensen Spent 15 Minutes on OpenClaw
The keynote surprise for me was how much Jensen Huang focused on OpenClaw. He spent at least 15 minutes on it — calling it one of the fastest-growing open source projects on GitHub by star count. There was a "Build a Claw" event at GTC where Peter Steinberger was holding court, surrounded by a wall of people asking questions. I tried to get a word in but the line was deep.
Nvidia's own contribution to the ecosystem is NeMoClaw — their packaged way to run OpenClaw with a Nemotron model in a sandboxed shell. Setup is genuinely simple:
```shell
curl -sSL https://nvidia.github.io/nemoclaw/install.sh | bash
nemoclaw launch
```
It auto-detects your hardware (in my case, an M3 Pro), asks which inference provider you want (Nvidia, OpenAI, Anthropic, or local Ollama), sets up the sandbox, configures policies, and launches. I picked Qwen 3.5 4B running locally via Ollama and was chatting in under a minute.
The $250K Token Budget Quote
Jensen said something on the All-In podcast (and reportedly on stage too) that got passed around all week. As it was quoted:
"Let's say you have a software engineer or AI researcher and you pay them $500,000 a year. We do that all the time. That $500,000 engineer at the end of the year, I'm going to ask 'how much did you spend in tokens?' And that person said $5,000? I will go ape. If that $500,000 engineer did not consume at least $250,000 worth of tokens, I am going to be deeply alarmed."
The implication: token spend is becoming a leading indicator of engineer productivity. If your salary is $100K and you are spending $200K in tokens, that is now considered a healthy ratio. This is a major shift from "AI tools are an expense to minimize" to "tokens are leverage and you should be burning them aggressively." Combined with the loops I run for Claude Code passive income and the iOS app pipeline, that math starts to make sense — every token spent autonomously is potentially returning multiples.
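The arithmetic behind that shift is easy to sketch. The 50% floor below is my reading of the quote ($250K on a $500K salary), not an official Nvidia metric:

```python
# Back-of-envelope "token leverage" check: token spend as a fraction of
# salary. The 0.5 floor is inferred from the quote, not a published number.

def token_leverage(salary: float, token_spend: float) -> float:
    """Return token spend as a fraction of salary."""
    return token_spend / salary

def is_healthy(salary: float, token_spend: float, floor: float = 0.5) -> bool:
    """Jensen's framing implies spending at least ~50% of salary on tokens."""
    return token_leverage(salary, token_spend) >= floor

# The $500K engineer from the quote:
print(is_healthy(500_000, 5_000))    # $5K in tokens -> False ("I will go ape")
print(is_healthy(500_000, 250_000))  # $250K in tokens -> True
print(is_healthy(100_000, 200_000))  # the 2x case above -> True
```

By this framing the $100K engineer spending $200K in tokens is at 2x leverage, well past the floor the quote implies.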
The Open Source Panel
The session I enjoyed most was the open source panel that Jensen moderated. On stage: the CEOs of Cursor, Perplexity, and LangChain, along with Mira Murati and a couple of others. The big themes:
- Hybrid is winning. Most of these companies are routing between proprietary models and open source ones based on the task. Cheap open models for the bulk, premium models for the hard stuff. Cursor's Kimera 2 Composer release lines up with this.
- Nvidia is committed to open source on both sides. They will keep developing Nemotron and supporting it in their hardware/inference stack.
- Open source absorbs cost. When a fraction of every workload is shifted to open weights, the unit economics of agent loops change dramatically. This is why NeMoClaw matters — it makes "run OpenClaw locally on a Nemotron model" trivial.
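The hybrid-routing idea from the panel can be sketched in a few lines. The model names, per-token costs, and the keyword heuristic here are all placeholders of mine, not anyone's actual routing logic:

```python
# Minimal sketch of hybrid routing: cheap open model for bulk work,
# premium hosted model for hard tasks. All names and numbers are invented.

from dataclasses import dataclass

@dataclass
class Route:
    model: str
    cost_per_1k_tokens: float  # USD, illustrative only

OPEN_MODEL = Route("nemotron-local", 0.0)     # local open weights, ~free per token
PREMIUM_MODEL = Route("frontier-api", 0.015)  # hosted frontier model

def route(task: str) -> Route:
    """Crude heuristic: escalate only tasks that look hard."""
    hard_markers = ("prove", "debug", "architecture", "security")
    if any(marker in task.lower() for marker in hard_markers):
        return PREMIUM_MODEL
    return OPEN_MODEL

print(route("summarize these release notes").model)  # nemotron-local
print(route("debug this race condition").model)      # frontier-api
```

Real routers use learned classifiers or confidence signals rather than keyword lists, but the economics are the same: every task the open model absorbs is near-zero marginal cost.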
The Hardware Side: Vera Rubin and Token Factories
The hardware narrative was as expected — Vera Rubin GPUs incoming, big enterprise focus, lots of "token factory" framing. The "token factory" idea is that data centers are increasingly evaluated by tokens-per-second per dollar, the same way we used to evaluate them by FLOPs. Nvidia wants to be the king of inference economics, and the new Grok inquiry features apparently fold into that.
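The metric itself is just a ratio — throughput over cost. The deployment numbers below are invented for illustration:

```python
# "Token factory" math as I understand the framing: rank deployments by
# tokens per second per dollar of hourly cost. Figures are made up.

def tokens_per_sec_per_dollar(throughput_tps: float, cost_per_hour: float) -> float:
    """Throughput divided by hourly cost -- higher means a better token factory."""
    return throughput_tps / cost_per_hour

# Two hypothetical deployments:
big_node = tokens_per_sec_per_dollar(throughput_tps=12_000, cost_per_hour=40.0)
local_box = tokens_per_sec_per_dollar(throughput_tps=900, cost_per_hour=2.0)
print(big_node, local_box)  # 300.0 450.0 -- the cheap box wins on this metric
```

Which is exactly why the inference-cost arms race helps small local setups, not just hyperscalers.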
Most of this isn't directly relevant to the channel — I am not buying a rack of GPUs. But it sets the backdrop for why local models are getting good fast: there is a massive arms race on inference cost.
L2 Self-Driving Demo: Alpha Meow
On the last day I went down to San Francisco and tried Nvidia's L2 self-driving demo car running their Alpha Meow model. We drove through downtown SF traffic — pretty hairy at peak hours — and the system handled it well. I peppered the engineer with questions: how much training was simulation vs real-world data, how they validate the model, where L4 fits in the roadmap. There is apparently an L4 system in the works (no firm release date), which would mean no driver attention required.
Worth experiencing in person — the L2 in dense traffic is more impressive than any video can capture.
Final Take
The big themes from GTC 2026 were:
- OpenClaw/NeMoClaw and agentic AI — the LLM-as-OS framing is now mainstream
- Token budgets as leverage — burn tokens aggressively; $250K per engineer is the new normal at top firms
- Open source as a strategic layer — hybrid routing wins, and Nvidia is investing heavily here
- Inference economics — the "token factory" race is real and it makes everything cheaper
For people building automations like the loops I share on this channel, GTC 2026 is good news on every axis. Tokens get cheaper, open source closes the gap, and "spend your token budget" is now the explicit recommendation from the company supplying most of the world's compute.
DGX Spark Giveaway
Don't forget — my DGX Spark giveaway is still open until the end of the week. Three steps: register for GTC virtually, attend at least one session, fill out the form. Drawing happens this weekend. Links below.
Resources
- Register for Nvidia GTC (virtual)
- GTC session catalog
- DGX Spark giveaway form
- My GitHub — repos and code samples