“An agent is an LLM and a harness”
Khalil’s definition is worth taking literally. An agent “involves two things… the loop and the LLM,” he says — and crucially, “you don’t want each loop to do the same thing. You want to leverage the results from the LLM… Each loop should take us closer to our goal.” The intelligence is rented from the model; the behaviour comes from the harness that wraps it.
His potted history makes the point. ChatGPT, he notes, “innovated outside of the model” — a system prompt on top of your prompt, then multimodal, then memory (“ChatGPT knows that I really like to barbecue”), then files. None of that is the model. “This is the harness,” he says of the lineage running through Cursor to Claude. “Everything here is the harness.” The frontier models converged; the products diverged on harness.
The clean split: the model answers “what should I do next?” The harness decides what tools exist, what context the model sees, what it’s allowed to touch, how results feed the next loop, and when to stop. Swap the model and the agent gets smarter or cheaper. Change the harness and the agent does something different.
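To make that split concrete, here is a minimal sketch of the loop in Python. It is our own illustration, not code from OpenClaw or Nvidia: `Action`, `run_agent`, `stub_model` and the `lookup` tool are hypothetical names, and the stub stands in for a real LLM call. The model only picks the next action; the harness owns the tool table, the history fed back into each turn, and the stop conditions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """What the model asks for next. tool=None means 'goal reached, stop'."""
    tool: Optional[str]
    arg: str = ""


# Hypothetical model interface: (goal, history so far) -> next action.
Model = Callable[[str, List[str]], Action]


def run_agent(model: Model, tools: Dict[str, Callable[[str], str]],
              goal: str, max_steps: int = 5) -> List[str]:
    """Harness loop: ask the model, run the chosen tool, feed the result back."""
    history: List[str] = []
    for _ in range(max_steps):            # the harness decides when to stop
        action = model(goal, history)     # the model decides what to do next
        if action.tool is None:
            break
        if action.tool not in tools:      # the harness decides which tools exist
            history.append(f"error: unknown tool {action.tool!r}")
            continue
        result = tools[action.tool](action.arg)
        # Each result becomes context for the next turn of the loop.
        history.append(f"{action.tool}({action.arg!r}) -> {result}")
    return history


def stub_model(goal: str, history: List[str]) -> Action:
    """Stand-in for an LLM: do one lookup, then declare the goal reached."""
    if not history:
        return Action("lookup", goal)
    return Action(None)


# Hypothetical tool table; a real harness would register real capabilities here.
tools = {"lookup": lambda query: query.upper()}

history = run_agent(stub_model, tools, "find docs")
```

Replacing `stub_model` with a call to a different model changes how good the decisions are; changing `tools`, `max_steps` or what goes into `history` changes what the agent does.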
The OpenClaw moment — and its maintenance trap
The interview is framed around OpenClaw, the open-source harness that, in Khalil’s words, “was a major change for the industry… It got more stars than Linux in months.” Nvidia is materially involved — “we have a couple of developers at the company that contribute to OpenClaw full time” — and stepped in when the project hit trouble: “We saw Peter [Steinberger] tweeting about some of the issues they had, and we just rolled up our sleeves.”
But the most instructive part is the trouble itself. OpenClaw is over 800,000 lines of code with a backlog of pull requests it cannot drain. Khalil names the mechanism precisely as “the cardinal rule of code: it is easier to write than it is to read.” In the agent era that asymmetry explodes. “It is easier to enlist many agents to help write code and build these PRs. The bottleneck is in merging the PRs through” and in dealing with the fallacies they introduce.
The trap, stated plainly. Agents make harness code cheap to write and no cheaper to read, verify and merge. A popular general-purpose harness therefore accretes faster than any team can review it. The result is a large, fast-moving surface that is hard to trust line-by-line — exactly the wrong shape for anything that runs with access to your terminal, your files, and your credentials.
How Nvidia packages a harness: blueprints + skills + a security runtime
Nvidia’s answer is not “one harness to rule them all.” It is to treat harnesses as a category and ship structure around them:
- Blueprints. “NemoClaw is our blueprint,” Khalil says, adding that “there’s a blueprint for Hermes and a blueprint for OpenClaw.” A blueprint “sets up the runtime, enables the policies if there’s a local GPU, and runs the model.” In Nvidia’s usage it means the structure for building an agent, wired to their model (Nemotron) and stack: a reproducible scaffold, not a monolith.
- Skills. “The way to get your product into this rapidly growing market is with skills.” For Nvidia, that means the CUDA-X libraries exposed as GPU-accelerated capabilities. “Every product we build now… needs to have a skill.” Skills are how a capability plugs into a harness without bloating its core.
- A security runtime. For the enterprises “more worried” about agents, Nvidia points to OpenShell, described as a security runtime, because the moment you “give it access to your terminal,” the harness is the attack surface.
- Specialized sub-agents, not a takeover. “Build a specialized agent or a sub-agent” that fits where teams already are. His analogy: a borrowed microwave makes you press a lot of buttons; your own is “boop, boop, done.” A focused harness you own is faster than a general one you rent.
Read together, that is a quiet argument against the 800k-line monolith and for small, owned, well-scoped harnesses with explicit skills and an explicit security boundary.
What this means for building our own functions
This is the lesson we take for our own tooling. The temptation with agents is to adopt a big, capable, general harness and let it grow. The OpenClaw moment is the counter-evidence: a harness you cannot read top-to-bottom is a harness you cannot fully trust — and trust is the whole game once it can act. So the design rules we hold to:
- Small and bounded beats big and general. A handful of purpose-built functions, each of which one person can read in a sitting, is worth more than a sprawling framework whose behaviour is emergent. Surface area is a cost, not a feature.
- Skills, not bloat. New capability arrives as a discrete, named skill the harness can call — added at the edge, not stitched into a growing core. The core loop stays simple and auditable.
- The security boundary is part of the harness. Least privilege, an explicit allowlist of what an agent may touch, and hard caps on cost and blast radius are not add-ons — they are the runtime. An agent with terminal access is only as safe as the wall around it.
- Own the read, not just the write. Because agents make code cheap to write and not to review, the scarce resource is reading and verification. Keep the harness small enough that review keeps pace with change.
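As a rough sketch of how the second and third rules can sit in one small file, the Python below keeps skills in a registry at the edge and puts the allowlist and a hard call cap inside the call path itself. Every name here (`Harness`, `PolicyError`, `word_count`, `delete_files`) is hypothetical and chosen for illustration; a real boundary would also need process-level sandboxing, which this does not attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet


class PolicyError(Exception):
    """Raised when a call falls outside the harness's security boundary."""


@dataclass
class Harness:
    allowed_skills: FrozenSet[str]   # explicit allowlist: least privilege by default
    max_calls: int                   # hard cap on how much one run can do
    skills: Dict[str, Callable[[str], object]] = field(default_factory=dict)
    calls_made: int = 0

    def register(self, name: str, fn: Callable[[str], object]) -> None:
        """Add a capability at the edge; the core call path does not change."""
        self.skills[name] = fn

    def call(self, name: str, arg: str) -> object:
        """Single choke point: every skill call passes the same checks."""
        if name not in self.allowed_skills:
            raise PolicyError(f"skill {name!r} is not on the allowlist")
        if name not in self.skills:
            raise PolicyError(f"skill {name!r} is not registered")
        if self.calls_made >= self.max_calls:
            raise PolicyError("call budget exhausted")
        self.calls_made += 1
        return self.skills[name](arg)


# Hypothetical setup: two skills are registered, only one is allowed to run.
harness = Harness(allowed_skills=frozenset({"word_count"}), max_calls=2)
harness.register("word_count", lambda text: len(text.split()))
harness.register("delete_files", lambda path: f"deleted {path}")

count = harness.call("word_count", "small and bounded")
```

Registering a skill and permitting it are deliberately separate steps: `delete_files` exists in the registry, but calling it raises `PolicyError` until someone adds it to the allowlist on purpose.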
The model layer is a buy: rent the best one for each task and route between them on measured quality and cost. The harness layer is a build: keep it small, skill-based, and locked down, so it stays something you can actually trust to run.
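One simple way to read “route on measured quality and cost” is: among the models that clear a quality bar on your own evaluation for a task, pick the cheapest. The sketch below assumes exactly that rule, which is our simplification rather than a description of any particular router. `ModelStats`, `route`, the model names and all numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelStats:
    name: str
    quality: float        # score measured on your own eval set for this task, 0..1
    cost_per_call: float  # any consistent unit, e.g. dollars per call


def route(candidates: List[ModelStats], min_quality: float) -> ModelStats:
    """Pick the cheapest model that meets the quality bar.

    If no model meets the bar, fall back to the highest-quality one.
    """
    if not candidates:
        raise ValueError("no candidate models to route between")
    good_enough = [m for m in candidates if m.quality >= min_quality]
    if not good_enough:
        return max(candidates, key=lambda m: m.quality)
    return min(good_enough, key=lambda m: m.cost_per_call)


# Illustrative, invented measurements for two hypothetical models.
candidates = [
    ModelStats("small-model", quality=0.82, cost_per_call=0.001),
    ModelStats("large-model", quality=0.95, cost_per_call=0.020),
]

choice = route(candidates, min_quality=0.80)
```

With these invented numbers a bar of 0.80 selects `small-model`, and raising the bar to 0.90 selects `large-model`. The routing rule is only as good as the measurements behind `quality`.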
What we take from it
- “Agent = LLM + harness” is the right mental model. It tells you where to spend: rent the model, build the harness.
- A general harness has a maintenance ceiling. The OpenClaw PR backlog is the warning: capability you can’t review is liability, not leverage.
- Structure over size. Nvidia’s own answer (blueprints, skills, a security runtime, specialized sub-agents) is the template for our own custom functions, scaled down and locked down.
All quotes are from Nader Khalil (Director of Developer Technologies, Nvidia) as reported in The New Stack, “‘An agent is an LLM and a harness’: What Nvidia really thinks about OpenClaw.” Our commentary and the build lessons are our own.
- Prompt routing — the model layer: pick the right model per task, measured.
- Harness review — benchmarking a harness (incl. the agent SDK itself) for cost-at-quality.
- Agentic tool use — how the loop calls tools, the harness’s core mechanic.
- Cloudflare Agents — an agent runtime + our router as the model brain, in practice.
- Routing hub — why the measured-quality layer is the thing routers lack.