Two layers everyone is racing to own
Strip away the branding and the new vendor products fall into the same two buckets every serious AI app needs:
- Model selection: given a prompt, which model should answer it? This is the layer where quality is still unsolved, and the one prompt routing addresses. It is also exactly what our measured routing does.
- Agent runtime: durable execution, sandboxing, approvals, connections, channels, observability. This is the plumbing that agent frameworks package so you write what an agent does, not everything it needs to run.
Layer 1 — the vendors now route, but only over their own shelf
The newest first-party model-selection products are real and worth knowing — but they share two limits that matter for an honest recommendation.
A single meta-endpoint that automatically picks the Gemini model — Pro, Flash, etc. — that best meets a cost/quality preference. Genuinely useful if you are an all-Gemini shop. But it routes only across Google’s own models, on an internal estimate you can’t inspect, and it is still labelled experimental.
Uses a small classifier (GPT-4.1-mini, upgradeable to GPT-5) to route a request to the right sub-agent. That is intent routing, not cheapest-capable-model routing, and it stays inside OpenAI’s models. Note that the visual Agent Builder is being wound down (shutting Nov 30 2026); OpenAI points builders at the code-first Agents SDK instead.
Lets you override the model per role via environment variables (ANTHROPIC_DEFAULT_SONNET_MODEL / …OPUS_MODEL) and route everything through a gateway for cross-vendor reach — but there is no built-in quality routing. The tiering is manual: exactly the decision our config automates and measures.
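To make "the tiering is manual" concrete, here is a minimal sketch of what a per-role override amounts to. It is not the tool's actual implementation: the function `resolve_model`, the role key `"sonnet"`, and the fallback model id are hypothetical names chosen for illustration. Only the variable name ANTHROPIC_DEFAULT_SONNET_MODEL comes from the text above.

```python
import os
from typing import Mapping

# The variable name comes from the text above. The fallback model id is a
# made-up placeholder, not a real model name.
ROLE_ENV_VARS = {"sonnet": "ANTHROPIC_DEFAULT_SONNET_MODEL"}
ROLE_FALLBACKS = {"sonnet": "placeholder-sonnet-model"}


def resolve_model(role: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the model id a human configured for this role.

    Nothing here looks at the prompt or at any quality measurement:
    the tier is whatever the environment says.
    """
    return env.get(ROLE_ENV_VARS[role], ROLE_FALLBACKS[role])
```

The point of the sketch is what is absent: the lookup takes a role and an environment, never a prompt or a score, so every tiering decision is made by whoever set the variable.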
The cross-vendor option, and now confirmed to be powered by Not Diamond under the hood. So OpenRouter Auto and Not Diamond are one opaque engine, not two: the decision is stochastic, you can’t audit why it chose a model, and you can’t reproduce the path.
The structural gap. Every provider-native router that shipped this year is either single-vendor (it can only ever recommend its own models) or opaque (you can’t see why it chose, or check it on your own tasks). A vendor is structurally conflicted: it has no incentive to tell you a competitor’s model is the better, cheaper answer. That is the one thing a measured, cross-vendor, inspectable signal does that they can’t. See the full landscape in the Task Routing market table.
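As a rough illustration of what "measured, cross-vendor, inspectable" could mean in code, the sketch below picks the cheapest candidate whose score on your own eval set clears a threshold, and returns the reason alongside the choice. This is an assumption-laden toy, not a description of any product named here: `Candidate`, `Decision`, `route`, the vendor and model names, and all prices and scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    vendor: str
    model: str
    cost_per_1k: float     # hypothetical price, USD per 1K tokens
    measured_score: float  # hypothetical pass rate on your own eval set, 0..1


@dataclass(frozen=True)
class Decision:
    chosen: Candidate
    reason: str  # human-readable audit trail for the choice


def route(candidates: Sequence[Candidate], min_score: float) -> Decision:
    """Pick the cheapest candidate that meets the measured quality bar."""
    eligible = [c for c in candidates if c.measured_score >= min_score]
    if not eligible:
        # Nothing clears the bar: fall back to the best-scoring model and say so.
        best = max(candidates, key=lambda c: c.measured_score)
        return Decision(best, f"no model reached {min_score:.2f}; used highest score")
    # Tie-break on model name so the same inputs always give the same answer.
    cheapest = min(eligible, key=lambda c: (c.cost_per_1k, c.model))
    return Decision(cheapest, f"cheapest of {len(eligible)} models scoring >= {min_score:.2f}")


# Made-up numbers across three made-up vendors.
SHELF = [
    Candidate("vendor-a", "large", 0.0100, 0.92),
    Candidate("vendor-b", "small", 0.0020, 0.88),
    Candidate("vendor-c", "mini", 0.0005, 0.70),
]
```

Two properties matter here. The rule is deterministic, so the same shelf and threshold always reproduce the same path, and the candidate list is vendor-neutral, so a competitor's cheaper model can win.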
Layer 2 — the agent runtime has commoditized
On the runtime side the picture is the opposite of a gap: the whole field shipped the same six primitives we described in agent frameworks — durable execution, sandboxing, approvals, secure connections, multi-channel, tracing.
The most ambitious infrastructure play — now positioned as a runtime any agent framework can build on: durable fibers with crash recovery, sub-agents with isolated SQLite, sessions with forking and compaction, built-in observability — at the edge. Important enough that we give it its own write-up.
A code-first kit (7M+ downloads) plus a managed runtime, with self-healing tool use (auto-retries a failed tool differently), a built-in eval layer with a user simulator, and prompt-injection screening. The most enterprise-complete bundle.
The directory-as-agent framework behind 100+ of Vercel’s production agents — the clean illustration of the pattern we already reviewed.
The takeaway flips the usual build-vs-buy instinct: durable execution, sandboxes and approvals are now off-the-shelf. Hand-rolling that plumbing is increasingly hard to justify. But notice what every one of these runtimes still hardcodes — the model. Eve’s agent.ts pins a string; ADK and the rest do the same. The runtime is solved; the choice of model inside it is not.
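The difference between a pinned model and a selected one can be sketched in a few lines. This is not how Eve, ADK, or any runtime above is actually written: `Agent`, `pinned`, `fake_call`, and the model ids are hypothetical stand-ins, and the model call is faked so the example runs without any vendor SDK.

```python
from typing import Callable

# A selector maps a prompt to a model id; a call sends a prompt to that model.
ModelSelector = Callable[[str], str]
ModelCall = Callable[[str, str], str]


def pinned(model_id: str) -> ModelSelector:
    """The hardcoded case: ignore the prompt, always return one string."""
    return lambda _prompt: model_id


class Agent:
    """A toy agent whose model choice is injected rather than baked in."""

    def __init__(self, select_model: ModelSelector, call_model: ModelCall) -> None:
        self._select_model = select_model
        self._call_model = call_model

    def run(self, prompt: str) -> str:
        model_id = self._select_model(prompt)  # decided per request
        return self._call_model(model_id, prompt)


def fake_call(model_id: str, prompt: str) -> str:
    """Stand-in for a real model call; just echoes which model was used."""
    return f"[{model_id}] {prompt}"
```

With `Agent(pinned("model-x"), fake_call)` the behaviour matches a hardcoded string. Swapping in any other function of the prompt changes the model per request without touching the runtime around it.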
What we take from the scan
- For model selection, the measured/cross-vendor angle held up. The field consolidated on opaque, single-vendor routing — which makes an inspectable, vendor-neutral signal more differentiated, not less.
- For the runtime, adopt rather than rebuild. The honest move is to let a commodity runtime (Cloudflare, ADK) own the plumbing and wire a measured model-selection layer inside it — the input these frameworks lack.
- For a client, the recommendation is conditional. An all-one-vendor shop is well served by that vendor’s native optimizer. The moment you want to weigh a competitor’s model, audit a cost/quality decision, or defend a routing choice with numbers — that is where a neutral, measured signal wins, because no vendor can offer it without conflict.