A model on its own just answers. To make it do something — edit a repo, run a terminal, drive a browser, loop toward a goal, or coordinate several models — you wrap it in a harness: the scripts, agents, and frameworks below. This is a curated map of the ones worth knowing, by category, with a plain-English “best for” on each. It's editorial, not a benchmark — we tell you what each is, not a score we didn't measure.
Why this matters here: the model is becoming a commodity; the harness is where the real work, value, and maintenance burden live. If you picked a model from the leaderboard, this is how you put it to work, and why the harness is the product.
Point an AI at your codebase from the shell — it reads, edits, runs tests, commits.
Anthropic's official terminal agent — reads your repo, edits files, runs commands, and can be scripted headless via `claude -p`.
Best for: Deep, reliable coding work when you're on the Claude stack; the headless mode is the backbone of scripted automation.
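As a rough illustration of scripting the headless mode, here is a minimal Python sketch that shells out to `claude -p` once and captures the output. Only the `-p` flag comes from the text above; the function names, the timeout, and the working-directory handling are assumptions for illustration, and the call only works if the `claude` CLI is installed and authenticated.

```python
import shutil
import subprocess


def build_headless_command(prompt: str) -> list[str]:
    """Build the argv for a one-shot, non-interactive `claude -p` call."""
    return ["claude", "-p", prompt]


def run_headless(prompt: str, cwd: str = ".", timeout_s: int = 600) -> str:
    """Run the agent once inside `cwd` and return what it prints to stdout.

    Hypothetical helper: the name, timeout, and error handling are this
    sketch's choices, not part of the CLI.
    """
    if shutil.which("claude") is None:
        raise RuntimeError("the `claude` CLI was not found on PATH")
    result = subprocess.run(
        build_headless_command(prompt),
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

A cron job or CI step could then call something like `run_headless("Summarize the failing tests", cwd="path/to/repo")` and post the returned text wherever it is needed.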
The git-native terminal pair-programmer — auto-commits every edit as its own commit, works with any model via BYOK.
Best for: A clean, auditable git history where each AI change is a reviewable commit.
Actively developed open-source terminal coding agent with broad provider choice.
Best for: A current, provider-agnostic terminal agent for everyday coding without vendor lock-in.
Block's open-source local agent — a general terminal agent that goes beyond editing code to running broader tasks.
Best for: The closest "general local agent" substitute when you want more than a code editor.
Open-source autonomous software-engineering agent (formerly OpenDevin) — plans and executes multi-step dev tasks.
Best for: Autonomous task delegation on hard, multi-file engineering work (strong on SWE-bench with a capable model).
IDE-native agentic coding assistant (VS Code) — plans, edits across files, and runs commands with approval.
Best for: In-editor agentic coding for people who live in VS Code rather than the terminal.
Agents that pursue a goal over many steps with tools, memory, and less hand-holding.
An operator-style agent layer / marketplace around workflows and products — pitched for running AI operations, not just coding.
Best for: Operator-style workflows where you want a marketplace layer around products and automations.
Nous Research's self-improving CLI agent — persistent memory, automated skill creation, sandboxed code execution, and chat-surface reach (Telegram/Slack/Discord).
Best for: A persistent, memory-backed operator that lives across your messaging surfaces and improves its own skills.
Clean-room Python/Rust rewrite of the Claude Code architecture (built on oh-my-codex); born after the March 2026 Claude Code source leak.
Best for: An open, self-hostable take on the Claude Code architecture without the subscription.
Libraries for wiring models, tools, and steps into a controllable multi-agent workflow.
Graph-based orchestration for stateful, multi-step agent workflows, with explicit, fine-grained control over how steps and state flow.
Best for: Complex, stateful, production multi-agent workflows where you need explicit control and good debugging.
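To make "graph-based orchestration" concrete, here is a toy sketch in plain Python. It is not the API of any framework listed here: the `Graph` class, the `END` marker, and the node functions are all hypothetical. It only shows the idea of nodes that update shared state and routers that pick the next node.

```python
from typing import Callable

State = dict
END = "END"  # hypothetical sentinel meaning "stop the run"


class Graph:
    """Toy state graph: each node updates state, each router picks the next node."""

    def __init__(self) -> None:
        self.nodes: dict[str, Callable[[State], State]] = {}
        self.routers: dict[str, Callable[[State], str]] = {}

    def add_node(
        self,
        name: str,
        step: Callable[[State], State],
        route: Callable[[State], str],
    ) -> None:
        self.nodes[name] = step
        self.routers[name] = route

    def run(self, start: str, state: State, max_steps: int = 20) -> State:
        current = start
        trace: list[str] = []
        for _ in range(max_steps):  # hard cap so a bad route cannot loop forever
            if current == END:
                break
            trace.append(current)
            state = self.nodes[current](state)
            current = self.routers[current](state)
        return {**state, "trace": trace}


def draft(state: State) -> State:
    # Stand-in for a model call that produces a new attempt.
    return {**state, "attempts": state["attempts"] + 1}


def review(state: State) -> State:
    # Stand-in for a reviewer: approve from the second attempt onward.
    return {**state, "approved": state["attempts"] >= 2}


graph = Graph()
graph.add_node("draft", draft, lambda s: "review")
graph.add_node("review", review, lambda s: END if s["approved"] else "draft")

final = graph.run("draft", {"attempts": 0, "approved": False})
print(final["trace"])  # ['draft', 'review', 'draft', 'review']
```

The explicit routing functions and the recorded trace are what make this style easy to debug: every transition is a named decision you can inspect.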
Role-playing multi-agent framework — you define a "crew" of agents with roles and let them collaborate on a task.
Best for: Getting a multi-agent collaboration running quickly with an intuitive team metaphor.
Anthropic's SDK for building production Claude-native agents — the same engine behind Claude Code, exposed for your own harnesses.
Best for: Anthropic-native production agents where you want the Claude Code loop under your own control.
Microsoft's multi-agent conversation framework with strong code-execution support.
Best for: Flexible multi-agent setups where agents write and run code as part of the loop.
TypeScript-first agent framework: a native SDK for building agents and workflows in a JS/TS stack.
Best for: Teams building agents in TypeScript who want a native SDK, not a Python port.
Visual / low-code platforms where an AI call is one node in a larger automation.
Small local tools that do the cheap work before a metered model call, trimming the bill at no cost of their own.
Converts PDF/Word/PPT/Excel/HTML → clean Markdown locally before you feed a document to a model.
Best for: Cutting input tokens (and noise) by converting a document to clean text at $0 before the model sees it.
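To see why converting first saves tokens, here is a small sketch that strips an HTML page down to its visible text with Python's standard library and compares rough sizes. This is a stand-in for the idea, not the tool described above, and it emits plain text rather than Markdown. The 4-characters-per-token estimate is an assumption, so use a real tokenizer for billing numbers.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def rough_tokens(text: str) -> int:
    # Assumption: about 4 characters per token.
    return max(1, len(text) // 4)


page = (
    "<html><head><style>body{margin:0}</style></head><body>"
    "<div class='nav'><a href='/'>Home</a></div>"
    "<h1>Quarterly report</h1><p>Revenue rose 12% year over year.</p>"
    "<script>trackPageView();</script></body></html>"
)
text = html_to_text(page)
print(text)
print(rough_tokens(page), "->", rough_tokens(text))
```

Markup, styles, and scripts cost tokens without carrying meaning, which is the overhead a local converter removes before the metered call.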
Prompt compression — strips non-essential tokens from a big prose blob (docs/RAG/notes) before a metered call. Lossy: measure accuracy per task first.
Best for: Shrinking a large, verbose context locally to cut the metered bill — after you've checked it doesn't hurt the answer.
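The "measure accuracy per task first" advice can be sketched as a small comparison harness: run the same cases with raw and compressed context and look at both accuracy and size. Everything here is hypothetical scaffolding. `drop_parentheticals` is a deliberately naive compressor and `toy_answer` stands in for a model call; swap in your real compressor and model.

```python
import re
from typing import Callable

Case = tuple[str, str, str]  # (context, question key, expected answer)


def compare(
    cases: list[Case],
    answer: Callable[[str, str], str],
    compress: Callable[[str], str],
) -> dict[str, float]:
    """Score the same cases with raw and with compressed context."""
    raw_hits = comp_hits = raw_chars = comp_chars = 0
    for context, question, expected in cases:
        small = compress(context)
        raw_hits += answer(context, question) == expected
        comp_hits += answer(small, question) == expected
        raw_chars += len(context)
        comp_chars += len(small)
    n = len(cases)
    return {
        "raw_accuracy": raw_hits / n,
        "compressed_accuracy": comp_hits / n,
        "size_ratio": comp_chars / raw_chars,
    }


def drop_parentheticals(text: str) -> str:
    # Naive lossy compressor: delete anything in parentheses.
    return re.sub(r"\s*\([^)]*\)", "", text)


def toy_answer(context: str, key: str) -> str:
    # Pretend model: return the number that follows the key, if any.
    match = re.search(re.escape(key) + r"\s+(\d+)", context)
    return match.group(1) if match else ""


cases = [
    ("The service listens on port 8080 (changed from 8000 last year).", "port", "8080"),
    ("Retries are capped (limit 5) to protect the upstream API.", "limit", "5"),
]
report = compare(cases, toy_answer, drop_parentheticals)
print(report)
```

In this toy run the compressor shrinks both contexts but drops the answer to the second case, so compressed accuracy falls from 1.0 to 0.5. That is the kind of regression to catch before the savings go to production.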