The harness layer — tools that make AI act

Automation Scripts & Agents

A model on its own just answers. To make it do something (edit a repo, run a terminal, drive a browser, loop toward a goal, or coordinate several models), you wrap it in a harness: the scripts, agents, and frameworks below. This is a curated map of the ones worth knowing, by category, with a plain-English “best for” on each. It's editorial, not a benchmark: we describe what each tool is and don't assign scores we haven't measured.

Why this matters here: the model is becoming a commodity; the harness is where the real work, value, and maintenance burden live. If you picked a model from the leaderboard, this page is how you put it to work, and why the harness, not the model, is the product.
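To make the idea of a harness concrete, here is a minimal sketch of the loop that agent-style tools tend to implement in some form: ask the model for the next action, run the chosen tool, feed the result back, and stop when the model says it is done. Everything in it is hypothetical and for illustration only: `fake_model`, the `TOOLS` table, and the action format are stand-ins, not the API of any tool listed on this page.

```python
from typing import Callable

# Hypothetical tools the harness exposes to the model.
TOOLS: dict[str, Callable[[str], str]] = {
    "upper": lambda text: text.upper(),
    "reverse": lambda text: text[::-1],
}


def fake_model(history: list[str]) -> dict:
    """Stand-in for a real model call: returns the next action as a dict."""
    if len(history) == 1:
        return {"tool": "upper", "input": history[0]}
    if len(history) == 2:
        return {"tool": "reverse", "input": history[-1]}
    return {"done": True, "answer": history[-1]}


def run_harness(goal: str, model=fake_model, max_steps: int = 5) -> str:
    """Loop: ask the model, run the chosen tool, feed the result back."""
    history = [goal]
    for _ in range(max_steps):
        action = model(history)
        if action.get("done"):
            return action["answer"]
        tool = TOOLS[action["tool"]]
        history.append(tool(action["input"]))
    # A step budget keeps a confused model from looping forever.
    raise RuntimeError("step budget exhausted before the model finished")


if __name__ == "__main__":
    print(run_harness("ship it"))  # prints "TI PIHS"
```

The step budget is the part worth noticing: most of the maintenance burden in a real harness sits in guards like this one (budgets, approvals, sandboxing), not in the model call itself.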

Terminal coding agents

Point an AI at your codebase from the shell — it reads, edits, runs tests, commits.

Claude Code

Anthropic's official terminal agent — reads your repo, edits files, runs commands, and can be scripted headless via `claude -p`.

Best for: Deep, reliable coding work when you're on the Claude stack; the headless mode is the backbone of scripted automation.

proprietary (subscription / API) · Claude (Anthropic) · official; the reference terminal agent
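As a rough sketch of what scripting the headless mode can look like, the wrapper below shells out to `claude -p` with a single prompt and returns the printed output. Only the `-p` flag comes from the description above; the function names, the timeout value, and the injectable `runner` parameter are assumptions made for this example, and it presumes the `claude` CLI is installed and authenticated.

```python
import subprocess


def build_command(prompt: str) -> list[str]:
    """Build the argv for a one-shot, non-interactive run via `claude -p`."""
    return ["claude", "-p", prompt]


def run_headless(prompt: str, runner=subprocess.run, timeout: int = 600) -> str:
    """Run one prompt headlessly and return whatever the CLI printed.

    `runner` is injectable (hypothetical design choice) so the wrapper can be
    exercised without the CLI installed; by default it is `subprocess.run`.
    """
    result = runner(
        build_command(prompt),
        capture_output=True,  # collect stdout/stderr instead of streaming them
        text=True,            # decode output as str
        timeout=timeout,      # assumed budget; tune for your tasks
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

A cron job or CI step could then call something like `run_headless("Summarize the failing tests in this repo.")` from the repository root and post the returned text wherever it is needed.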

Aider

The git-native terminal pair-programmer — auto-commits every edit as its own commit, works with any model via BYOK.

Best for: A clean, auditable git history where each AI change is a reviewable commit.

Apache-2.0 · BYOK (any LLM) · ~46k★ (mature; slower 2026 cadence)

opencode

Actively-developed open-source terminal coding agent with broad provider choice.

Best for: A current, provider-agnostic terminal agent for everyday coding without vendor lock-in.

open-source · BYOK (many providers) · active development, growing fast

Goose

Block's open-source local agent — a general terminal agent that goes beyond editing code to running broader tasks.

Best for: The closest "general local agent" substitute when you want more than a code editor.

Apache-2.0 · BYOK (many providers) · ~32k★

OpenHands

Open-source autonomous software-engineering agent (formerly OpenDevin) — plans and executes multi-step dev tasks.

Best for: Autonomous task delegation on hard, multi-file engineering work (strong on SWE-bench with a capable model).

MIT · BYOK (pairs well with frontier models) · high SWE-bench with Claude

Cline

IDE-native agentic coding assistant (VS Code) — plans, edits across files, and runs commands with approval.

Best for: In-editor agentic coding for people who live in VS Code rather than the terminal.

Apache-2.0 · BYOK (any provider) · one of the broadest-adopted IDE agents

Operator & autonomous agents

Agents that pursue a goal over many steps with tools, memory, and less hand-holding.

OpenClaw

An operator-style agent layer / marketplace around workflows and products — pitched for running AI operations, not just coding.

Best for: Operator-style workflows where you want a marketplace layer around products and automations.

open-source · multi-provider · prominent operator-agent project

Hermes Agent

Nous Research's self-improving CLI agent — persistent memory, automated skill creation, sandboxed code execution, and chat-surface reach (Telegram/Slack/Discord).

Best for: A persistent, memory-backed operator that lives across your messaging surfaces and improves its own skills.

open-source · 300+ models across providers · very large following

Claw Code

Clean-room Python/Rust rewrite of the Claude Code architecture (built on oh-my-codex); born after the March 2026 Claude Code source leak.

Best for: An open, self-hostable take on the Claude Code architecture without the subscription.

open-source · multi-provider · fastest repo to 100k★ in GitHub history

Orchestration frameworks

Libraries for wiring models, tools, and steps into a controllable multi-agent workflow.

LangGraph

Graph-based orchestration for stateful, multi-step agent workflows — the most precise control over how steps and state flow.

Best for: Complex, stateful, production multi-agent workflows where you need explicit control and good debugging.

MIT · model-agnostic · largest ecosystem; most production-mature
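To show what "graph-based, stateful" orchestration means in practice, here is a library-free sketch: nodes are functions that take and return a shared state, and each node has a router that picks the next node (or stops). This is deliberately not LangGraph's API. `TinyGraph`, `draft`, `review`, and the state keys are hypothetical names invented for the example.

```python
from typing import Callable, Optional

State = dict
Node = Callable[[State], State]
Router = Callable[[State], Optional[str]]  # next node name, or None to stop


class TinyGraph:
    """Library-free illustration of a state graph; not LangGraph's API."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.routers: dict[str, Router] = {}

    def add_node(self, name: str, fn: Node, route: Router) -> None:
        self.nodes[name] = fn
        self.routers[name] = route

    def run(self, start: str, state: State, max_steps: int = 20) -> State:
        current = start
        for _ in range(max_steps):
            state = self.nodes[current](state)
            # Record the path taken so a run can be inspected afterwards.
            state = {**state, "trace": state.get("trace", []) + [current]}
            current = self.routers[current](state)
            if current is None:
                return state
        raise RuntimeError("graph did not terminate within max_steps")


def draft(state: State) -> State:
    """Pretend to write a draft; just counts attempts."""
    return {**state, "attempts": state.get("attempts", 0) + 1}


def review(state: State) -> State:
    """Pretend reviewer: approves from the second attempt onward."""
    return {**state, "approved": state["attempts"] >= 2}


graph = TinyGraph()
graph.add_node("draft", draft, route=lambda s: "review")
# Conditional edge: loop back to drafting until the review approves.
graph.add_node("review", review, route=lambda s: None if s["approved"] else "draft")

final = graph.run("draft", {"task": "write summary"})
print(final["trace"])  # ['draft', 'review', 'draft', 'review']
```

The explicit state and the recorded trace are what make this style easier to debug than a free-running agent: every step and every transition is visible after the fact.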

CrewAI

Role-playing multi-agent framework — you define a "crew" of agents with roles and let them collaborate on a task.

Best for: Getting a multi-agent collaboration running quickly with an intuitive team metaphor.

MIT · model-agnostic · ~5.2M downloads/mo

Claude Agent SDK

Anthropic's SDK for building production Claude-native agents — the same engine behind Claude Code, exposed for your own harnesses.

Best for: Anthropic-native production agents where you want the Claude Code loop under your own control.

proprietary (API) · Claude (Anthropic) · official; production-grade

AutoGen

Microsoft's multi-agent conversation framework with strong code-execution support.

Best for: Flexible multi-agent setups where agents write and run code as part of the loop.

MIT · model-agnostic · established, research-heavy

Mastra

TypeScript-first agent framework — the strongest TS SDK for building agents and workflows in a JS/TS stack.

Best for: Teams building agents in TypeScript who want a native SDK, not a Python port.

open-source · model-agnostic · leading TS agent SDK

Workflow automation

Visual / low-code platforms where an AI call is one node in a larger automation.

n8n

Fair-code visual workflow automation — drag-and-drop nodes where an AI/LLM call is one step among many (APIs, databases, triggers).

Best for: Wiring AI into real business automations (triggers → AI step → actions) without writing the glue by hand.

fair-code (Sustainable Use License) · any (via nodes / BYOK) · huge self-host + cloud adoption

Local token-cutting tools

Small local tools that do the cheap work before a metered model call — cut the bill at $0.

MarkItDown

Converts PDF/Word/PPT/Excel/HTML → clean Markdown locally before you feed a document to a model.

Best for: Cutting input tokens (and noise) by converting a document to clean text at $0 before the model sees it.

MIT · n/a (pre-processing) · a token-tax staple
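A minimal sketch of the convert-before-you-send step, assuming the `markitdown` Python package is installed (`pip install markitdown`) and using its `MarkItDown().convert(...)` call, whose result exposes the Markdown as `text_content`. The helper names, the injectable `converter` parameter, and the four-characters-per-token estimate are assumptions for illustration; the estimate is a crude rule of thumb, not a real tokenizer.

```python
def to_markdown(path: str, converter=None) -> str:
    """Convert a local document to Markdown text before any model call."""
    if converter is None:
        # Imported lazily so the rest of this sketch runs without the package.
        from markitdown import MarkItDown

        converter = MarkItDown()
    return converter.convert(path).text_content


def rough_token_count(text: str) -> int:
    """Very rough size estimate: about 4 characters per token (assumption)."""
    return len(text) // 4
```

In use, something like `rough_token_count(to_markdown("report.pdf"))` (where `report.pdf` is a placeholder path) gives a ballpark for how much context the cleaned document will consume before you pay for a single call.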

LLMLingua

Prompt compression — strips non-essential tokens from a big prose blob (docs/RAG/notes) before a metered call. Lossy: measure accuracy per task first.

Best for: Shrinking a large, verbose context locally to cut the metered bill — after you've checked it doesn't hurt the answer.

MIT · n/a (pre-processing) · strong measured savings on prose
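Because compression is lossy, the "measure accuracy per task first" advice can be turned into a small check: answer the same labeled questions with the original and the compressed context, and only adopt compression if accuracy barely moves. The sketch below is tool-agnostic and does not call LLMLingua. `answer`, `compress`, the toy stand-ins, and the `max_drop` threshold are all hypothetical placeholders you would swap for your real model call, your compressor, and your own tolerance.

```python
from typing import Callable

Example = tuple[str, str, str]  # (context, question, expected answer)


def accuracy(
    examples: list[Example],
    answer: Callable[[str, str], str],
    transform: Callable[[str], str] = lambda context: context,
) -> float:
    """Fraction of examples answered correctly after transforming the context."""
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = sum(
        answer(transform(context), question).strip().lower() == expected.strip().lower()
        for context, question, expected in examples
    )
    return correct / len(examples)


def compression_is_safe(examples, answer, compress, max_drop: float = 0.02) -> bool:
    """True if compressing the context costs at most `max_drop` accuracy."""
    baseline = accuracy(examples, answer)
    compressed = accuracy(examples, answer, compress)
    return baseline - compressed <= max_drop


# Toy stand-ins so the sketch runs end to end without a model.
def toy_answer(context: str, question: str) -> str:
    """Pretend model: answers with the last word of the context."""
    return context.split()[-1]


def toy_truncate(context: str) -> str:
    """Pretend compressor: keeps the first half of the words (very lossy)."""
    words = context.split()
    return " ".join(words[: max(1, len(words) // 2)])


EXAMPLES: list[Example] = [
    ("The capital of France is Paris", "What is the capital?", "Paris"),
    ("The largest planet is Jupiter", "Which planet is largest?", "Jupiter"),
]

print(compression_is_safe(EXAMPLES, toy_answer, toy_truncate))  # False
```

Here the toy compressor throws away the answer, so the check fails, which is exactly the case you want to catch before the savings show up on the bill.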

On the numbers: star counts, download figures, and “maturity” notes are approximate signals from public data and move over time; read them as order-of-magnitude, not live. Licenses and model support can change; always check the source repo before adopting. Know a tool that belongs here? Tell us.


EyesInAI