EyesInAI · Trends
Week-over-week changes in latency, cost, and reliability across all tested models.
Recent regressions: pass rate dropped ≥20% in recent runs

| Provider | Model | Test | Runs | Pass change |
|---|---|---|---|---|
| anthropic | claude-sonnet-4-5-20250929 | 💻 Code Gen | 5 | -33% |
| groq | qwen3-32b | 🔍 Context Recall | 4 | -100% |
| groq | qwen3-32b | 💻 Code Gen | 4 | -100% |
| groq | qwen3-32b | { } JSON Output | 4 | -100% |
| groq | qwen3-32b | 🧮 Reasoning | 4 | -100% |
| gemini | gemini-2.5-pro | 🔧 Tool Use | 4 | -50% |

Recent improvements: pass rate rose ≥20% in recent runs

| Provider | Model | Test | Runs | Pass change |
|---|---|---|---|---|
| groq | compound | 🚀 Throughput | 3 | +100% |
| nvidia | qwen3.5-397b-a17b | 🚀 Throughput | 2 | +100% |
| nvidia | qwen3.5-397b-a17b | 🧮 Reasoning | 2 | +100% |

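
The ≥20% rule for regressions and improvements could be sketched roughly as follows. This is an illustration under stated assumptions, not the dashboard's actual logic: the page does not say how "recent runs" are windowed or whether the 20% is measured in percentage points, so the sketch assumes a simple split into earlier and recent runs and a threshold in percentage points. The names `pass_rate`, `classify`, and `THRESHOLD_POINTS` are hypothetical.

```python
# Assumption: "≥20%" is read as a change of at least 20 percentage points.
THRESHOLD_POINTS = 20.0


def pass_rate(results):
    """Return the percentage of passing runs in a list of booleans."""
    if not results:
        raise ValueError("need at least one run to compute a pass rate")
    return 100.0 * sum(results) / len(results)


def classify(earlier, recent, threshold=THRESHOLD_POINTS):
    """Compare recent runs against earlier runs for one (model, test) pair.

    Returns a (label, delta) tuple, where delta is the recent pass rate
    minus the earlier pass rate, in percentage points.
    """
    delta = pass_rate(recent) - pass_rate(earlier)
    if delta <= -threshold:
        return "regression", delta
    if delta >= threshold:
        return "improvement", delta
    return "stable", delta


# Hypothetical run outcomes, not taken from the dashboard's data.
label, delta = classify(earlier=[True, True], recent=[True, True, False])
print(label, round(delta))
```

With these made-up inputs, two earlier passes followed by two passes and one failure give a drop of about 33 points, which the sketch labels a regression. A relative-change rule or a different run window would produce different labels, so the threshold reading is worth confirming against the site before relying on it.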
All tracked (model × test) — 386
| Provider | Model | Test | Runs | Latency | $ / 1k | Pass | Trend |
|---|---|---|---|---|---|---|---|
| anthropic | claude-haiku-4-5-20251001 | 🔧 Tool Use | 7 | 1214ms | — | 100% | |
| anthropic | claude-haiku-4-5-20251001 | 🔍 Context Recall | 7 | 1042ms | — | 100% | |
| anthropic | claude-haiku-4-5-20251001 | 🚀 Throughput | 7 | 5488ms | — | 100% | |
| anthropic | claude-haiku-4-5-20251001 | 💻 Code Gen | 7 | 1992ms | — | 100% | |
| anthropic | claude-haiku-4-5-20251001 | { } JSON Output | 7 | 1000ms | — | 100% | |
| anthropic | claude-haiku-4-5-20251001 | 🧮 Reasoning | 7 | 1287ms | — | 100% | |
| anthropic | claude-haiku-4-5-20251001 | ⚡ Ping | 7 | 770ms | — | 100% | |
| anthropic | claude-sonnet-4-6 | 🔧 Tool Use | 5 | 2015ms | — | 100% | |
| anthropic | claude-sonnet-4-6 | 🔍 Context Recall | 5 | 2982ms | — | 100% | |
| anthropic | claude-sonnet-4-6 | 🚀 Throughput | 5 | 11796ms | — | 100% | |
| anthropic | claude-sonnet-4-6 | 💻 Code Gen | 5 | 3898ms | — | 100% | |
| anthropic | claude-sonnet-4-6 | { } JSON Output | 5 | 1887ms | — | 100% | |
| anthropic | claude-sonnet-4-6 | 🧮 Reasoning | 5 | 1830ms | — | 100% | |
| anthropic | claude-sonnet-4-6 | ⚡ Ping | 5 | 4297ms | — | 100% | |
| anthropic | claude-sonnet-4-5-20250929 | 🔧 Tool Use | 5 | 1726ms | — | 100% | |
| anthropic | claude-sonnet-4-5-20250929 | 🔍 Context Recall | 5 | 1906ms | — | 100% | |
| anthropic | claude-sonnet-4-5-20250929 | 🚀 Throughput | 5 | 11533ms | — | 100% | |
| anthropic | claude-sonnet-4-5-20250929 | 💻 Code Gen | 5 | 4135ms | — | 67% | |
| anthropic | claude-sonnet-4-5-20250929 | { } JSON Output | 5 | 1764ms | — | 100% | |
| anthropic | claude-sonnet-4-5-20250929 | 🧮 Reasoning | 5 | 2034ms | — | 100% | |
| anthropic | claude-sonnet-4-5-20250929 | ⚡ Ping | 5 | 1601ms | — | 100% | |
| anthropic | claude-opus-4-1-20250805 | 🔧 Tool Use | 5 | 2744ms | — | 100% | |
| anthropic | claude-opus-4-1-20250805 | 🔍 Context Recall | 5 | 3309ms | — | 100% | |
| anthropic | claude-opus-4-1-20250805 | 🚀 Throughput | 5 | 14113ms | — | 100% | |
| anthropic | claude-opus-4-1-20250805 | 💻 Code Gen | 5 | 5842ms | — | 100% | |
| anthropic | claude-opus-4-1-20250805 | { } JSON Output | 5 | 2645ms | — | 100% | |
| anthropic | claude-opus-4-1-20250805 | 🧮 Reasoning | 5 | 3111ms | — | 100% | |
| anthropic | claude-opus-4-1-20250805 | ⚡ Ping | 5 | 1978ms | — | 100% | |
| groq | qwen3-32b | 🔧 Tool Use | 4 | 487ms | $0.057 | 100% | |
| groq | qwen3-32b | 🔍 Context Recall | 4 | — | — | 0% | |
| groq | qwen3-32b | 🚀 Throughput | 4 | 1555ms | $0.143 | 100% | |
| groq | qwen3-32b | 💻 Code Gen | 4 | — | — | 0% | |
| groq | qwen3-32b | { } JSON Output | 4 | — | — | 0% | |
| groq | qwen3-32b | 🧮 Reasoning | 4 | — | — | 0% | |
| groq | qwen3-32b | ⚡ Ping | 4 | 236ms | <$0.01 | 100% | |
| groq | llama-4-scout-17b-16e-instruct | 🔧 Tool Use | 4 | 333ms | — | 100% | |
| groq | llama-4-scout-17b-16e-instruct | 🔍 Context Recall | 4 | 342ms | — | 100% | |
| groq | llama-4-scout-17b-16e-instruct | 🚀 Throughput | 4 | 1349ms | — | 100% | |
| groq | llama-4-scout-17b-16e-instruct | 💻 Code Gen | 4 | 569ms | — | 100% | |
| groq | llama-4-scout-17b-16e-instruct | { } JSON Output | 4 | 280ms | — | 100% | |
Showing 40 of 386 tracked combinations.