EyesInAI
AI Benchmark Intelligence
anthropic
claude-haiku-4-5-20251001
Last tested Jun 7, 2026
Overall pass: 93%
Avg latency: 1730 ms
Context: 200k
Tools: No
Input $/1M: $1.00
Output $/1M: $5.00
Tests run: 14
Passed: 13/14
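As a rough illustration of how the summary figures relate, the sketch below recomputes the pass rate from the 13/14 result and estimates the cost of a single request from the listed per-million-token prices. The function names and the example token counts (2,000 input, 500 output) are assumptions made for illustration; they are not part of the benchmark.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_million: float = 1.00,   # listed input price, $ per 1M tokens
    output_per_million: float = 5.00,  # listed output price, $ per 1M tokens
) -> float:
    """Estimate the cost of one request from token counts and listed prices."""
    return (
        input_tokens / 1_000_000 * input_per_million
        + output_tokens / 1_000_000 * output_per_million
    )


# 13 of 14 tests passed, which rounds to the 93% shown above.
print(f"{pass_rate(13, 14):.0f}%")

# Hypothetical request: 2,000 input tokens and 500 output tokens.
print(f"${request_cost_usd(2_000, 500):.4f}")
```

Under these assumed token counts the request costs well under one cent; actual costs depend on the real prompt and response sizes.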
Test results
⚡ Ping: Latency & availability — single-word reply (832 ms)
🧮 Reasoning: Basic math reasoning — show work, give answer (1237 ms)
{ } JSON Output: Structured output compliance — valid JSON with required keys (1129 ms)
💻 Code Gen: Python function generation with docstring (2198 ms)
🚀 Throughput: Token generation speed — 500-token long-form response (5621 ms)
🔍 Context Recall: Retrieval from in-context data — 20-item list Q&A (1197 ms)
🔧 Tool Use: Function/tool calling — get_weather invocation (1071 ms)
📝 Summarization: Real-world: distill a news article into 3 bullet points (1789 ms)
🏷️ Classification: Real-world: sentiment + category from customer review (984 ms)
📄 Data Extraction: Real-world: pull structured fields from an invoice (1389 ms)
📋 Instruction Follow: Real-world: multi-rule compliance (5 sentences, no "the", etc.) (1809 ms)
✅ Format Compliance: IFEval-style: 4 bullets, keyword inclusion/exclusion, no preamble (2111 ms)
🪡 Long-Context Needle: Find a 6-digit code buried in ~3.5k tokens of filler text (1920 ms)
🧠 Multi-Step Logic: BBH-lite: boolean expression + web of lies + object counting (fail)
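The Throughput result can be turned into an approximate generation speed. The sketch below assumes the response was exactly 500 tokens and that the 5621 ms figure covers the whole request, including time to first token, so it is a lower-bound estimate rather than a measured decode rate. The function name is hypothetical.

```python
def tokens_per_second(tokens: int, elapsed_ms: float) -> float:
    """Convert a token count and elapsed time in milliseconds to tokens/s."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return tokens / (elapsed_ms / 1000.0)


# Assumed: a 500-token response completed in 5621 ms end to end.
print(f"{tokens_per_second(500, 5621):.1f} tokens/s")
```

With these assumptions the estimate comes out to roughly 89 tokens per second.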