EyesInAI — AI Benchmark Leaderboard

⚡ Ping

Latency & availability — single-word reply

275 ms

🧮 Reasoning

Basic math reasoning — show work, give answer

433 ms

{ } JSON Output

Structured output compliance — valid JSON with required keys

422 ms
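
The compliance check described above can be sketched as follows. This is a minimal illustration, not the leaderboard's actual grader: the function name `is_compliant` and the idea that the whole reply must parse as a single JSON object are assumptions.

```python
import json


def is_compliant(reply: str, required_keys: set[str]) -> bool:
    """Return True if the reply is a JSON object containing every required key.

    Hypothetical checker: the real benchmark's key list and strictness
    (e.g. whether surrounding prose is tolerated) are not documented here.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        # Any non-JSON text, including a chatty preamble, counts as a failure.
        return False
    # Arrays, numbers and strings are valid JSON but have no keys.
    return isinstance(data, dict) and required_keys.issubset(data)
```

Under this strict reading, a reply such as `Sure! {"name": "a"}` fails because the leading text makes the whole string unparseable.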

💻 Code Gen

Python function generation with docstring

fail

🚀 Throughput

Token generation speed — 500-token long-form response

1274 ms
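
If the reported time covers the full 500-token response (an assumption; the card does not say whether it is total latency or time to first token, nor how many tokens were actually produced), the implied generation speed can be estimated as below. The helper name `tokens_per_second` is hypothetical.

```python
def tokens_per_second(tokens: int, elapsed_ms: float) -> float:
    """Convert a token count and a wall-clock time in milliseconds to tokens/s."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return tokens / (elapsed_ms / 1000.0)


# Assumes exactly 500 tokens were generated in the reported 1274 ms.
rate = tokens_per_second(500, 1274)
print(f"{rate:.0f} tokens/s")  # roughly 392 tokens/s under that assumption
```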

🔍 Context Recall

Retrieval from in-context data — 20-item list Q&A

441 ms

🔧 Tool Use

Function/tool calling — get_weather invocation

341 ms

📝 Summarization

Real-world: distill a news article into 3 bullet points

661 ms

🏷️ Classification

Real-world: sentiment + category from customer review

402 ms

📄 Data Extraction

Real-world: pull structured fields from an invoice

917 ms

📋 Instruction Follow

Real-world: multi-rule compliance (5 sentences, no "the", etc.)

756 ms
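
The two rules named above (exactly 5 sentences, no "the") could be checked roughly like this. It is a sketch only: the remaining rules behind "etc." are unknown, the function name `check_rules` is hypothetical, and sentence splitting on `.`, `!`, `?` is a simplification that miscounts abbreviations.

```python
import re


def check_rules(reply: str) -> dict[str, bool]:
    """Check the two rules the tile names; other rules are not modelled."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", reply.strip()) if s]
    # Whole-word, case-insensitive match so "other" or "then" do not count.
    has_the = re.search(r"\bthe\b", reply, flags=re.IGNORECASE) is not None
    return {"five_sentences": len(sentences) == 5, "no_the": not has_the}
```

Reporting each rule separately, rather than a single pass/fail, makes it easier to see which constraint a model dropped.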

✅ Format Compliance

IFEval-style: 4 bullets, keyword inclusion/exclusion, no preamble

fail
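
A checker for the constraints listed above (4 bullets, keyword inclusion/exclusion, no preamble) might look like the sketch below. The function name `check_format`, the accepted bullet markers, and the case-insensitive keyword matching are all assumptions, not the benchmark's documented behaviour.

```python
def check_format(reply: str, must_include: list[str], must_exclude: list[str]) -> bool:
    """Return True if the reply meets all three illustrative constraints."""
    lines = [ln.strip() for ln in reply.strip().splitlines() if ln.strip()]
    markers = ("- ", "* ", "• ")  # assumed bullet styles
    bullets = [ln for ln in lines if ln.startswith(markers)]

    # "No preamble" is read here as: the first non-empty line is a bullet.
    no_preamble = bool(lines) and lines[0].startswith(markers)

    text = reply.lower()
    includes_ok = all(k.lower() in text for k in must_include)
    excludes_ok = not any(k.lower() in text for k in must_exclude)

    return len(bullets) == 4 and no_preamble and includes_ok and excludes_ok
```

A single opening line such as "Here you go:" is enough to fail this check, which is one common way otherwise good answers miss IFEval-style tests.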

🪡 Long-Context Needle

Find a 6-digit code buried in ~3.5k tokens of filler text

fail
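
The needle test described above can be reproduced in outline as follows. Everything here is illustrative: the filler sentence, the needle wording, the names `build_haystack` and `score`, and the estimate that 350 short sentences come to about 3.5k tokens are assumptions rather than details of the actual benchmark.

```python
import random
import re

FILLER = "The quick brown fox jumps over the lazy dog. "


def build_haystack(code: str, n_sentences: int = 350, seed: int = 0) -> str:
    """Bury one sentence containing the code at a random position in filler."""
    rng = random.Random(seed)  # seeded so the prompt is reproducible
    sentences = [FILLER] * n_sentences
    sentences.insert(rng.randrange(n_sentences), f"The secret code is {code}. ")
    return "".join(sentences)


def score(reply: str, code: str) -> bool:
    """Pass only if the exact 6-digit code appears as a standalone number."""
    return code in re.findall(r"\b\d{6}\b", reply)
```

Matching on a standalone 6-digit number avoids crediting a reply that merely contains the digits inside a longer string.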

openai/gpt-oss-120b