EyesInAI — AI Benchmark Leaderboard

| Benchmark | Description | Result |
|---|---|---|
| ⚡ Ping | Latency & availability: single-word reply | 245 ms |
| 🧮 Reasoning | Basic math reasoning: show work, give answer | 518 ms |
| { } JSON Output | Structured output compliance: valid JSON with required keys | fail |
| 💻 Code Gen | Python function generation with docstring | fail |
| 🚀 Throughput | Token generation speed: 500-token long-form response | 1665 ms |
| 🔍 Context Recall | Retrieval from in-context data: 20-item list Q&A | fail |
| 🔧 Tool Use | Function/tool calling: get_weather invocation | 538 ms |
| 📝 Summarization | Real-world: distill a news article into 3 bullet points | fail |
| 🏷️ Classification | Real-world: sentiment + category from customer review | fail |
| 📄 Data Extraction | Real-world: pull structured fields from an invoice | fail |
| 📋 Instruction Follow | Real-world: multi-rule compliance (5 sentences, no "the", etc.) | fail |

qwen/qwen3-32b