EyesInAI
AI Benchmark Intelligence
groq
llama-3.3-70b-versatile
Last tested Jun 6, 2026
Overall pass: 100%
Avg latency: 573 ms
Context: 131k tokens
Tools: Yes
Input price per 1M tokens: $0.59
Output price per 1M tokens: $0.79
Tests run: 11
Passed: 11/11
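The per-million-token prices listed above translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: the function name `request_cost_usd` and the example token counts are hypothetical, and it assumes billing is strictly linear in input and output tokens, which the page does not state.

```python
# Prices per 1M tokens as listed on this page for llama-3.3-70b-versatile on groq.
INPUT_USD_PER_MILLION = 0.59
OUTPUT_USD_PER_MILLION = 0.79


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars (assumes linear pricing)."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
example_cost = request_cost_usd(2_000, 500)
print(f"${example_cost:.6f}")
```

Under these assumptions the hypothetical request costs a fraction of a cent, and output tokens are priced roughly a third higher than input tokens.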
Test results
⚡ Ping (364 ms). Latency & availability: single-word reply
🧮 Reasoning (443 ms). Basic math reasoning: show work, give answer
{ } JSON Output (435 ms). Structured output compliance: valid JSON with required keys
💻 Code Gen (677 ms). Python function generation with docstring
🚀 Throughput (1766 ms). Token generation speed: 500-token long-form response
🔍 Context Recall (457 ms). Retrieval from in-context data: 20-item list Q&A
🔧 Tool Use (310 ms). Function/tool calling: get_weather invocation
📝 Summarization (558 ms). Real-world: distill a news article into 3 bullet points
🏷️ Classification (275 ms). Real-world: sentiment + category from customer review
📄 Data Extraction (441 ms). Real-world: pull structured fields from an invoice
📋 Instruction Follow (579 ms). Real-world: multi-rule compliance (5 sentences, no "the", etc.)
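The reported average latency can be cross-checked against the per-test latencies listed above. The page does not say how the average is computed; the sketch below assumes a plain arithmetic mean over the 11 tests, and the variable names are illustrative.

```python
from statistics import mean

# Per-test latencies in milliseconds, copied from the test results on this page.
latencies_ms = {
    "Ping": 364,
    "Reasoning": 443,
    "JSON Output": 435,
    "Code Gen": 677,
    "Throughput": 1766,
    "Context Recall": 457,
    "Tool Use": 310,
    "Summarization": 558,
    "Classification": 275,
    "Data Extraction": 441,
    "Instruction Follow": 579,
}

# Assumption: the dashboard's "Avg latency" is the unweighted mean of all tests.
average_ms = mean(latencies_ms.values())
slowest_test = max(latencies_ms, key=latencies_ms.get)

print(f"tests: {len(latencies_ms)}")
print(f"average: {average_ms:.0f} ms")
print(f"slowest: {slowest_test}")
```

The unweighted mean rounds to the 573 ms shown in the summary. The Throughput test, which requests a 500-token response, pulls the average up noticeably; the other ten tests all finish in under 700 ms.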