Model Benchmark
Compare models on speed, cost, and accuracy, using figures from public benchmarks and vendor pricing pages.
| Model | Throughput (tok/s) | TTFT | Input price ($/M tokens) | Output price ($/M tokens) | SWE-Bench Verified | | | |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.5 (Anthropic) | 60 | 2.10s | $15.00 | $75.00 | 79.4% | 86.5% | 74.5% | 82.0% |
| Claude Sonnet 4.5 (Anthropic) | 85 | 1.20s | $3.00 | $15.00 | 77.2% | 84.0% | 72.0% | 78.5% |
| GPT-5 (OpenAI) | 110 | 0.90s | $1.25 | $10.00 | 74.5% | 81.5% | 76.0% | 84.0% |
| Gemini 2.5 Pro (Google) | 95 | 1.50s | $1.25 | $10.00 | 67.0% | 76.5% | 70.0% | 81.0% |
| Grok 4 (xAI) | 75 | 1.80s | $5.00 | $15.00 | 64.0% | 72.0% | 68.0% | 80.0% |
| DeepSeek V3.1 (DeepSeek) | 60 | 1.00s | $0.27 | $1.10 | 53.0% | 65.5% | 60.0% | 75.5% |
| Gemini 2.5 Flash (Google) | 220 | 0.40s | $0.30 | $2.50 | 50.0% | 62.0% | 58.0% | 76.0% |
| GPT-4o (OpenAI) | 130 | 0.50s | $2.50 | $10.00 | 38.0% | 49.0% | 43.0% | 74.0% |
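To rank the rows yourself, here is a minimal Python sketch. It assumes "speed" means throughput, "cost" means input price, and "accuracy" means the first score column (the one holding Claude Opus 4.5's 79.4% SWE-Bench figure). `ModelRow`, `ROWS`, `SORT_KEYS`, and `rank` are hypothetical names for illustration, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    name: str
    tokens_per_s: float      # median output throughput
    ttft_s: float            # time to first token, seconds
    input_usd_per_m: float   # USD per million input tokens
    output_usd_per_m: float  # USD per million output tokens
    swe_bench_pct: float     # first score column (SWE-Bench Verified)


# Values copied from the table above (hypothetical data structure).
ROWS = [
    ModelRow("Claude Opus 4.5", 60, 2.10, 15.00, 75.00, 79.4),
    ModelRow("Claude Sonnet 4.5", 85, 1.20, 3.00, 15.00, 77.2),
    ModelRow("GPT-5", 110, 0.90, 1.25, 10.00, 74.5),
    ModelRow("Gemini 2.5 Pro", 95, 1.50, 1.25, 10.00, 67.0),
    ModelRow("Grok 4", 75, 1.80, 5.00, 15.00, 64.0),
    ModelRow("DeepSeek V3.1", 60, 1.00, 0.27, 1.10, 53.0),
    ModelRow("Gemini 2.5 Flash", 220, 0.40, 0.30, 2.50, 50.0),
    ModelRow("GPT-4o", 130, 0.50, 2.50, 10.00, 38.0),
]

# Each key function returns a value that sorts the "best" model first.
SORT_KEYS = {
    "speed": lambda r: -r.tokens_per_s,      # higher throughput first
    "cost": lambda r: r.input_usd_per_m,     # lower input price first
    "accuracy": lambda r: -r.swe_bench_pct,  # higher score first
}


def rank(rows, by):
    """Return rows ordered best-first by 'speed', 'cost', or 'accuracy'."""
    if by not in SORT_KEYS:
        raise ValueError(f"unknown sort key: {by!r}")
    # sorted() is stable, so tied rows keep their table order.
    return sorted(rows, key=SORT_KEYS[by])


for metric in ("speed", "cost", "accuracy"):
    print(metric, "->", rank(ROWS, metric)[0].name)
```

The top entries for the three metrics are Gemini 2.5 Flash, DeepSeek V3.1, and Claude Opus 4.5, matching the headline figures on this page. Negating the value for "higher is better" metrics keeps every key ascending, so one `sorted` call handles all three.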
- Best SWE-Bench Verified score: Claude Opus 4.5 (79.4%)
- Fastest throughput: Gemini 2.5 Flash (220 tok/s)
- Cheapest input: DeepSeek V3.1 ($0.27 per million tokens)
- Largest context window: Gemini 2.5 Pro (1M tokens)
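Prices are quoted per million tokens, so the cost of a single request is input tokens times input price plus output tokens times output price, divided by 1,000,000. The sketch below assumes flat list prices and ignores cached-input discounts, batch pricing, or long-context tiers a vendor may apply; `request_cost_usd` and the 10,000-in / 2,000-out workload are hypothetical.

```python
def request_cost_usd(input_tokens, output_tokens, input_usd_per_m, output_usd_per_m):
    """Estimate one request's cost from per-million-token list prices.

    Assumes flat rates: no cached-input discounts, batch pricing,
    or long-context price tiers.
    """
    return (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / 1_000_000


# (input, output) prices in USD per million tokens, taken from the table.
PRICES = {
    "Claude Sonnet 4.5": (3.00, 15.00),
    "GPT-5": (1.25, 10.00),
    "DeepSeek V3.1": (0.27, 1.10),
}

# Hypothetical workload: 10,000 prompt tokens, 2,000 completion tokens.
for name, (inp, out) in PRICES.items():
    print(f"{name}: ${request_cost_usd(10_000, 2_000, inp, out):.4f}")
```

For this workload Claude Sonnet 4.5 comes to $0.06, while DeepSeek V3.1 stays under half a cent. Output tokens carry a higher per-token price for every model in the table, so completion length can matter as much as prompt length.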
Sources: Artificial Analysis (throughput, TTFT), the official SWE-Bench Verified leaderboard, the Aider polyglot benchmark, LiveCodeBench, and vendor pricing pages. Throughput numbers are medians across providers; your own throughput and latency may differ depending on region and load.
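TTFT and throughput combine into a rough estimate of how long a streamed reply takes: wait for the first token, then divide the remaining output by the token rate. This is a simplifying assumption (steady median throughput, no network overhead or load spikes), not how Artificial Analysis measures anything; `estimated_response_s` and the 500-token reply length are hypothetical.

```python
def estimated_response_s(output_tokens, ttft_s, tokens_per_s):
    """Rough wall-clock time for a streamed reply.

    Approximation: wait TTFT, then generate every output token at the
    median throughput. Ignores network overhead and load spikes.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


# (TTFT seconds, tokens/s) from the table; a 500-token reply is hypothetical.
for name, ttft, tps in [
    ("Gemini 2.5 Flash", 0.40, 220),
    ("GPT-5", 0.90, 110),
    ("Claude Opus 4.5", 2.10, 60),
]:
    print(f"{name}: ~{estimated_response_s(500, ttft, tps):.1f}s")
```

Under these assumptions a 500-token answer takes about 2.7 s on Gemini 2.5 Flash versus about 10.4 s on Claude Opus 4.5. For long outputs the throughput term dominates; for short ones TTFT does.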