Model Benchmark

Sort by what matters: speed, cost, or accuracy. Real numbers from public benchmarks.

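| Model | Provider | Throughput (tok/s) | TTFT | Input ($/M) | Output ($/M) | SWE-Bench | (unlabeled) | (unlabeled) | (unlabeled) |
|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.5 | Anthropic | 60 | 2.10s | $15.00 | $75.00 | 79.4% | 86.5% | 74.5% | 82.0% |
| Claude Sonnet 4.5 | Anthropic | 85 | 1.20s | $3.00 | $15.00 | 77.2% | 84.0% | 72.0% | 78.5% |
| GPT-5 | OpenAI | 110 | 0.90s | $1.25 | $10.00 | 74.5% | 81.5% | 76.0% | 84.0% |
| Gemini 2.5 Pro | Google | 95 | 1.50s | $1.25 | $10.00 | 67.0% | 76.5% | 70.0% | 81.0% |
| Grok 4 | xAI | 75 | 1.80s | $5.00 | $15.00 | 64.0% | 72.0% | 68.0% | 80.0% |
| DeepSeek V3.1 | DeepSeek | 60 | 1.00s | $0.27 | $1.10 | 53.0% | 65.5% | 60.0% | 75.5% |
| Gemini 2.5 Flash | Google | 220 | 0.40s | $0.30 | $2.50 | 50.0% | 62.0% | 58.0% | 76.0% |
| GPT-4o | OpenAI | 130 | 0.50s | $2.50 | $10.00 | 38.0% | 49.0% | 43.0% | 74.0% |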
Best score (SWE-Bench): Claude Opus 4.5 (79.4%)
Fastest: Gemini 2.5 Flash (220 tok/s)
Cheapest input: DeepSeek V3.1 ($0.27/M)
Largest context: Gemini 2.5 Pro (1000k tokens)
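
The highlights above can be reproduced by sorting the table on a single column. The following is a minimal sketch, not the site's actual implementation: the `Row` class, its field names, and the `sort_by` helper are hypothetical, and the values are copied from the table on the assumption that each row reads as throughput, TTFT, input price, output price, then SWE-Bench score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    tok_per_s: int     # throughput, tokens per second
    ttft_s: float      # time to first token, seconds
    input_usd: float   # $ per million input tokens
    output_usd: float  # $ per million output tokens
    swe_bench: float   # SWE-Bench score, percent


# Values copied from the table; column meanings are assumed as described above.
ROWS = [
    Row("Claude Opus 4.5", 60, 2.10, 15.00, 75.00, 79.4),
    Row("Claude Sonnet 4.5", 85, 1.20, 3.00, 15.00, 77.2),
    Row("GPT-5", 110, 0.90, 1.25, 10.00, 74.5),
    Row("Gemini 2.5 Pro", 95, 1.50, 1.25, 10.00, 67.0),
    Row("Grok 4", 75, 1.80, 5.00, 15.00, 64.0),
    Row("DeepSeek V3.1", 60, 1.00, 0.27, 1.10, 53.0),
    Row("Gemini 2.5 Flash", 220, 0.40, 0.30, 2.50, 50.0),
    Row("GPT-4o", 130, 0.50, 2.50, 10.00, 38.0),
]


def sort_by(rows, field, descending=True):
    """Return a new list of rows ordered by one named field."""
    return sorted(rows, key=lambda r: getattr(r, field), reverse=descending)


best = sort_by(ROWS, "swe_bench")[0]                      # highest score first
fastest = sort_by(ROWS, "tok_per_s")[0]                   # highest throughput first
cheapest = sort_by(ROWS, "input_usd", descending=False)[0]  # lowest price first

print(f"Best score (SWE-Bench): {best.model} ({best.swe_bench}%)")
print(f"Fastest: {fastest.model} ({fastest.tok_per_s} tok/s)")
print(f"Cheapest input: {cheapest.model} (${cheapest.input_usd:.2f}/M)")
```

Speed and accuracy sort descending while cost sorts ascending, which is why the helper takes an explicit direction rather than assuming one.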

Sources: Artificial Analysis (throughput, TTFT), official SWE-Bench Verified leaderboard, Aider polyglot benchmark, LiveCodeBench, vendor pricing pages. Throughput numbers are medians across providers; the speed and latency you observe may differ based on region and load.
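
To turn the per-million-token prices and the speed figures into numbers for a single request, a rough back-of-the-envelope calculation may help. This is a sketch under stated assumptions: the function names and the 2,000-input / 500-output token workload are hypothetical, the GPT-5 figures are taken from the table, and the time estimate uses the simple approximation "TTFT plus output tokens divided by throughput", which ignores network overhead and load.

```python
def request_cost_usd(input_tokens, output_tokens, input_usd_per_m, output_usd_per_m):
    """Cost of one request, given prices quoted per million tokens."""
    return (input_tokens / 1_000_000) * input_usd_per_m \
        + (output_tokens / 1_000_000) * output_usd_per_m


def approx_response_seconds(output_tokens, ttft_s, tok_per_s):
    """Rough end-to-end time: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tok_per_s


# Hypothetical workload: 2,000 input tokens, 500 output tokens.
# GPT-5 figures from the table: $1.25 in, $10.00 out, 110 tok/s, 0.90s TTFT.
cost = request_cost_usd(2_000, 500, 1.25, 10.00)
seconds = approx_response_seconds(500, 0.90, 110)

print(f"Estimated cost: ${cost:.4f}")
print(f"Estimated time: {seconds:.1f}s")
```

Because the throughput figures are medians, the time estimate is best read as a typical value rather than a guarantee.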