
Model Benchmark

Sort by what matters: speed, cost, or accuracy. Real numbers from public benchmarks.

Model (provider)	Throughput (tok/s)	TTFT	Input ($/M tokens)	Output ($/M tokens)	SWE-Bench Verified
Claude Opus 4.5 Anthropic	60	2.10s	$15.00	$75.00	79.4%	86.5%	74.5%	82.0%
Claude Sonnet 4.5 Anthropic	85	1.20s	$3.00	$15.00	77.2%	84.0%	72.0%	78.5%
GPT-5 OpenAI	110	0.90s	$1.25	$10.00	74.5%	81.5%	76.0%	84.0%
Gemini 2.5 Pro Google	95	1.50s	$1.25	$10.00	67.0%	76.5%	70.0%	81.0%
Grok 4 xAI	75	1.80s	$5.00	$15.00	64.0%	72.0%	68.0%	80.0%
DeepSeek V3.1 DeepSeek	60	1.00s	$0.27	$1.10	53.0%	65.5%	60.0%	75.5%
Gemini 2.5 Flash Google	220	0.40s	$0.30	$2.50	50.0%	62.0%	58.0%	76.0%
GPT-4o OpenAI	130	0.50s	$2.50	$10.00	38.0%	49.0%	43.0%	74.0%
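The sorting the page offers can be reproduced offline. The sketch below is a minimal Python example that copies the rows above and sorts them by a chosen column. It assumes the first five numeric columns are throughput (tok/s), time to first token, input price, output price (both per million tokens), and SWE-Bench score, which is inferred from the summary figures and the sources line rather than from explicit column headers. The three remaining percentage columns are left out because their labels are not recoverable from the page. The names `ModelRow` and `sort_rows` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    """One table row; column meanings are an assumption (see text above)."""
    name: str
    tok_per_s: float         # throughput, tokens per second
    ttft_s: float            # time to first token, seconds
    input_usd_per_m: float   # input price, USD per million tokens
    output_usd_per_m: float  # output price, USD per million tokens
    swe_bench_pct: float     # SWE-Bench score, percent


# Values copied verbatim from the table.
ROWS = [
    ModelRow("Claude Opus 4.5", 60, 2.10, 15.00, 75.00, 79.4),
    ModelRow("Claude Sonnet 4.5", 85, 1.20, 3.00, 15.00, 77.2),
    ModelRow("GPT-5", 110, 0.90, 1.25, 10.00, 74.5),
    ModelRow("Gemini 2.5 Pro", 95, 1.50, 1.25, 10.00, 67.0),
    ModelRow("Grok 4", 75, 1.80, 5.00, 15.00, 64.0),
    ModelRow("DeepSeek V3.1", 60, 1.00, 0.27, 1.10, 53.0),
    ModelRow("Gemini 2.5 Flash", 220, 0.40, 0.30, 2.50, 50.0),
    ModelRow("GPT-4o", 130, 0.50, 2.50, 10.00, 38.0),
]


def sort_rows(rows, key, descending=True):
    """Return a new list of rows sorted by the named field."""
    return sorted(rows, key=lambda row: getattr(row, key), reverse=descending)


# Higher is better for throughput and scores; lower is better for price.
fastest = sort_rows(ROWS, "tok_per_s")[0]
cheapest_input = sort_rows(ROWS, "input_usd_per_m", descending=False)[0]
best_swe = sort_rows(ROWS, "swe_bench_pct")[0]

print("Fastest:", fastest.name)
print("Cheapest input:", cheapest_input.name)
print("Best SWE-Bench:", best_swe.name)
```

Sorting direction matters here: throughput and benchmark scores rank best-first in descending order, while price and TTFT rank best-first in ascending order.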

Best score (SWE-Bench): Claude Opus 4.5, 79.4%

Fastest: Gemini 2.5 Flash, 220 tok/s

Cheapest input: DeepSeek V3.1, $0.27/M

Largest context: Gemini 2.5 Pro, 1000k
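Input price alone does not determine what a request costs, because output tokens are billed at a separate and usually higher rate. The following is a rough Python sketch for estimating per-request cost from the table's two price columns, assuming both are quoted in USD per million tokens. The function name `request_cost_usd` and the workload of 10,000 input and 2,000 output tokens are hypothetical and chosen only for illustration.

```python
def request_cost_usd(input_tokens, output_tokens, input_usd_per_m, output_usd_per_m):
    """Estimate the cost of one request, given prices in USD per million tokens."""
    total = input_tokens * input_usd_per_m + output_tokens * output_usd_per_m
    return total / 1_000_000


# (input price, output price) copied from the table for three of the models.
PRICES = {
    "DeepSeek V3.1": (0.27, 1.10),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "GPT-5": (1.25, 10.00),
}

# Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
costs = {
    name: request_cost_usd(10_000, 2_000, inp, out)
    for name, (inp, out) in PRICES.items()
}

for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.4f}")
```

For output-heavy workloads the ranking by total cost can differ from the ranking by input price, so it is worth rerunning the estimate with token counts that match the actual use case.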

Sources: Artificial Analysis (throughput, TTFT), official SWE-Bench Verified leaderboard, Aider polyglot benchmark, LiveCodeBench, vendor pricing pages. Throughput numbers are medians across providers; the throughput and latency you observe may differ based on region and load.