Gemini Rate Limits Explained
Everything you need to know about Google Gemini rate limits - RPM, RPD, TPM caps per model, free vs paid tier quotas, and how to avoid silent 429s when working from the Gemini CLI.
Last updated April 2026 · By Soren Starck
What Are Gemini Rate Limits?
Gemini enforces limits on three axes: RPM (requests per minute), RPD (requests per day), and TPM (tokens per minute). Each model has its own per-tier caps.
Exceed any one of the three and the API returns an HTTP 429 error (status RESOURCE_EXHAUSTED) with a retry-after hint. The Gemini CLI usually retries quietly, but in heavy multi-turn sessions the failures can stack up and leave you waiting longer than necessary.
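To make the retry behavior concrete, here is a minimal sketch of retry-with-backoff, not the Gemini CLI's actual logic. `RateLimitError` is a hypothetical stand-in for whatever your client raises on a 429, and `request` is any callable you supply. The sketch prefers the server's retry hint when one is present and falls back to exponential backoff otherwise.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a 429 / resource-exhausted response."""

    def __init__(self, retry_after=None):
        super().__init__("429: resource exhausted")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call request(); on RateLimitError wait, then retry up to max_attempts."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure instead of hiding it
            if err.retry_after is not None:
                delay = err.retry_after  # prefer the server's hint
            else:
                # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay += random.uniform(0, delay * 0.1)  # small jitter
            sleep(delay)
```

Re-raising after the last attempt is deliberate: a visible error is easier to act on than a session that silently stalls.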
Free vs Tier 1 Limits (Approximate)
| Model | Free RPM / RPD | Tier 1 RPM / RPD | Context |
|---|---|---|---|
| Gemini 2.5 Pro | 5 / 100 | 150 / 1,000 | 1M tokens |
| Gemini 2.5 Flash | 10 / 250 | 1,000 / 10,000 | 1M tokens |
| Gemini 2.5 Flash-Lite | 15 / 1,000 | 4,000 / 30,000 | 1M tokens |
These numbers shift as Google rebalances quotas, so always cross-check the official docs. The pattern holds, though: Pro is the tightest, Flash sits in the middle, and Flash-Lite is the most permissive.
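To turn the table into pacing numbers, the sketch below computes the gap between evenly spaced requests at the RPM cap, and how long sending at full RPM takes to burn through the daily cap. The values are the approximate ones from the table above; the dictionary keys and function names are just labels chosen for this example.

```python
# Approximate (RPM, RPD) caps copied from the table above; verify before relying on them.
LIMITS = {
    "gemini-2.5-pro": {"free": (5, 100), "tier1": (150, 1_000)},
    "gemini-2.5-flash": {"free": (10, 250), "tier1": (1_000, 10_000)},
    "gemini-2.5-flash-lite": {"free": (15, 1_000), "tier1": (4_000, 30_000)},
}


def min_spacing_seconds(model, tier):
    """Seconds between evenly spaced requests that sit exactly at the RPM cap."""
    rpm, _rpd = LIMITS[model][tier]
    return 60.0 / rpm


def minutes_to_exhaust_rpd(model, tier):
    """Minutes of sending at the full RPM cap before the daily cap is used up."""
    rpm, rpd = LIMITS[model][tier]
    return rpd / rpm


print(min_spacing_seconds("gemini-2.5-pro", "free"))     # 12.0
print(minutes_to_exhaust_rpd("gemini-2.5-pro", "free"))  # 20.0
```

With these figures, free-tier Pro allows one request every 12 seconds, and running at that pace uses up the whole daily quota in about 20 minutes.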
What Happens When You Hit the Limit
- 429 errors - “Resource exhausted” with a retry-after
- CLI silent retries - your session feels stuck without explanation
- RPD lockout - once the daily quota is gone, you wait for the daily reset, which Google's docs put at midnight Pacific time
- No native warning - neither AI Studio nor the CLI alerts you in real time
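For the RPD lockout, a rough way to see how long the wait is: compute the seconds until the next midnight in the reset timezone. In this sketch the timezone is a parameter rather than a hard-coded value, so confirm which one applies in the official docs. The function name is illustrative.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_daily_reset(now, reset_tz):
    """Seconds from `now` (timezone-aware) to the next midnight in `reset_tz`."""
    local_now = now.astimezone(reset_tz)
    next_midnight = (local_now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # Compare in UTC so the result is elapsed time, not wall-clock difference.
    delta = next_midnight.astimezone(timezone.utc) - now.astimezone(timezone.utc)
    return delta.total_seconds()


now = datetime(2026, 4, 1, 22, 0, tzinfo=timezone.utc)
print(seconds_until_daily_reset(now, timezone.utc))  # 7200.0
```

Pass a `zoneinfo.ZoneInfo` instance as `reset_tz` if the reset follows a named timezone with daylight saving.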
How to Monitor Gemini Rate Limits
SessionWatcher for Gemini watches your CLI activity live and surfaces the three caps in your menu bar.
- Live RPM gauge - see how close to throttle you are right now
- Daily request count against RPD cap
- Token-per-minute tracking for long-context calls
- Per-model breakdown - Pro vs Flash vs Flash-Lite
- Notifications at 80% / 95% so you can pace
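If you want to see the idea behind a live gauge, a sliding-window counter is enough. This is a minimal sketch of that technique, not SessionWatcher's implementation; the class name, method names, and the way thresholds map to labels are all assumptions made for illustration.

```python
from collections import deque


class RpmGauge:
    """Counts requests in the trailing window and compares them to a cap."""

    def __init__(self, rpm_cap, window_seconds=60.0):
        self.rpm_cap = rpm_cap
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, now):
        """Register one request sent at time `now` (seconds on any steady clock)."""
        self._timestamps.append(now)

    def usage(self, now):
        """Fraction of the cap used by requests inside the trailing window."""
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()  # drop requests that aged out
        return len(self._timestamps) / self.rpm_cap

    def alert_level(self, now):
        """Return "critical" at >= 95%, "warning" at >= 80%, else "ok"."""
        used = self.usage(now)
        if used >= 0.95:
            return "critical"
        if used >= 0.80:
            return "warning"
        return "ok"
```

Call `record(time.monotonic())` before each request and check `alert_level(time.monotonic())` to decide whether to pause.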
Tips for Managing Gemini Rate Limits
- Pace RPM in long CLI sessions - Pro free tier maxes out fast.
- Drop to Flash for cheap turns - save Pro for the hard reasoning steps.
- Watch TPM on long contexts - a 500K-token request is cheap on RPM but expensive on TPM.
- Plan around RPD - per Google's docs, daily caps reset at midnight Pacific time, not in your local timezone.
- Upgrade when burn rate exceeds free tier - Tier 1 RPM is 30x for Pro and 100x for Flash.
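The TPM tip is easiest to see as arithmetic: the request rate you can actually sustain is the smaller of the RPM cap and the number of requests that fit in the TPM cap. The sketch below uses the Tier 1 Pro RPM of 150 from the table earlier in this guide; the TPM cap is a made-up placeholder, since this guide does not list TPM values.

```python
def effective_requests_per_minute(rpm_cap, tpm_cap, tokens_per_request):
    """Requests per minute allowed once both the RPM and TPM caps apply."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    by_tokens = tpm_cap // tokens_per_request  # whole requests that fit in TPM
    return min(rpm_cap, by_tokens)


# Placeholder value for illustration only; look up the real cap for your tier.
HYPOTHETICAL_TPM_CAP = 1_000_000

print(effective_requests_per_minute(150, HYPOTHETICAL_TPM_CAP, 500_000))  # 2
print(effective_requests_per_minute(150, HYPOTHETICAL_TPM_CAP, 2_000))    # 150
```

Under that assumed cap, 500K-token requests are limited to 2 per minute by TPM long before the 150 RPM cap matters, while small requests are limited by RPM alone.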
Frequently Asked Questions
What are Gemini rate limits?
Three axes per model: RPM, RPD, TPM. Free Pro is the tightest (~5 RPM / 100 RPD). Tier 1 raises RPM by roughly 30x for Pro and 100x for Flash; the RPD gains are smaller, around 10x to 40x.
How do I get past Gemini 429 errors?
Pace your requests below the RPM cap, switch to Flash for lighter turns, or upgrade to Tier 1. SessionWatcher shows live RPM so you can self-throttle.
Does the 1M context window change rate limits?
No. RPM and RPD count requests, not tokens. TPM is the cap to watch on long contexts.