Gemini Rate Limits Explained

Everything you need to know about Google Gemini rate limits - RPM, RPD, TPM caps per model, free vs paid tier quotas, and how to avoid silent 429s when working from the Gemini CLI.

Last updated April 2026 · By Soren Starck

What Are Gemini Rate Limits?

Gemini enforces limits on three axes: RPM (requests per minute), RPD (requests per day), and TPM (tokens per minute). Each model has its own per-tier caps.

Hit any of the three and Google returns an HTTP 429 (RESOURCE_EXHAUSTED), typically with a suggested retry delay. The Gemini CLI usually retries quietly, but during heavy multi-turn sessions failed requests can stack up and leave you waiting longer than necessary.
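A minimal sketch of handling a 429 explicitly instead of retrying blindly. `RateLimitError` and its `retry_after` attribute are hypothetical stand-ins for whatever your client raises, not part of any Google SDK; the idea is to honor the server's hint when there is one and fall back to capped exponential backoff otherwise.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("429 resource exhausted")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error instead of hiding it
            if err.retry_after is not None:
                delay = err.retry_after  # trust the server's hint when present
            else:
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay += random.uniform(0, delay * 0.1)  # small jitter
            sleep(delay)
```

Re-raising after the last attempt is deliberate: a visible failure is easier to act on than a session that just feels stuck.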

Free vs Tier 1 Limits (Approximate)

Model                   Free RPM / RPD   Tier 1 RPM / RPD   Context
Gemini 2.5 Pro          5 / 100          150 / 1,000        1M tokens
Gemini 2.5 Flash        10 / 250         1,000 / 10,000     1M tokens
Gemini 2.5 Flash-Lite   15 / 1,000       4,000 / 30,000     1M tokens

Numbers shift as Google rebalances. Always cross-check the official docs. The pattern: Pro tightest, Flash middle, Flash-Lite most permissive.
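To see how much Tier 1 actually buys per model, the table can be written down as data and the multipliers computed directly. This is only arithmetic on the approximate figures above; the dictionary layout and function name are illustrative.

```python
# (RPM, RPD) pairs copied from the approximate table above
LIMITS = {
    "Gemini 2.5 Pro": {"free": (5, 100), "tier1": (150, 1_000)},
    "Gemini 2.5 Flash": {"free": (10, 250), "tier1": (1_000, 10_000)},
    "Gemini 2.5 Flash-Lite": {"free": (15, 1_000), "tier1": (4_000, 30_000)},
}


def tier1_multipliers(model):
    """Return (RPM multiplier, RPD multiplier) for moving from free to Tier 1."""
    free_rpm, free_rpd = LIMITS[model]["free"]
    t1_rpm, t1_rpd = LIMITS[model]["tier1"]
    return t1_rpm / free_rpm, t1_rpd / free_rpd


for model in LIMITS:
    rpm_x, rpd_x = tier1_multipliers(model)
    print(f"{model}: RPM x{rpm_x:.0f}, RPD x{rpd_x:.0f}")
# Gemini 2.5 Pro: RPM x30, RPD x10
# Gemini 2.5 Flash: RPM x100, RPD x40
# Gemini 2.5 Flash-Lite: RPM x267, RPD x30
```

Note that the daily caps grow less than the per-minute caps, so a paid tier relieves burst throttling more than it extends a full day of heavy use.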

What Happens When You Hit the Limit

  • 429 errors - “Resource exhausted” with a retry-after
  • CLI silent retries - your session feels stuck without explanation
  • RPD lockout - once the daily quota is gone, you wait for the daily reset, which Google documents as midnight Pacific time
  • No native warning - neither AI Studio nor the CLI alerts you in real time
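If you are locked out on RPD, it helps to know how long the wait really is. The sketch below computes the time to the next midnight in a given zone; the reset zone is a parameter because it is worth confirming against Google's documentation rather than assuming it. The function name is illustrative.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_daily_reset(now, reset_tz):
    """Seconds from `now` (an aware datetime) to the next midnight in reset_tz."""
    local = now.astimezone(reset_tz)
    next_midnight = (local + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    # Compare in UTC so a daylight-saving change does not skew the result.
    delta = next_midnight.astimezone(timezone.utc) - now.astimezone(timezone.utc)
    return delta.total_seconds()


now = datetime.now(timezone.utc)
wait = seconds_until_daily_reset(now, timezone.utc)
print(f"Daily quota resets in about {wait / 3600:.1f} hours (if the reset zone is UTC)")
```

Any `tzinfo` works as `reset_tz`, including a `zoneinfo.ZoneInfo` instance on systems that have the time zone database installed.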

How to Monitor Gemini Rate Limits

SessionWatcher for Gemini watches your CLI activity live and surfaces the three caps in your menu bar.

  • Live RPM gauge - see how close to throttle you are right now
  • Daily request count against RPD cap
  • Token-per-minute tracking for long-context calls
  • Per-model breakdown - Pro vs Flash vs Flash-Lite
  • Notifications at 80% / 95% so you can pace

Tips for Managing Gemini Rate Limits

  1. Pace RPM in long CLI sessions - Pro free tier maxes out fast.
  2. Drop to Flash for cheap turns - save Pro for the hard reasoning steps.
  3. Watch TPM on long contexts - a 500K-token request is cheap on RPM but expensive on TPM.
  4. Plan around RPD - Google documents the daily reset as midnight Pacific time, not midnight in your local timezone.
  5. Upgrade when burn rate exceeds free tier - Tier 1 RPM is 30x for Pro and 100x for Flash.
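The pacing advice in tips 1 and 3 can be sketched as a client-side sliding-window budget that tracks both requests and tokens over the last 60 seconds. The `Pacer` class is illustrative, not something the Gemini CLI provides, and the caps you pass in should come from your own tier's published limits.

```python
import time
from collections import deque


class Pacer:
    """Client-side sliding-window budget for requests and tokens per minute."""

    def __init__(self, rpm, tpm, window=60.0, clock=time.monotonic):
        self.rpm, self.tpm, self.window, self.clock = rpm, tpm, window, clock
        self.events = deque()  # (timestamp, tokens) for each recent request

    def _prune(self, now):
        # Drop requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def wait_time(self, tokens):
        """Seconds to wait before a request of `tokens` fits both budgets."""
        now = self.clock()
        self._prune(now)
        count = len(self.events)
        used = sum(t for _, t in self.events)
        if count < self.rpm and used + tokens <= self.tpm:
            return 0.0
        # Walk forward through expiring events until the request would fit.
        for ts, t in self.events:
            count -= 1
            used -= t
            if count < self.rpm and used + tokens <= self.tpm:
                return ts + self.window - now
        return float("inf")  # the request alone exceeds the token budget

    def record(self, tokens):
        """Note that a request of `tokens` was just sent."""
        self.events.append((self.clock(), tokens))
```

Before each call, sleep for `pacer.wait_time(estimated_tokens)` and then call `pacer.record(estimated_tokens)`. Server-side accounting may differ from this local estimate, so treat it as a way to reduce 429s rather than eliminate them.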

Frequently Asked Questions

What are Gemini rate limits?

Three axes per model: RPM, RPD, TPM. Free Pro is the tightest (~5 RPM / 100 RPD). Going by the table above, Tier 1 raises RPM roughly 30x to 270x and RPD roughly 10x to 40x, depending on the model.

How do I get past Gemini 429 errors?

Pace your requests below the RPM cap, switch to Flash for lighter turns, or upgrade to a paid tier. SessionWatcher shows live RPM so you can self-throttle.

Does the 1M context window change rate limits?

No. RPM and RPD count requests, not tokens. TPM is the cap to watch on long contexts.
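A rough way to see which cap binds for a given request size: the sustainable request rate is the smaller of the RPM cap and however many requests fit in the TPM budget. The TPM figure in the example is a placeholder assumption, since this guide's table does not list TPM; substitute the value for your model and tier.

```python
def effective_rpm(rpm_cap, tpm_cap, tokens_per_request):
    """Requests per minute you can sustain under both the RPM and TPM caps."""
    if tokens_per_request <= 0:
        return rpm_cap
    return min(rpm_cap, tpm_cap // tokens_per_request)


# 150 RPM is the Tier 1 Pro figure from the table; 2,000,000 TPM is a placeholder.
print(effective_rpm(150, 2_000_000, 500_000))  # 4
print(effective_rpm(150, 2_000_000, 2_000))    # 150
```

With long contexts the token budget is exhausted long before the request count is, which is why TPM is the number to watch.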