Gemini Rate Limits Explained – RPM, RPD, TPM & Free vs Paid Tiers

What Are Gemini Rate Limits?

Gemini enforces limits on three axes: RPM (requests per minute), RPD (requests per day), and TPM (tokens per minute). Each model has its own per-tier caps.

Hit any of the three and Google returns a 429 Resource Exhausted error with a retry-after. The Gemini CLI usually retries quietly, but during heavy multi-turn sessions failures can stack up and leave you waiting longer than necessary.
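When a 429 does come back, a client can wait and retry instead of hammering the API. Below is a minimal sketch of capped exponential backoff with jitter. It assumes a hypothetical `send` callable that raises a hypothetical `RateLimitError` carrying the server's suggested delay. It is not the Gemini SDK's or the CLI's actual retry logic.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a 429 Resource Exhausted response.

    retry_after holds the server-suggested delay in seconds, if one was sent.
    """

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("429 Resource Exhausted")
        self.retry_after = retry_after


def call_with_backoff(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying on RateLimitError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # give up loudly instead of looking "stuck"
            # Prefer the server's hint; otherwise wait 1 s, 2 s, 4 s, ...
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * 2 ** attempt
            delay = min(delay, max_delay)
            # Up to 10% random jitter so parallel clients don't retry in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")


# Usage: a fake request that is rate-limited twice, then succeeds.
calls = {"n": 0}


def flaky_send() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError(retry_after=2.0)
    return "ok"


waits = []
print(call_with_backoff(flaky_send, sleep=waits.append))  # ok
```

Injecting `sleep` keeps the sketch testable. Surfacing the final 429 instead of retrying forever avoids the "session feels stuck" problem described below.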

Free vs Tier 1 Limits (Approximate)

Model	Free RPM / RPD	Tier 1 RPM / RPD	Context
Gemini 2.5 Pro	5 / 100	150 / 1,000	1M tokens
Gemini 2.5 Flash	10 / 250	1,000 / 10,000	1M tokens
Gemini 2.5 Flash-Lite	15 / 1,000	4,000 / 30,000	1M tokens

Numbers shift as Google rebalances, so always cross-check the official docs. The pattern holds, though: Pro is the tightest, Flash sits in the middle, and Flash-Lite is the most permissive.
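The RPM and RPD columns interact. Dividing RPD by RPM gives the number of minutes of flat-out use that a day's quota lasts. The small sketch below uses the approximate figures from the table, which may already be out of date.

```python
# Approximate (RPM, RPD) caps copied from the table above; verify against
# Google's current docs before relying on them.
LIMITS = {
    ("Gemini 2.5 Pro", "Free"): (5, 100),
    ("Gemini 2.5 Pro", "Tier 1"): (150, 1_000),
    ("Gemini 2.5 Flash", "Free"): (10, 250),
    ("Gemini 2.5 Flash", "Tier 1"): (1_000, 10_000),
    ("Gemini 2.5 Flash-Lite", "Free"): (15, 1_000),
    ("Gemini 2.5 Flash-Lite", "Tier 1"): (4_000, 30_000),
}


def minutes_to_exhaust_rpd(model: str, tier: str) -> float:
    """Minutes of back-to-back requests at the RPM cap before RPD runs out."""
    rpm, rpd = LIMITS[(model, tier)]
    return rpd / rpm


for model, tier in LIMITS:
    print(f"{model:<22} {tier:<7} {minutes_to_exhaust_rpd(model, tier):5.1f} min")
```

On the free tier, Pro's 100 daily requests last 20 minutes at 5 RPM. On Tier 1 the ratio is smaller: about 6.7 minutes for Pro, 10 for Flash, and 7.5 for Flash-Lite. A Tier 1 user running near the RPM cap will therefore exhaust RPD within minutes.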

What Happens When You Hit the Limit

429 errors - “Resource exhausted” with a retry-after
CLI silent retries - your session feels stuck without explanation
RPD lockout - once the daily quota is gone, you wait until it resets at midnight Pacific time
No native warning - neither AI Studio nor the CLI alerts you in real time
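How long an RPD lockout lasts depends on when the daily quota resets. The sketch below computes the wait until the next midnight in a given reset time zone. The zone is left as a parameter (an assumption of this sketch) so you can match whatever Google's rate-limit docs currently state.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def seconds_until_daily_reset(now: datetime, reset_tz: str) -> float:
    """Seconds from `now` until the next midnight in the IANA zone `reset_tz`.

    `reset_tz` is an assumption you supply, e.g. "UTC" or "America/Los_Angeles";
    match it to whatever zone Google's rate-limit docs specify.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    tz = ZoneInfo(reset_tz)
    local_date = now.astimezone(tz).date()
    next_midnight = datetime.combine(local_date + timedelta(days=1), time(0), tzinfo=tz)
    # Subtract in UTC so days lengthened or shortened by DST are counted correctly.
    delta = next_midnight.astimezone(timezone.utc) - now.astimezone(timezone.utc)
    return delta.total_seconds()


now = datetime(2025, 6, 1, 20, 0, tzinfo=timezone.utc)
print(seconds_until_daily_reset(now, "UTC") / 3600)                  # 4.0
print(seconds_until_daily_reset(now, "America/Los_Angeles") / 3600)  # 11.0
```

The same moment can be 4 hours or 11 hours from a reset depending on the zone. The zone choice is therefore not cosmetic when you plan a long session.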

How to Monitor Gemini Rate Limits

SessionWatcher for Gemini watches your CLI activity live and surfaces the three caps in your menu bar.

Live RPM gauge - see how close you are to being throttled right now
Daily request count against the RPD cap
Token-per-minute tracking for long-context calls
Per-model breakdown - Pro vs Flash vs Flash-Lite
Notifications at 80% / 95% so you can pace yourself
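For readers who want to track usage in their own scripts, here is one way a client could compute the same kind of RPM/TPM percentages and 80% / 95% alerts itself. This is a minimal sketch, not SessionWatcher's code. The `UsageTracker` class, the timestamps, and the 250,000 TPM cap are hypothetical placeholders, and the table above lists no TPM figures.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


class UsageTracker:
    """Hypothetical client-side RPM/TPM tracker for a single model.

    Keeps a 60-second sliding window of (timestamp_seconds, tokens) events and
    reports which alert thresholds current usage has crossed.
    """

    WINDOW_SECONDS = 60.0

    def __init__(self, rpm_cap: int, tpm_cap: int,
                 thresholds: Sequence[float] = (0.80, 0.95)) -> None:
        self.rpm_cap = rpm_cap
        self.tpm_cap = tpm_cap
        self.thresholds = sorted(thresholds)
        self._events: Deque[Tuple[float, int]] = deque()

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the 60-second window.
        while self._events and now - self._events[0][0] >= self.WINDOW_SECONDS:
            self._events.popleft()

    def record(self, now: float, tokens: int) -> None:
        """Log one request that used `tokens` tokens at time `now`."""
        self._evict(now)
        self._events.append((now, tokens))

    def usage(self, now: float) -> Tuple[float, float]:
        """Fractions of the RPM and TPM caps used in the last 60 seconds."""
        self._evict(now)
        requests = len(self._events)
        tokens = sum(t for _, t in self._events)
        return requests / self.rpm_cap, tokens / self.tpm_cap

    def alerts(self, now: float) -> List[str]:
        """One message per cap whose usage is at or above a threshold."""
        rpm_frac, tpm_frac = self.usage(now)
        messages = []
        for name, frac in (("RPM", rpm_frac), ("TPM", tpm_frac)):
            crossed = [t for t in self.thresholds if frac >= t]
            if crossed:
                messages.append(f"{name} at {frac:.0%} (past {max(crossed):.0%} alert)")
        return messages


# Usage: free-tier Pro (5 RPM) with a placeholder TPM cap.
tracker = UsageTracker(rpm_cap=5, tpm_cap=250_000)
for t in (0.0, 10.0, 20.0, 30.0):
    tracker.record(t, tokens=1_000)
print(tracker.alerts(30.0))  # ['RPM at 80% (past 80% alert)']
print(tracker.alerts(65.0))  # []
```

A sliding window is used instead of fixed clock minutes because it never under-reports a burst that straddles a minute boundary. It may not match exactly how Google counts.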

Tips for Managing Gemini Rate Limits

Pace RPM in long CLI sessions - the Pro free tier (5 RPM) maxes out fast.
Drop to Flash for cheap turns - save Pro for the hard reasoning steps.
Watch TPM on long contexts - a 500K-token request counts as just one request against RPM but takes a large bite out of your TPM budget.
Plan around RPD - daily caps reset at midnight Pacific time, not in your local timezone.
Upgrade when your burn rate exceeds the free tier - Tier 1 RPM is 30x the free tier for Pro and 100x for Flash.
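The pacing and TPM tips come down to one calculation. The RPM cap sets a minimum gap of 60 / RPM seconds between requests, and the TPM cap sets a second gap proportional to request size. The sketch below assumes evenly spaced requests and a TPM figure you look up for your model and tier. The table above lists none, so the 2,000,000 value below is a placeholder.

```python
def min_interval_seconds(rpm_cap: int, tpm_cap: int, tokens_per_request: int) -> float:
    """Smallest even spacing between requests that stays under both RPM and TPM.

    Assumes each request counts roughly `tokens_per_request` tokens against TPM.
    """
    if tokens_per_request > tpm_cap:
        raise ValueError("a single request exceeds the per-minute token budget")
    rpm_spacing = 60.0 / rpm_cap                       # 5 RPM -> one request every 12 s
    tpm_spacing = 60.0 * tokens_per_request / tpm_cap  # long prompts drain TPM
    return max(rpm_spacing, tpm_spacing)


# Tier 1 Pro RPM from the table; the 2,000,000 TPM cap is a placeholder.
print(min_interval_seconds(150, 2_000_000, 2_000))    # 0.4  (RPM-bound)
print(min_interval_seconds(150, 2_000_000, 500_000))  # 15.0 (TPM-bound)
```

The same 150 RPM budget allows a request every 0.4 s for small prompts. Under the placeholder TPM cap, however, 500K-token requests can only go out every 15 s. This is the "cheap on RPM, expensive on TPM" effect in numbers.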

Frequently Asked Questions

What are Gemini rate limits?

Three axes per model: RPM, RPD, TPM. Free Pro is the tightest (~5 RPM / 100 RPD). Tier 1 raises RPM 30x for Pro and 100x for Flash, and RPD 10x to 40x across the three models.

How do I get past Gemini 429 errors?

Stay below the RPM cap, switch to Flash, or upgrade. SessionWatcher shows live RPM so you can throttle yourself.

Does the 1M context window change rate limits?

No. RPM and RPD count requests, not tokens. TPM is the cap to watch on long contexts.