OpenAI API Quota Exceeded or 429: Retry, Check Billing, or Escalate

If an OpenAI Platform API call says quota exceeded or returns 429, do not add retries until you read the error body. A retryable rate-limit error needs backoff, throttling, or queueing; insufficient_quota, billing, project scope, model access, status, and wrapper limits need different checks and can get worse if you keep looping.

Error clue	Likely owner	Check first	Retry or stop
`rate limit reached`, `too many requests`, or remaining headers near zero	Request or token rate limit	Response headers, Limits page, model family, reset window	Retry with backoff and jitter, then throttle or queue
`You exceeded your current quota` or `insufficient_quota`	Quota, billing, or spend cap	Billing, Usage, Limits, monthly spend, account state	Stop retrying until the account state changes
A new key fails the same way, or only one project/model fails	Project, organization, model, or key scope	Selected project, organization, model access, shared family limit	Fix the scope before changing traffic
Many calls fail while OpenAI Status shows an incident	Platform status or capacity event	OpenAI Status, timestamp, request id, affected endpoint	Wait, preserve evidence, and avoid account churn
The error comes from ChatGPT, Codex, Sora, Azure OpenAI, or a wrapper	Wrong surface or provider-owned limit	Product surface, provider docs, API route, headers	Route to that contract before applying Platform API fixes

The stop rule is simple: retry only when the owner is request or token pressure and you have a reset signal. If the body points to quota, billing, the wrong project, the wrong model, the wrong surface, or a status incident, repeating the same request is not a fix. Instead, capture the error body, request id, headers, project, organization, model, Limits page state, and status page state.

Read the 429 body before you change code

OpenAI documents two very different 429 messages in its API error codes guide: one says the rate limit was reached because requests are arriving too quickly, and another says the current quota has been exceeded. Those messages can share the same HTTP status, but they do not share the same repair path.

The first branch is traffic pressure. Your application may be sending too many requests per minute, too many tokens per minute, too many requests per day, or too much image traffic for the selected model and project. That branch can be handled with backoff, jitter, request shaping, queues, and lower-concurrency work.

The second branch is account state. An `insufficient_quota` code, current-quota wording, monthly spend exhaustion, expired billing, or a project that cannot access a model should not be treated as a transient network blip. A retry loop will usually burn more failed attempts, make logs noisier, and hide the real fix: billing, usage, project selection, organization selection, or model access.

In code, classify the body before deciding whether the request enters retry logic:

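```ts
// Labels a 429 before any retry logic runs. Reads both the SDK error shape
// (error.status, error.error.code) and a raw HTTP wrapper (error.response.status).
type Classified429 =
  | "not_429"
  | "quota_or_billing_stop"
  | "retryable_rate_limit_check_headers"
  | "unknown_429_collect_evidence";

function classifyOpenAI429(error: any): Classified429 {
  const status = error?.status || error?.response?.status;
  const code = error?.error?.code || error?.code;
  const type = error?.error?.type || error?.type;
  const message = String(error?.error?.message || error?.message || "");

  if (status !== 429) return "not_429";
  // Quota and billing first: same HTTP status, but never retryable.
  if (code === "insufficient_quota" || type === "insufficient_quota") {
    return "quota_or_billing_stop";
  }
  if (/exceeded your current quota/i.test(message)) {
    return "quota_or_billing_stop";
  }
  // Only explicit rate-limit wording is a candidate for backoff.
  if (/rate limit|too many requests/i.test(message)) {
    return "retryable_rate_limit_check_headers";
  }
  // Anything else: do not guess.
  return "unknown_429_collect_evidence";
}
```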

That classifier is intentionally conservative. If the body is not clear, do not guess. Keep one request path stable and inspect the headers, Limits page, selected project, organization, model, and status page before changing multiple variables at once.

The 10-minute recovery board

[Figure: Ten-minute diagnosis flow for OpenAI API 429]

The fastest recovery path is a fixed sequence, not a bag of random fixes. Start by copying the full error body exactly as returned. Save the HTTP status, error type, error code, request id if present, endpoint, model, project, organization, and timestamp with timezone. If your logs only say "429 Too Many Requests," improve logging before retrying because the missing body is the diagnosis.
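
A minimal TypeScript sketch of that capture step follows. It assumes you already have the parsed JSON error body, plus the response headers as a plain object with lowercase keys. `Evidence429` and `captureEvidence` are hypothetical names for illustration. Reading the request id from an `x-request-id` header is an assumption, so verify it against your own responses.

```ts
// Hypothetical evidence record for one 429; field names are illustrative.
interface Evidence429 {
  capturedAt: string; // ISO-8601 timestamp in UTC
  timeZone: string; // IANA zone of the machine that logged it
  status: number | undefined;
  type: string | undefined;
  code: string | undefined;
  message: string;
  requestId: string | undefined;
  endpoint: string;
  model: string;
  project: string | undefined;
  organization: string | undefined;
}

interface OpenAIErrorBody {
  error?: { type?: string; code?: string | null; message?: string };
}

function captureEvidence(
  status: number | undefined,
  body: OpenAIErrorBody,
  headers: Record<string, string | undefined>, // lowercase header names assumed
  ctx: { endpoint: string; model: string; project?: string; organization?: string },
  now: Date = new Date(),
): Evidence429 {
  return {
    capturedAt: now.toISOString(),
    timeZone: Intl.DateTimeFormat().resolvedOptions().timeZone,
    status,
    type: body.error?.type,
    code: body.error?.code ?? undefined,
    message: body.error?.message ?? "",
    requestId: headers["x-request-id"], // assumed header name; verify in your responses
    endpoint: ctx.endpoint,
    model: ctx.model,
    project: ctx.project,
    organization: ctx.organization,
  };
}
```

Storing a UTC timestamp alongside the local time zone gives you both halves of "timestamp with timezone" without ambiguity.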

Next, check whether the same request is failing because of rate pressure or account state. Open the current Limits view for the organization and project that the request is actually using. Then compare that account state with the response headers and the model family. OpenAI's rate limits guide says limits can be measured across requests, tokens, days, and images, and they can be scoped by organization, project, and model. That means a static number from an old blog post is weaker than the live evidence in your own account.

Use this short sequence:

Minute	Action	What it tells you
0-1	Capture the raw body and headers	Whether the error names rate limit, quota, or another owner
1-3	Open Limits, Usage, and Billing for the same project and organization	Whether capacity, spend, or account state is the blocker
3-5	Compare model, endpoint, and model family	Whether the selected model has a stricter or shared limit
5-7	Check OpenAI Status	Whether there is a declared broad platform problem
7-10	Send one smaller controlled request	Whether the issue follows workload size, concurrency, or account state

If the smaller controlled request succeeds, you likely have a workload-shaping problem: token size, request burst, concurrency, or image throughput. If it fails with the same quota wording, you likely have an account-state problem. If several unrelated endpoints fail while the status page shows a relevant incident, preserve evidence and avoid account churn.
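
Writing that decision down as a small function helps on-call engineers apply it the same way every time. This is a hypothetical sketch. `ProbeResult`, `interpretProbe`, and the branch names are illustrative, and the inputs are judgments supplied by a person or by your monitoring, not values returned by any OpenAI API.

```ts
type ProbeBranch = "platform_incident" | "workload_shaping" | "account_state" | "unclear";

// Hypothetical inputs gathered during minutes 5-10 of the recovery board.
interface ProbeResult {
  smallRequestOk: boolean;
  smallRequestMessage?: string;
  unrelatedEndpointsFailing: boolean;
  statusShowsRelevantIncident: boolean;
}

function interpretProbe(p: ProbeResult): ProbeBranch {
  // Broad failures during a declared incident: preserve evidence, avoid account churn.
  if (p.unrelatedEndpointsFailing && p.statusShowsRelevantIncident) return "platform_incident";
  // Small request works: size, burst, or concurrency of the real workload is the problem.
  if (p.smallRequestOk) return "workload_shaping";
  // Small request still fails with quota wording: billing, usage, or scope is the owner.
  if (/exceeded your current quota|insufficient_quota/i.test(p.smallRequestMessage ?? "")) {
    return "account_state";
  }
  return "unclear";
}
```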

When retry and backoff are correct

Retry is correct when the evidence points to temporary request or token pressure. The usual signs are rate-limit wording, low remaining-request or remaining-token headers, a reset window in the headers, and a workload pattern that sends bursts faster than your allowed limit. The goal is not to "try harder." The goal is to send fewer requests at a better pace.

OpenAI's Help Center article on 429 Too Many Requests recommends exponential backoff for rate-limit errors, and OpenAI's Cookbook example shows retry handling with randomized exponential backoff. Use jitter so all workers do not wake at the same instant. Cap retries so a degraded worker cannot keep hammering the API forever.
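
Here is a minimal sketch of capped exponential backoff with "full jitter", meaning a random wait between zero and the current exponential ceiling. It assumes you pass in a predicate, such as a classifier that returns true only for retryable rate-limit errors. `withBackoff` and its options are hypothetical names. The official OpenAI SDKs also have their own retry settings. If you layer this on top of them, count both layers when you reason about total attempts.

```ts
interface BackoffOptions {
  maxRetries: number; // hard cap so a degraded worker cannot retry forever
  baseMs: number; // ceiling for the first wait
  capMs: number; // upper bound on any single wait
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  random?: () => number; // injectable for tests
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `call` only while `isRetryable` says so, waiting a random time in
// [0, min(capMs, baseMs * 2^attempt)) between attempts.
async function withBackoff<T>(
  call: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  opts: BackoffOptions,
): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  const random = opts.random ?? Math.random;
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Quota/billing errors, and exhausted retry budgets, go straight to the caller.
      if (!isRetryable(err) || attempt >= opts.maxRetries) throw err;
      const ceiling = Math.min(opts.capMs, opts.baseMs * 2 ** attempt);
      await sleep(Math.floor(random() * ceiling));
    }
  }
}
```

Injecting `sleep` and `random` keeps the policy testable. In production, the defaults use `setTimeout` and `Math.random`. This sketch omits one reasonable refinement: using a server-provided reset signal, such as the rate-limit reset headers, as a floor for the wait.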

Good retry behavior usually includes:

Control	Why it matters
Exponential backoff with jitter	Spreads retries across the reset window instead of creating a retry storm
A maximum retry count	Prevents one request from consuming worker capacity indefinitely
Per-project and per-model concurrency caps	Matches the scope that OpenAI actually limits
Token budgeting before the request	Reduces TPM pressure before the API rejects the call
Queueing instead of synchronous fan-out	Turns spikes into scheduled work
Batch for async work when appropriate	Moves non-urgent workloads away from synchronous pressure
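
Per-project and per-model concurrency caps can start as a keyed semaphore. This sketch assumes a single Node.js process. The class name `KeyedConcurrencyGate` and the `"project:model"` key format are hypothetical. A multi-worker fleet needs the same counters in shared state before the cap means anything.

```ts
// Hypothetical keyed semaphore: at most limitFor(key) tasks run at once per
// key (for example "project:model"); extra tasks wait in FIFO order.
// In-memory only, so it caps one process, not a fleet.
class KeyedConcurrencyGate {
  private readonly limitFor: (key: string) => number;
  private readonly active = new Map<string, number>();
  private readonly waiting = new Map<string, Array<() => void>>();

  constructor(limitFor: (key: string) => number) {
    this.limitFor = limitFor;
  }

  async run<T>(key: string, task: () => Promise<T>): Promise<T> {
    await this.acquire(key);
    try {
      return await task();
    } finally {
      this.release(key);
    }
  }

  private acquire(key: string): Promise<void> {
    const inUse = this.active.get(key) ?? 0;
    if (inUse < this.limitFor(key)) {
      this.active.set(key, inUse + 1);
      return Promise.resolve();
    }
    // Over the cap: wait in line instead of fanning out more requests.
    return new Promise<void>((resolve) => {
      const queue = this.waiting.get(key) ?? [];
      queue.push(() => resolve());
      this.waiting.set(key, queue);
    });
  }

  private release(key: string): void {
    const next = this.waiting.get(key)?.shift();
    if (next) {
      // Hand the slot straight to the next waiter; the active count is unchanged.
      next();
    } else {
      this.active.set(key, (this.active.get(key) ?? 1) - 1);
    }
  }
}
```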

Also remember the uncomfortable part: unsuccessful requests can still count toward per-minute limits. If twenty workers all retry a failing request every second, the application can keep itself inside the failure window. A correct retry policy slows down, sheds work, or queues. It does not turn one 429 into hundreds.
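
A small circuit breaker is one way to shed work instead of multiplying 429s. After several consecutive rate-limit failures for a key, it stops sending for a cooldown, and callers queue or drop work locally. This is a generic pattern, not an OpenAI feature. `RateLimitBreaker`, the threshold, and the cooldown are assumptions to tune. Like any in-process state, it covers only one worker unless you share it.

```ts
// Hypothetical circuit breaker for rate limits: after `threshold` consecutive
// 429s on a key, allow() returns false for `cooldownMs`.
class RateLimitBreaker {
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly consecutive = new Map<string, number>();
  private readonly openUntil = new Map<string, number>();

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // False means "do not call the API for this key right now; queue or drop".
  allow(key: string, now: number = Date.now()): boolean {
    return now >= (this.openUntil.get(key) ?? 0);
  }

  record429(key: string, now: number = Date.now()): void {
    const count = (this.consecutive.get(key) ?? 0) + 1;
    if (count >= this.threshold) {
      this.openUntil.set(key, now + this.cooldownMs);
      this.consecutive.set(key, 0);
    } else {
      this.consecutive.set(key, count);
    }
  }

  recordSuccess(key: string): void {
    this.consecutive.set(key, 0);
  }
}
```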

When retry is wrong

Retry is wrong when the error says current quota, insufficient_quota, billing, monthly spend, or account state. That branch is not solved by waiting a few seconds. It is solved by checking the Billing and Usage views, confirming the selected organization and project, reviewing spend caps, and making sure the account is eligible to use the model and endpoint you are calling.

This is the branch behind many confusing reports such as "I have credits but still get 429" or "my first request returns 429." Credits can exist in one account while the request is sent from another organization or project. A monthly spend cap can be hit even when a payment method is present. A new project can have lower limits than the one you intended to use. A model can be unavailable to that project or can share a family limit with another workload.

Use one stable request while checking account state. Do not rotate keys, change models, change projects, change billing settings, and alter retry code in the same pass. If the quota wording remains, your best next move is evidence collection and account repair, not more retries.
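
To keep each diagnostic attempt to one variable, you can snapshot the request configuration and refuse to proceed when more than one field changed. This guard is hypothetical. The `RequestConfig` fields are illustrative, and `apiKeyId` is meant to be a label, never the secret. Billing settings are left out because they live in the account, not in the request.

```ts
// Hypothetical snapshot of the request-side knobs people tend to change
// during a 429 investigation.
interface RequestConfig {
  apiKeyId: string;
  organization: string;
  project: string;
  model: string;
  endpoint: string;
  retryPolicy: string;
}

// Returns the names of fields that differ between two attempts.
function changedFields(before: RequestConfig, after: RequestConfig): Array<keyof RequestConfig> {
  const keys = Object.keys(before) as Array<keyof RequestConfig>;
  return keys.filter((key) => before[key] !== after[key]);
}

// Throws when an attempt changes more than one variable at once.
function assertSingleChange(before: RequestConfig, after: RequestConfig): void {
  const changed = changedFields(before, after);
  if (changed.length > 1) {
    throw new Error(`Changed ${changed.join(", ")} together; change one variable per attempt.`);
  }
}
```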

Why a new API key may not help

An API key is not a private rate-limit bucket. If the same organization, project, model family, or billing owner remains in control, a new key can fail exactly like the old one. Creating keys can be useful when the old key was revoked, mis-scoped, leaked, or attached to the wrong project, but it does not create new capacity by itself.

Check these four scope layers before assuming the key is the problem:

Scope layer	Check	Common failure
Organization	The request uses the intended org	Personal org and team org have different billing or limits
Project	The key belongs to the project you inspected	You checked Limits in one project while traffic uses another
Model or model family	The selected model has access and headroom	A stricter model or shared family limit is exhausted
Team workload	Other services share the same capacity	A batch job or another app consumed the pool

If only one model fails, test a small request against a model you know the project can access. If every model and endpoint fails with quota wording, inspect Billing and Usage before writing retry code. If only one service fails but the same key works elsewhere, inspect that service's concurrency and request size.
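
One way to make the request's scope explicit is to send the organization and project with every call. The OpenAI API accepts `OpenAI-Organization` and `OpenAI-Project` headers for this, and the official Node SDK exposes matching `organization` and `project` client options. The `scopedHeaders` helper is a hypothetical sketch. The ids you pass are placeholders for your own values.

```ts
// Hypothetical helper: builds headers that pin the organization and project,
// so the scope you inspected on the Limits page is the scope the request uses.
function scopedHeaders(
  apiKey: string,
  organization?: string,
  project?: string,
): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  // Only send scope headers you actually set; omitting them leaves the
  // choice to the key's defaults.
  if (organization) headers["OpenAI-Organization"] = organization;
  if (project) headers["OpenAI-Project"] = project;
  return headers;
}
```

Log the same organization and project values next to every 429, so the evidence and the Limits page refer to one scope. Never log the `Authorization` value.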

Use headers and Limits as live evidence

[Figure: Response headers and Limits page evidence map for OpenAI API 429]

The live evidence for an OpenAI API 429 lives in two places: the response and the account. The response body tells you the branch. The headers can show request and token limits, remaining capacity, and reset timing. The Limits page shows the current limits for the organization, project, and model context. Together they are stronger evidence than any universal public table.
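
The following sketch turns rate-limit headers into numbers you can log or act on. It assumes three things: the header names from OpenAI's rate-limit guide (`x-ratelimit-limit-requests`, `x-ratelimit-remaining-tokens`, `x-ratelimit-reset-requests`, and so on), lowercase keys in a plain object, and reset values written as durations such as `1s` or `6m0s`. Confirm these against the headers your own responses carry. `readRateLimitHeaders` and `parseResetDuration` are hypothetical helper names.

```ts
// Parses reset durations such as "1s", "6m0s", "20ms", or "1.5s" into
// milliseconds. Returns undefined for values it does not recognize.
function parseResetDuration(value: string | undefined): number | undefined {
  if (!value) return undefined;
  const unitMs: Record<string, number> = { h: 3_600_000, m: 60_000, s: 1_000, ms: 1 };
  const part = /(\d+(?:\.\d+)?)(ms|h|m|s)/g; // "ms" listed before "m" on purpose
  let total = 0;
  let consumed = 0;
  let match: RegExpExecArray | null;
  while ((match = part.exec(value)) !== null) {
    if (match.index !== consumed) return undefined; // stray characters between parts
    total += parseFloat(match[1]) * unitMs[match[2]];
    consumed = part.lastIndex;
  }
  return consumed === value.length ? total : undefined;
}

interface RateLimitSnapshot {
  limitRequests?: number;
  remainingRequests?: number;
  resetRequestsMs?: number;
  limitTokens?: number;
  remainingTokens?: number;
  resetTokensMs?: number;
}

// Assumes header names were lowercased when copied into a plain object.
function readRateLimitHeaders(h: Record<string, string | undefined>): RateLimitSnapshot {
  const num = (v: string | undefined): number | undefined => {
    const n = Number(v);
    return v !== undefined && v.trim() !== "" && Number.isFinite(n) ? n : undefined;
  };
  return {
    limitRequests: num(h["x-ratelimit-limit-requests"]),
    remainingRequests: num(h["x-ratelimit-remaining-requests"]),
    resetRequestsMs: parseResetDuration(h["x-ratelimit-reset-requests"]),
    limitTokens: num(h["x-ratelimit-limit-tokens"]),
    remainingTokens: num(h["x-ratelimit-remaining-tokens"]),
    resetTokensMs: parseResetDuration(h["x-ratelimit-reset-tokens"]),
  };
}
```

Missing headers come back as `undefined` rather than zero. That keeps "no data" distinct from "no remaining capacity".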

Do not paste API keys, bearer tokens, or full private payloads into tickets or forum posts, but do preserve safe operational evidence:

Evidence	Use
HTTP status and error body	Separates retryable rate pressure from quota or billing
Request id, if present	Gives support an exact lookup handle
Rate-limit headers	Shows limit, remaining value, and reset timing
Project and organization names or ids	Confirms the scope that owns the request
Model and endpoint	Reveals stricter model limits or wrong endpoint use
Limits and Usage screenshots	Shows live account state at the time of failure
OpenAI Status snapshot	Separates declared incidents from account-local failures

On April 29, 2026, the public OpenAI Status page showed no declared broad active incident during a public status check. That sentence is not a permanent guarantee. Treat status as a live branch during your own incident: if the relevant API component is degraded, wait and preserve evidence; if status is green, continue through account scope, headers, and workload shape.

Prevent the next 429 in production

[Figure: Mitigation ladder and support packet for OpenAI API 429]

After the immediate incident is stable, move the fix out of human debugging and into production controls. The best prevention is to make your application know its own budget before OpenAI has to reject it. That means request shaping, token budgeting, tenant-level quotas, queue depth alerts, and model-specific concurrency gates.

For request bursts, use a central limiter per project and model family. Per-worker limiters only protect the fleet if the workers share state; otherwise each worker thinks it is within budget while the fleet as a whole exceeds it. For token-heavy workloads, estimate prompt and output size before dispatch. A smaller prompt, a shorter max output, or a cheaper routing model can prevent TPM pressure without changing user-visible behavior.
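
A central token budget can be sketched as a token bucket that refills at the tokens-per-minute limit for one project and model family. The sketch rests on several assumptions. The four-characters-per-token estimate is a rough heuristic, not a tokenizer. The TPM value should come from your own Limits page. The in-memory state must move to a shared store before it can act as a fleet-wide limiter. `TokenBudget`, `estimateTokens`, and `budgetFor` are hypothetical names.

```ts
// Rough estimate only: about four characters per token for English text,
// plus the output cap you plan to request. Use a real tokenizer for accuracy.
function estimateTokens(prompt: string, maxOutputTokens: number): number {
  return Math.ceil(prompt.length / 4) + maxOutputTokens;
}

// Token bucket that refills continuously up to `tokensPerMinute`.
class TokenBudget {
  private readonly tokensPerMinute: number;
  private available: number;
  private lastRefill: number;

  constructor(tokensPerMinute: number, now: number = Date.now()) {
    this.tokensPerMinute = tokensPerMinute;
    this.available = tokensPerMinute;
    this.lastRefill = now;
  }

  // Returns 0 when the request may go now (its tokens are reserved), the
  // milliseconds to wait before checking again, or Infinity if the request
  // exceeds a full minute's budget and must be made smaller.
  tryReserve(cost: number, now: number = Date.now()): number {
    if (cost > this.tokensPerMinute) return Infinity;
    const elapsed = Math.max(0, now - this.lastRefill);
    this.available = Math.min(
      this.tokensPerMinute,
      this.available + (elapsed * this.tokensPerMinute) / 60_000,
    );
    this.lastRefill = now;
    if (cost <= this.available) {
      this.available -= cost;
      return 0;
    }
    return Math.ceil(((cost - this.available) * 60_000) / this.tokensPerMinute);
  }
}

// One budget per "project:model-family" key (in-memory, single process).
const budgets = new Map<string, TokenBudget>();
function budgetFor(key: string, tokensPerMinute: number): TokenBudget {
  let budget = budgets.get(key);
  if (!budget) {
    budget = new TokenBudget(tokensPerMinute);
    budgets.set(key, budget);
  }
  return budget;
}
```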

For asynchronous work, stop treating every task as a synchronous user request. Queue jobs, smooth spikes, and consider OpenAI Batch when latency is not urgent. Batch is not a magic bypass for every application, but it is a better fit for non-interactive workloads than having many workers compete with user-facing calls.
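
For non-urgent work, you can drain the queue into a Batch input file. The file is JSON Lines, with one request per line carrying a `custom_id`, `method`, `url`, and `body`. The sketch assumes chat completions requests. `QueuedJob` and `toBatchJsonl` are hypothetical names. Uploading the file and creating the batch job are left to the official SDK or API.

```ts
// Hypothetical queued job shape; `id` becomes the Batch `custom_id`.
interface QueuedJob {
  id: string;
  model: string;
  prompt: string;
  maxTokens: number;
}

// Serializes jobs into Batch input JSONL: one request object per line.
// Assumes every job targets the chat completions endpoint.
function toBatchJsonl(jobs: QueuedJob[]): string {
  const seen = new Set<string>();
  const lines = jobs.map((job) => {
    // custom_id values must be unique so results can be matched back to jobs.
    if (seen.has(job.id)) throw new Error(`Duplicate custom_id: ${job.id}`);
    seen.add(job.id);
    return JSON.stringify({
      custom_id: job.id,
      method: "POST",
      url: "/v1/chat/completions",
      body: {
        model: job.model,
        messages: [{ role: "user", content: job.prompt }],
        max_tokens: job.maxTokens,
      },
    });
  });
  return lines.join("\n");
}
```

The resulting file is uploaded with purpose `batch`, and the batch is created with a `24h` completion window. Results arrive asynchronously, so this suits work that no user is waiting on.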

For multi-tenant applications, keep tenant budgets separate. One noisy customer should not exhaust the whole project's minute window. Record the tenant id, model, token estimate, queue delay, retry count, and final outcome for every 429. Without those fields, your next incident review will still be guesswork.
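
Per-tenant budgets and the 429 log fields listed above can be sketched together. The fixed one-minute window is a simplification chosen for readability, and it allows bursts at window edges. `TenantWindow`, `Tenant429Record`, `format429Record`, and the outcome labels are hypothetical.

```ts
// Hypothetical per-tenant fixed-window limiter: each tenant gets its own
// requests-per-minute share, so one noisy tenant cannot drain the project.
class TenantWindow {
  private readonly perTenantPerMinute: number;
  private readonly windows = new Map<string, { start: number; used: number }>();

  constructor(perTenantPerMinute: number) {
    this.perTenantPerMinute = perTenantPerMinute;
  }

  tryAcquire(tenantId: string, now: number = Date.now()): boolean {
    const start = Math.floor(now / 60_000) * 60_000;
    const prev = this.windows.get(tenantId);
    const current = prev && prev.start === start ? prev : { start, used: 0 };
    this.windows.set(tenantId, current);
    if (current.used >= this.perTenantPerMinute) return false;
    current.used += 1;
    return true;
  }
}

// The fields the text recommends recording for every 429.
interface Tenant429Record {
  tenantId: string;
  model: string;
  tokenEstimate: number;
  queueDelayMs: number;
  retryCount: number;
  outcome: "succeeded_after_retry" | "failed" | "dropped";
}

// One JSON line per 429 keeps incident reviews searchable.
function format429Record(record: Tenant429Record): string {
  return JSON.stringify({ event: "openai_429", ...record });
}
```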

Route out wrong surfaces

"OpenAI API error 429" should mean the OpenAI Platform API request made by your code. It should not automatically mean ChatGPT message limits, Codex product limits, Sora video capacity, Azure OpenAI quota, or an OpenAI-compatible wrapper's private limit. Those surfaces can also show limit messages, but the owner and the repair path change.

Use this split before applying Platform API advice:

Surface	Do not assume	Check instead
ChatGPT web or mobile	ChatGPT Plus or team access changes API quota	ChatGPT product limits and account state
Codex product surface	A coding-agent limit is the same as API RPM or TPM	Codex usage contract and current product status
Sora or video generation	Video capacity maps to text API limits	Sora route, plan, queue, and video-specific status
Azure OpenAI	OpenAI Platform Limits page owns the deployment	Azure quota, deployment, region, and subscription
Wrapper or gateway	OpenAI headers always pass through untouched	Provider dashboard, provider docs, upstream headers, and route id

If you are not calling api.openai.com directly, identify the provider boundary first. A wrapper may return 429 because its own pool is full, because it mapped an upstream OpenAI failure into a local error, or because your account on that provider hit a plan cap. Applying OpenAI Platform fixes without that boundary can waste time.
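
A quick way to find the provider boundary is to look at the host your client actually sends to. This sketch assumes Azure OpenAI resources use `<resource>.openai.azure.com` hostnames, which is common but not universal. Any host it does not recognize is treated as another provider whose limits must be checked first. `surfaceFor` is a hypothetical helper.

```ts
type Surface = "openai_platform" | "azure_openai" | "other_provider";

// Classifies the configured base URL by hostname.
function surfaceFor(baseUrl: string): Surface {
  const host = new URL(baseUrl).hostname.toLowerCase();
  if (host === "api.openai.com") return "openai_platform";
  // Common Azure OpenAI pattern; other Azure hostnames fall through to "other_provider".
  if (host.endsWith(".openai.azure.com")) return "azure_openai";
  return "other_provider";
}
```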

Escalate with evidence

Escalation is useful after you have separated the branch and stopped changing variables. A support packet should be short, reproducible, and free of secrets. It should show what failed, where it failed, which account scope owned the request, and why you believe retry, billing, scope, status, or provider routing is the remaining owner.

Include:

Item	Detail to include
Timestamp	Date, local time, and timezone
Request identity	Request id if present, endpoint, model, and SDK version
Account scope	Organization, project, billing owner, and relevant model family
Error evidence	Full error body, safe headers, and retry count
Limits evidence	Limits page state, Usage state, and any spend cap
Status evidence	OpenAI Status state at the time of failure
Workload shape	Concurrency, prompt size, expected output size, queue depth
Recent changes	New key, new project, billing change, model change, deploy, provider route change

Redact secrets before sharing. Do not include bearer tokens, full API keys, private user data, card details, or proprietary prompts unless support specifically provides a secure route and the data is necessary. A clean packet is faster for support and safer for your users.
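
Pattern-based redaction is a useful last check before evidence leaves your systems. It is not a guarantee, so the packet still needs a human read. This sketch assumes keys that start with `sk-`, bearer tokens appearing in text, and a short list of secret-bearing header names, including `api-key`, which Azure OpenAI uses. `redactText` and `redactHeaders` are hypothetical helpers.

```ts
// Header names whose values should never leave your systems; extend the list
// for your own gateways.
const SECRET_HEADERS = new Set(["authorization", "api-key", "cookie", "set-cookie"]);

function redactHeaders(headers: Record<string, string | undefined>): Record<string, string | undefined> {
  const safe: Record<string, string | undefined> = {};
  for (const name of Object.keys(headers)) {
    safe[name] = SECRET_HEADERS.has(name.toLowerCase()) ? "[REDACTED]" : headers[name];
  }
  return safe;
}

// Masks bearer tokens first, then anything else that looks like an "sk-" key.
function redactText(text: string): string {
  return text
    .replace(/Bearer\s+[A-Za-z0-9._~+\/=-]+/gi, "Bearer [REDACTED]")
    .replace(/\bsk-[A-Za-z0-9_-]{8,}/g, "sk-[REDACTED]");
}
```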

FAQ

Is every OpenAI API 429 retryable?

No. Retry is appropriate when the error body and headers point to temporary request or token pressure. If the body says insufficient_quota or current quota exceeded, stop retrying and inspect Billing, Usage, Limits, project, organization, and model access.

What does `insufficient_quota` mean?

It means the request is blocked by quota, billing, spend, or account state rather than a short-lived request burst. The exact owner can vary, so check the account pages tied to the same organization and project that sent the request.

Why do I get 429 even though I added credits?

The request may be using a different organization or project, a monthly spend cap may still be active, the billing state may not have propagated, the model may have its own access or family limit, or a provider wrapper may be enforcing its own pool. Verify the route before retrying.

Do multiple API keys increase my OpenAI rate limit?

Not by themselves. If the keys belong to the same project and organization, they usually share the same capacity owner. A new key fixes revoked, leaked, or wrong-project credentials; it does not create a new quota pool by magic.

Which headers should I inspect?

Inspect rate-limit headers that show limit, remaining capacity, and reset timing for requests and tokens when they are present in the response. Pair those headers with the current Limits page because account-specific limits can change and can differ by model family.

Should I check OpenAI Status for 429?

Yes, but status is only one branch. If the status page shows a relevant incident, wait and preserve evidence. If it is green, continue checking the error body, headers, Limits, Billing, project, organization, model, and provider surface.

Is ChatGPT Plus quota the same as OpenAI API quota?

No. ChatGPT consumer plans and OpenAI Platform API billing are separate surfaces. A ChatGPT subscription does not automatically explain or repair a Platform API 429 from your code.

What should I send to support?

Send the timestamp, timezone, request id if available, endpoint, model, project, organization, safe error body, safe headers, Limits and Usage state, OpenAI Status state, retry count, workload shape, and recent account or deploy changes. Redact API keys and private data.