OpenAI API 429 errors: how to separate retry, quota checks, and escalation

When the OpenAI Platform API returns a 429, read the error body before raising the retry count. For a rate limit, backoff, throttling, and queueing are the right tools. For quota, billing, project scope, model access, a status incident, or a wrapper limit, the things to check are different.

Error clue	Likely owner	Check first	Retry or stop
`rate limit reached`, `too many requests`, remaining headers near zero	request/token rate limit	headers, Limits, model family, reset window	back off with jitter, then throttle or queue
`You exceeded your current quota` or `insufficient_quota`	quota, billing, spend cap	Billing, Usage, Limits, account state	stop until the account state changes
A new key fails the same way, or only some projects/models fail	project, organization, model, key scope	project, organization, model access	fix scope first, then change traffic
Many calls fail during a Status incident	platform status/capacity	OpenAI Status, timestamp, request id	wait, preserve evidence
ChatGPT, Codex, Sora, Azure, or wrapper error	wrong surface	product surface, provider docs, route, headers	handle under that product's own contract

The stop rule is simple. Retry only when the cause is request/token pressure and a reset signal is present. If you see quota, billing, wrong project, wrong model, wrong surface, or a status incident, repeating the same request is not a fix.

Read the 429 body first, then change code

OpenAI's official docs distinguish at least two 429 families: traffic arriving too fast, and the current quota being exhausted. Developer searches usually collapse both into one phrase, so the error body must drive the first decision. Record message, code, type, endpoint, model, project, organization, timestamp, and request id before changing retry policy.

The safe classification is conservative. If insufficient_quota or current quota wording appears, treat it as a quota or billing stop. If the body says rate limit or too many requests and the headers show remaining/reset information, treat it as retryable pressure. If neither branch is clear, hold the route steady and collect evidence instead of changing five variables at once.
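The conservative classification above can be sketched as a small function. This is only an illustration: the `{"error": {"message", "type", "code"}}` body shape and the `x-ratelimit-*` header prefix are assumptions about what the response looks like, and `Branch` and `classify_429` are hypothetical names, not part of any SDK.

```python
from enum import Enum
from typing import Any, Mapping


class Branch(Enum):
    QUOTA_STOP = "quota_stop"      # quota or billing: stop retrying
    RATE_RETRY = "rate_retry"      # temporary pressure: back off and retry
    UNKNOWN_HOLD = "unknown_hold"  # unclear: hold the route steady, collect evidence


# Wording taken from the error clues in this article; extend with care.
QUOTA_MARKERS = ("insufficient_quota", "exceeded your current quota")
RATE_MARKERS = ("rate limit", "too many requests")


def classify_429(body: Mapping[str, Any], headers: Mapping[str, str]) -> Branch:
    """Quota wording wins; rate wording counts only with remaining/reset headers."""
    error = body.get("error") or {}
    if not isinstance(error, Mapping):
        error = {"message": error}  # tolerate a plain-string error field
    text = " ".join(
        str(error.get(field) or "") for field in ("code", "type", "message")
    ).lower()

    if any(marker in text for marker in QUOTA_MARKERS):
        return Branch.QUOTA_STOP

    # Assumed header naming: x-ratelimit-remaining-* and x-ratelimit-reset-*.
    has_rate_headers = any(
        name.lower().startswith(("x-ratelimit-remaining", "x-ratelimit-reset"))
        for name in headers
    )
    if has_rate_headers and any(marker in text for marker in RATE_MARKERS):
        return Branch.RATE_RETRY

    return Branch.UNKNOWN_HOLD
```

Checking quota wording first is deliberate: a body that mentions both quota and rate limits is treated as a stop, which is the cheaper mistake to make.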

This matters in real operations because an ambiguous 429 can tempt teams into the wrong repair. A tight retry loop can consume more minute capacity. A new key can hide the fact that the same project is still blocked. A billing change in the wrong organization can leave the production request untouched.

Choose the recovery path within ten minutes

Figure: OpenAI API 429 ten-minute diagnosis flow

Use the first ten minutes to classify the owner, not to experiment randomly. Copy the raw body and headers, open Limits, Billing and Usage for the same project and organization, confirm the model family, check OpenAI Status, then send one smaller controlled request. That sequence keeps the evidence readable.

Time (min)	Action	What it proves
0-1	Save body and headers	whether the branch is rate, quota, billing or unknown
1-3	Check Limits, Usage and Billing	whether the account has capacity or billing state problems
3-5	Compare model and endpoint	whether a stricter or shared model family is involved
5-7	Check OpenAI Status	whether a public incident changes the response
7-10	Send one smaller controlled request	whether workload size or account state is likely

If the smaller request works, investigate concurrency, token size, image throughput or fan-out. If it fails with quota wording, stop retrying. If unrelated endpoints fail during a declared incident, preserve evidence and wait rather than rotating accounts.

When retry and backoff are correct

Retry and backoff are correct only for temporary request or token pressure. The useful signals are rate-limit wording, low remaining values, reset timing, and a traffic pattern that exceeds the current project/model budget. Retry is not a magic repair; it is a pacing tool.

Use exponential backoff with jitter, cap retry count, and add a central limiter per project and model family. A limiter inside each worker is not enough if workers do not share state. Estimate token size before dispatch, because reducing prompt size or max output can remove TPM pressure before the API rejects the request.
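A minimal sketch of capped exponential backoff with jitter follows. It uses the "full jitter" variant (a uniform delay between zero and the exponential ceiling), which is one common choice rather than the only one. `RetryableRateLimit` is a hypothetical exception that the caller's own code would raise after classifying a 429 as temporary pressure; the default base, cap, and retry count are illustrative values.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableRateLimit(Exception):
    """Hypothetical: raised by caller code when a 429 is classified as retryable."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)


def call_with_backoff(send: Callable[[], T], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(); on a retryable rate limit, wait and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RetryableRateLimit:
            if attempt == max_retries:
                raise  # retry budget spent: surface the error instead of looping
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Only the retryable exception is caught, so quota or billing errors propagate immediately instead of being retried. The `sleep` parameter exists so the delay can be replaced in tests or swapped for a scheduler.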

Failed requests can still count toward minute limits. A fleet that retries every second can keep itself inside the failure window. A good system slows down, queues, sheds non-urgent work, or uses Batch for async work.

When you should not keep retrying

Retry is wrong when the error points to insufficient_quota, current quota, billing, monthly spend or account state. Waiting a few seconds does not add quota. The correct path is Billing, Usage, Limits, spend cap, organization, project and model access.

Many "I have credits but still get 429" cases are scope problems. The credit can be in another organization. The request can use another project. A monthly spend cap can be active. A model can be unavailable to that project. A wrapper can be applying its own pool. Keep one minimal request stable while checking each scope.

Why a new API key may not be the fix

An API key is not a separate capacity bucket. A new key helps when the old key is revoked, leaked, restricted or attached to the wrong project. It does not create capacity if organization, project, model family and billing owner remain the same.

Scope	Check	Failure pattern
Organization	request uses the intended org	personal and team orgs have different billing or limits
Project	key belongs to the inspected project	Limits checked in one project, traffic sent from another
Model family	selected model has access and headroom	stricter or shared family limit is exhausted
Team workload	other services share capacity	batch job or another app consumes the pool

If one model fails, test a small request to a model the project can access. If every model fails with quota wording, inspect account state first. If the key works elsewhere, inspect concurrency and request shape in the failing service.

Treat headers and Limits as live evidence

Figure: OpenAI API 429 headers and Limits evidence map

The live evidence is the response plus the account. The body gives the branch. Headers can show limit, remaining and reset timing. The Limits page gives the current project, organization and model context. Any static table is weaker than the reader's own live evidence.

Evidence	Why it matters
status and body	separates retryable rate pressure from quota or billing
request id	gives support a lookup handle
rate-limit headers	shows limit, remaining and reset timing
project and organization	confirms who owns the request
model and endpoint	exposes stricter model or wrong endpoint
Limits and Usage state	records account state during failure
Status snapshot	separates incident from account-local failure
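As an illustration of reading the rate-limit headers from the table, the sketch below collects limit, remaining, and reset values into one record. The header names (`x-ratelimit-limit-requests`, `x-ratelimit-remaining-tokens`, and so on) and the duration format of the reset values (such as `1s` or `6m0s`) are assumptions to verify against your own live responses; `parse_reset` and `rate_limit_snapshot` are hypothetical helper names.

```python
import re
from typing import Dict, Mapping, Optional

_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_reset(value: str) -> Optional[float]:
    """Parse durations such as '1s', '6m0s' or '250ms' into seconds; None if unparseable."""
    value = value.strip()
    position, total = 0, 0.0
    for match in _DURATION_PART.finditer(value):
        if match.start() != position:
            return None  # unexpected characters between duration parts
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(value):
        return None  # nothing matched, or trailing text remains
    return total


def _to_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None and value.strip().isdigit() else None


def rate_limit_snapshot(headers: Mapping[str, str]) -> Dict[str, Dict[str, Optional[float]]]:
    """Collect limit, remaining and reset for requests and tokens; missing values stay None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    snapshot: Dict[str, Dict[str, Optional[float]]] = {}
    for kind in ("requests", "tokens"):
        reset = lowered.get(f"x-ratelimit-reset-{kind}")
        snapshot[kind] = {
            "limit": _to_int(lowered.get(f"x-ratelimit-limit-{kind}")),
            "remaining": _to_int(lowered.get(f"x-ratelimit-remaining-{kind}")),
            "reset_seconds": parse_reset(reset) if reset is not None else None,
        }
    return snapshot
```

Missing or unparseable headers become `None` rather than zero, so a wrapper that strips headers is not mistaken for an exhausted limit.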

On 2026-04-29 the public OpenAI Status check did not show a broad active incident. That does not guarantee future health. During an incident, check Status live; if it is green, continue through account scope, headers and workload shape.

Reduce the next 429 in production

Figure: OpenAI API 429 mitigation ladder and support packet

After the immediate recovery, move the lesson into production controls. The application should know its budget before OpenAI rejects it: project/model limiters, tenant budgets, token estimates, queue alerts, retry counters and reset-window observations.
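One way to let the application know its budget before a request is rejected is a token bucket keyed by project and model family. The sketch below is in-process only, so it covers threads inside one service; a fleet of separate workers would need the same idea backed by shared state. `TokenBucket`, `BudgetLimiter`, and the capacity and refill numbers are hypothetical and should be set from your own Limits page.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Refills continuously; a request passes only if its estimated cost fits."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = capacity
        self._updated = clock()

    def try_acquire(self, cost: float) -> bool:
        now = self._clock()
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        self._updated = now
        if cost > self._tokens:
            return False  # caller should queue, shed, or delay the request
        self._tokens -= cost
        return True


class BudgetLimiter:
    """One bucket per (project, model_family), guarded by a lock for worker threads."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._refill = refill_per_sec
        self._clock = clock
        self._buckets: Dict[Tuple[str, str], TokenBucket] = {}
        self._lock = threading.Lock()

    def try_acquire(self, project: str, model_family: str, estimated_tokens: float) -> bool:
        key = (project, model_family)
        with self._lock:
            bucket = self._buckets.get(key)
            if bucket is None:
                bucket = TokenBucket(self._capacity, self._refill, self._clock)
                self._buckets[key] = bucket
            return bucket.try_acquire(estimated_tokens)
```

A caller would estimate prompt plus maximum output tokens, call `try_acquire`, and queue the job when it returns `False` instead of sending it and waiting for a 429.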

Interactive traffic and background jobs should not compete blindly. Queue non-urgent jobs. Split tenants. Reduce prompt size when possible. Route simpler work to cheaper or lower-pressure models when that is an approved product decision. Use Batch when latency is not urgent and the workload fits.

Separate the wrong surface first

"OpenAI API 429" should mean a Platform API call made by code. ChatGPT, Codex, Sora, Azure OpenAI and wrappers can show limit messages too, but the owner and fix are different.

Surface	Do not assume	Check instead
ChatGPT	consumer plan changes API quota	ChatGPT product limits and account state
Codex	coding-agent limits equal API RPM/TPM	Codex product contract and status
Sora	video capacity equals text API limits	Sora route, queue, plan and video status
Azure OpenAI	OpenAI Platform Limits owns deployment	Azure quota, deployment, region and subscription
Wrapper	OpenAI headers always pass through	provider dashboard, docs, route id and upstream evidence

If the request is not sent directly to api.openai.com, identify the provider boundary first. The wrapper may be full, may translate an upstream 429, or may enforce its own account cap.
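The provider boundary check can start with the configured base URL. The sketch below is a rough first pass, assuming the client exposes its base URL as a string and that Azure OpenAI resources use hosts ending in `.openai.azure.com`; `surface_for` and its return labels are hypothetical names.

```python
from urllib.parse import urlparse


def surface_for(base_url: str) -> str:
    """Label the surface a request targets, based only on the URL host."""
    # urlparse needs a scheme to find the host; a bare "api.openai.com/v1"
    # yields no hostname and falls through to the last branch.
    host = (urlparse(base_url).hostname or "").lower()
    if host == "api.openai.com":
        return "openai_platform"
    if host.endswith(".openai.azure.com"):
        return "azure_openai"
    return "wrapper_or_other"
```

A result of `wrapper_or_other` does not prove a wrapper is at fault; it only says the OpenAI Platform Limits page is not the first place to look.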

Collect evidence and escalate

Escalate only after the branch is stable and secrets are removed. A compact packet should include timestamp, timezone, request id, endpoint, model, SDK version, organization, project, billing owner, safe body, safe headers, Limits and Usage state, Status state, retry count, concurrency, prompt size, queue depth and recent changes.

Do not post API keys, bearer tokens, card details, private prompts or user data in public places. Clean evidence is faster for support and safer for users.
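Redaction is easier to get right when it is automated before anything is pasted into a ticket. The sketch below is a minimal example under stated assumptions: the list of sensitive header names and the `sk-` key pattern are illustrative guesses, not a complete inventory, and `redact_text` and `safe_headers` are hypothetical helpers. Private prompts and user data still need a manual review.

```python
import re
from typing import Dict, Mapping

# Assumed sensitive header names; extend for your own gateway or proxy.
SENSITIVE_HEADERS = {"authorization", "api-key", "x-api-key", "cookie", "set-cookie"}

_BEARER_PATTERN = re.compile(r"(?i)bearer\s+\S+")
_KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")  # assumed key shape


def redact_text(text: str) -> str:
    """Mask bearer tokens and key-like strings inside free text."""
    text = _BEARER_PATTERN.sub("Bearer [REDACTED]", text)
    return _KEY_PATTERN.sub("[REDACTED_KEY]", text)


def safe_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the headers with credentials removed."""
    cleaned: Dict[str, str] = {}
    for name, value in headers.items():
        if name.lower() in SENSITIVE_HEADERS:
            cleaned[name] = "[REDACTED]"
        else:
            cleaned[name] = redact_text(str(value))
    return cleaned
```

Request ids and rate-limit headers pass through unchanged, which keeps the packet useful for support while the credentials are dropped.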

Frequently asked questions

Is every OpenAI API 429 retryable?

No. Retry only when the error body and headers point to temporary request/token pressure. For insufficient_quota, check Billing, Usage, Limits, project, organization, and model access instead.

What is `insufficient_quota`?

It signals a quota, billing, spend cap, or account state problem. Waiting a few seconds does not resolve it, so check the account state for the same project and organization that sent the request.

Why do I get a 429 when I have credits?

Possible causes include a different organization or project, a spend cap, a delay before a billing change takes effect, model access, a shared family limit, or a wrapper pool.

Does creating more API keys raise the limit?

Not within the same project and organization. A new key fixes credential problems but does not create a new quota pool.

Which headers should I look at?

The rate-limit headers that show limit, remaining, and reset. Read them together with the Limits page.

Should I check OpenAI Status?

Yes. If there is an incident, wait and preserve evidence. If Status is green, keep checking account, headers, Limits, and workload.

Are ChatGPT Plus and API quota the same?

No. The ChatGPT consumer subscription and OpenAI Platform API billing are separate.

What should I send to support?

Send timestamp, timezone, request id, endpoint, model, project, organization, error body, safe headers, Limits/Usage, Status, retry count, workload, and recent changes.