Capacity, queue, and failover
When to use this page
Use this page to understand throttling, sticky sessions, failover, and model downgrade behavior.
Dynamic capacity
QuotaFlow estimates usable capacity from currently healthy upstream accounts. It excludes accounts that are inactive, unauthorized, disabled, temporarily unavailable, or in active cooldown.
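The exclusion rule above can be sketched as a simple filter-and-sum. This is a minimal illustration, not QuotaFlow's implementation: the Account fields, the status strings, and the per-account slots figure are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical status names mirroring the exclusion list above;
# these are not QuotaFlow's actual schema.
EXCLUDED_STATUSES = {"inactive", "unauthorized", "disabled", "temporarily_unavailable"}


@dataclass
class Account:
    name: str
    status: str            # "healthy" or one of EXCLUDED_STATUSES
    cooldown_until: float  # epoch seconds; 0.0 means no cooldown
    slots: int             # assumed per-account concurrency budget


def usable_capacity(accounts, now):
    """Sum slots across accounts that are not excluded and not cooling down."""
    return sum(
        a.slots
        for a in accounts
        if a.status not in EXCLUDED_STATUSES and a.cooldown_until <= now
    )


accounts = [
    Account("a", "healthy", 0.0, 4),
    Account("b", "disabled", 0.0, 4),
    Account("c", "healthy", 1_000.0, 4),  # in cooldown until t=1000
]
print(usable_capacity(accounts, now=500.0))    # 4
print(usable_capacity(accounts, now=1_000.0))  # 8
```

Because the estimate is recomputed from current account state, it rises again on its own once a cooldown expires.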
Sticky routing
QuotaFlow prefers the same upstream account for a stable session id. Sticky routing can be bypassed when the selected account is under pressure or unavailable.
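One way to picture sticky routing with a bypass is to hash the session id to a preferred account and walk to the next candidate when that account is unavailable or under pressure. The hashing scheme, the pick_account function, and the two callback predicates are assumptions made for this sketch; the page does not describe how QuotaFlow actually selects the account.

```python
import hashlib


def pick_account(session_id, accounts, is_available, under_pressure):
    """Prefer a stable account per session id; bypass it when it cannot serve."""
    if not accounts:
        return None
    # Derive a stable starting index from the session id (assumed scheme).
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(accounts)
    ordered = accounts[start:] + accounts[:start]

    # First pass: the preferred account, then the others, skipping pressured ones.
    for name in ordered:
        if is_available(name) and not under_pressure(name):
            return name
    # Second pass: every available account is under pressure, so accept one anyway.
    for name in ordered:
        if is_available(name):
            return name
    return None


accounts = ["acct-1", "acct-2", "acct-3"]
preferred = pick_account("session-42", accounts, lambda n: True, lambda n: False)
fallback = pick_account("session-42", accounts, lambda n: n != preferred, lambda n: False)
print(preferred, fallback)
```

The same session id maps to the same account for as long as that account stays healthy, and routing changes only when the bypass condition applies.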
Account failover
If an upstream account returns a transient 429, QuotaFlow cools that account briefly and retries the request with another account when available.
If the upstream error is usage_limit_reached and includes a reset time, QuotaFlow keeps the account in cooldown until that reset time.
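The two cooldown rules can be sketched as a retry loop over candidate accounts. Everything named here is hypothetical: the UpstreamError class, the send callback, the cooldowns dictionary, and the 5-second transient cooldown are illustrative choices, since the page does not state the actual cooldown duration or error shape.

```python
import time

TRANSIENT_COOLDOWN_S = 5.0  # assumed value; the real duration is not documented here


class UpstreamError(Exception):
    """Hypothetical error carrying the upstream status, error code, and reset time."""

    def __init__(self, status, code=None, reset_at=None):
        super().__init__(f"upstream error status={status} code={code}")
        self.status = status
        self.code = code
        self.reset_at = reset_at  # epoch seconds, when provided


def send_with_failover(accounts, cooldowns, send, now=time.time):
    """Try accounts in order, cooling the ones that fail and moving to the next."""
    last_error = None
    for account in accounts:
        if cooldowns.get(account, 0.0) > now():
            continue  # still in cooldown, skip
        try:
            return send(account)
        except UpstreamError as err:
            if err.code == "usage_limit_reached" and err.reset_at is not None:
                cooldowns[account] = err.reset_at  # cool until the reset time
            elif err.status == 429:
                cooldowns[account] = now() + TRANSIENT_COOLDOWN_S  # brief cooldown
            else:
                raise  # other errors are outside the scope of this sketch
            last_error = err
    if last_error is not None:
        raise last_error
    raise RuntimeError("no upstream account available")
```

In this sketch a request fails only after every candidate account has been tried or skipped, which matches the "when available" wording above.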
Queueing
When configured capacity is exhausted, QuotaFlow may queue requests for a bounded time instead of immediately failing. Queue behavior depends on the API key and pool configuration.
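Bounded queueing can be illustrated with a semaphore whose acquire call gives up after a maximum wait. This is a sketch under stated assumptions: the BoundedWaitGate class and its capacity and max_wait_s parameters are hypothetical, and the real limits come from the API key and pool configuration.

```python
import threading


class BoundedWaitGate:
    """Admit up to `capacity` concurrent requests; others wait at most `max_wait_s`."""

    def __init__(self, capacity, max_wait_s):
        self._slots = threading.Semaphore(capacity)
        self._max_wait_s = max_wait_s

    def run(self, handler):
        # Wait for a free slot, but only for a bounded time.
        if not self._slots.acquire(timeout=self._max_wait_s):
            raise TimeoutError("capacity exhausted and queue wait expired")
        try:
            return handler()
        finally:
            self._slots.release()


gate = BoundedWaitGate(capacity=2, max_wait_s=0.5)
print(gate.run(lambda: "handled"))  # handled
```

A caller that receives the timeout can treat it like an immediate capacity failure; the difference is only that short bursts are absorbed instead of rejected.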
Model downgrade
Under high pressure, simple or standard tasks may be downgraded from gpt-5.5 to gpt-5.4. Complex tasks are handled more conservatively: they keep their model and may only have their reasoning effort reduced.
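The downgrade policy can be sketched as a small decision function. The pressure threshold, the task class labels, the effort scale, and the plan_request name are assumptions for illustration only. The policy above says tasks "may" be downgraded, while this sketch always applies the downgrade once the threshold is crossed.

```python
HIGH_PRESSURE = 0.8                       # assumed threshold on a 0..1 pressure scale
EFFORT_STEPS = ["low", "medium", "high"]  # assumed reasoning effort scale


def plan_request(task_class, model, effort, pressure):
    """Return the (model, effort) pair to use under the given pressure."""
    if pressure < HIGH_PRESSURE:
        return model, effort
    if task_class in ("simple", "standard") and model == "gpt-5.5":
        return "gpt-5.4", effort  # downgrade the model, keep the effort
    if task_class == "complex":
        step = EFFORT_STEPS.index(effort)
        return model, EFFORT_STEPS[max(step - 1, 0)]  # keep the model, lower effort
    return model, effort


print(plan_request("simple", "gpt-5.5", "high", pressure=0.9))   # ('gpt-5.4', 'high')
print(plan_request("complex", "gpt-5.5", "high", pressure=0.9))  # ('gpt-5.5', 'medium')
```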
Read /llms.txt, fetch /llms-full.txt for full context, and parse /openapi.yaml for endpoint schemas.