QuotaFlow Docs

Capacity, queue, and failover

When to use this page

Use this page to understand throttling, sticky sessions, failover, and model downgrade behavior.

Dynamic capacity

QuotaFlow estimates usable capacity from currently healthy upstream accounts. It excludes accounts that are inactive, unauthorized, disabled, temporarily unavailable, or in active cooldown.
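
The exclusion rule above can be sketched as a simple filter. This is a minimal illustration, not QuotaFlow's implementation: the Account fields, the state strings, and the idea of per-account "slots" are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical state labels that mirror the exclusions listed above.
EXCLUDED_STATES = {"inactive", "unauthorized", "disabled", "temporarily_unavailable"}


@dataclass
class Account:
    name: str
    state: str = "healthy"
    cooldown_until: float = 0.0  # epoch seconds; 0.0 means no cooldown (assumed field)
    slots: int = 1               # assumed per-account capacity unit


def usable_capacity(accounts, now):
    """Sum the slots of accounts that are neither excluded nor in active cooldown."""
    return sum(
        a.slots
        for a in accounts
        if a.state not in EXCLUDED_STATES and a.cooldown_until <= now
    )
```

Because the cooldown check compares against the current time, the estimate rises again on its own once a cooldown ends, which is one way to read "dynamic" here.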

Sticky routing

QuotaFlow prefers the same upstream account for requests that share a stable session ID. Sticky routing can be bypassed when the selected account is under pressure or unavailable.
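
One way to picture sticky routing with a bypass is sketched below. It assumes stickiness is derived by hashing the session ID, which the page does not state; QuotaFlow may instead store the mapping. The function name, the pressure scale from 0 to 1, and the max_pressure threshold are hypothetical.

```python
import hashlib


def pick_account(session_id, accounts, pressure, available, max_pressure=0.8):
    """Return the preferred account for a session, or a fallback if it is unusable.

    accounts:  ordered list of account names
    pressure:  dict of name -> load in [0, 1] (assumed metric)
    available: set of names currently able to serve requests
    """
    if not accounts:
        return None

    def usable(name):
        return name in available and pressure.get(name, 0.0) < max_pressure

    # A stable hash keeps the same session on the same starting account.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(accounts)

    # Try the preferred account first, then walk the rest in order (the bypass).
    for offset in range(len(accounts)):
        candidate = accounts[(start + offset) % len(accounts)]
        if usable(candidate):
            return candidate
    return None
```

In this sketch a bypassed session returns to its preferred account as soon as that account is usable again, since the preference itself is never rewritten.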

Account failover

If an upstream account returns a transient 429, QuotaFlow cools that account briefly and retries the request with another account when available.

If the upstream error is usage_limit_reached and includes a reset time, QuotaFlow keeps the account in cooldown until that reset time has passed.
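
The two cooldown cases can be sketched as a single decision function. This is an illustration under assumptions: the error is modeled as a dict, the "status", "code", and "reset_at" keys are hypothetical, and the 5-second short cooldown is a placeholder because the page only says "briefly".

```python
from typing import Optional

SHORT_COOLDOWN_SECONDS = 5.0  # assumed value; the docs do not give a duration


def cooldown_until(error: dict, now: float) -> Optional[float]:
    """Return the epoch time until which the account should be skipped, or None."""
    # Usage limit with a known reset time: wait until that time.
    if error.get("code") == "usage_limit_reached" and error.get("reset_at") is not None:
        return max(now, float(error["reset_at"]))
    # Transient 429 (including a usage limit without a reset time, by assumption):
    # apply a brief cooldown so the retry goes to another account.
    if error.get("status") == 429:
        return now + SHORT_COOLDOWN_SECONDS
    # Other errors do not trigger a cooldown in this sketch.
    return None
```

A caller would store the returned time on the account and retry the request against another account that is not in cooldown, if one is available.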

Queueing

When configured capacity is exhausted, QuotaFlow may queue requests for a bounded time instead of immediately failing. Queue behavior depends on the API key and pool configuration.
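
Bounded queueing can be approximated with a semaphore that callers wait on for a limited time. The sketch below is hypothetical: the class name, the capacity count, and the max_wait_seconds setting stand in for whatever the API key and pool configuration actually define.

```python
import threading


class BoundedQueueGate:
    """Admit up to `capacity` concurrent requests; others wait up to `max_wait_seconds`."""

    def __init__(self, capacity: int, max_wait_seconds: float):
        self._slots = threading.Semaphore(capacity)
        self._max_wait = max_wait_seconds

    def try_enter(self) -> bool:
        # Blocks while capacity is exhausted; returns False if the wait times out.
        return self._slots.acquire(timeout=self._max_wait)

    def leave(self) -> None:
        # Free the slot so a queued request can proceed.
        self._slots.release()
```

A request that gets False from try_enter would be rejected rather than waiting indefinitely, which matches the "bounded time" wording above.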

Model downgrade

Under high pressure, simple or standard tasks may be downgraded from gpt-5.5 to gpt-5.4. Complex tasks are preserved more conservatively and may only reduce reasoning effort.
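
The downgrade policy can be sketched as follows. The task class labels, the pressure threshold, and the effort levels ("high", "medium", "low") are assumptions made for the example. The page says downgrades "may" happen, so the real policy is likely less deterministic than this.

```python
HIGH_PRESSURE = 0.9  # assumed threshold on a 0..1 pressure scale

# Hypothetical one-step reduction of reasoning effort for complex tasks.
LOWER_EFFORT = {"high": "medium", "medium": "low"}


def plan_request(task_class, model, reasoning_effort, pressure):
    """Return the (model, reasoning_effort) pair to use for this request."""
    if pressure < HIGH_PRESSURE:
        return model, reasoning_effort
    # Simple and standard tasks: swap the model, keep the effort.
    if task_class in ("simple", "standard") and model == "gpt-5.5":
        return "gpt-5.4", reasoning_effort
    # Complex tasks: keep the model, reduce effort by at most one step.
    if task_class == "complex":
        return model, LOWER_EFFORT.get(reasoning_effort, reasoning_effort)
    return model, reasoning_effort
```

For example, a standard task on gpt-5.5 at pressure 0.95 would be planned on gpt-5.4, while a complex task at the same pressure would keep gpt-5.5 and drop from high to medium effort.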

AI agents: start at /llms.txt, fetch /llms-full.txt for full context, and parse /openapi.yaml for endpoint schemas.