# QuotaFlow full public documentation

Source: https://docs.quotaflow.ai

<!-- Source: introduction.mdx -->

# QuotaFlow Docs

QuotaFlow provides an OpenAI-compatible API gateway for Codex-style agents and model clients. Use it when you want one managed endpoint, API key control, usage tracking, quota protection, and upstream account failover.

## When to use this page

Use this page to understand the product surface before configuring an SDK or agent.

## Core endpoints

- API base URL: `https://api.quotaflow.ai`
- OpenAI-compatible Responses: `https://api.quotaflow.ai/openai/v1/responses`
- OpenAI-compatible Chat Completions: `https://api.quotaflow.ai/openai/v1/chat/completions`
- Model list: `https://api.quotaflow.ai/openai/v1/models`

## What QuotaFlow handles

- API key authentication
- OpenAI-compatible request shape
- Codex payload adaptation when enabled for your key
- Upstream account scheduling
- Sticky session routing for cache-friendly traffic
- Capacity guard, queueing, cooldown, and failover
- Usage and cost accounting

## What you need

1. A QuotaFlow API key.
2. The API base URL.
3. A compatible client, SDK, Codex configuration, or direct HTTP call.

## AI agent note

If you are an AI coding agent, start with `/llms.txt`, then fetch `/llms-full.txt` for full context, and use `/openapi.yaml` for endpoint schemas.

<!-- Source: quickstart.mdx -->

# Quickstart

## When to use this page

Use this page when you already have a QuotaFlow API key and want the fastest working request.

## 1. Set environment variables

```bash
export QUOTAFLOW_API_KEY="qf_your_key_here"
export OPENAI_BASE_URL="https://api.quotaflow.ai/openai/v1"
```

## 2. List models

```bash
curl https://api.quotaflow.ai/openai/v1/models \
  -H "Authorization: Bearer $QUOTAFLOW_API_KEY"
```

The response has the following shape. Use the ids returned for your key; the ids currently verified in production are `gpt-5.2`, `gpt-5.3-codex`, `gpt-5.4`, `gpt-5.5`, and `gpt-4o-mini-transcribe`.

```json
{
  "object": "list",
  "data": [
    { "id": "gpt-5.2", "object": "model" },
    { "id": "gpt-5.3-codex", "object": "model" },
    { "id": "gpt-5.4", "object": "model" },
    { "id": "gpt-5.5", "object": "model" },
    { "id": "gpt-4o-mini-transcribe", "object": "model" }
  ]
}
```

## 3. Create a response

```bash
curl https://api.quotaflow.ai/openai/v1/responses \
  -H "Authorization: Bearer $QUOTAFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Reply with only: QuotaFlow is connected.",
    "stream": false
  }'
```

## 4. Add a sticky session id

For agent workflows, include a stable session id so QuotaFlow can keep related calls on the same upstream account when possible.

```bash
curl https://api.quotaflow.ai/openai/v1/responses \
  -H "Authorization: Bearer $QUOTAFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-session-id: customer-or-agent-session-123" \
  -d '{
    "model": "gpt-5.5",
    "input": "Continue this coding task.",
    "stream": true
  }'
```

## Production checklist

- Store the key in a secret manager or environment variable.
- Do not expose the key in browser code.
- Use a stable session id for long-running agents.
- Implement retry for `429`, `503`, and network timeouts.
- Monitor your usage in the QuotaFlow dashboard.

<!-- Source: authentication.mdx -->

# Authentication

## When to use this page

Use this page when wiring QuotaFlow into an SDK, server, CI job, or agent runtime.

## Bearer token

QuotaFlow accepts an API key as a bearer token.

```http
Authorization: Bearer qf_your_key_here
```

## Alternate headers

Some clients cannot set `Authorization`. QuotaFlow also accepts:

```http
x-api-key: qf_your_key_here
```

or:

```http
x-goog-api-key: qf_your_key_here
```
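For clients that let you choose the header programmatically, the three accepted styles can be sketched as one helper. The function name `auth_headers` and the `style` labels are illustrative assumptions, not part of any QuotaFlow SDK; only the header names come from this page.

```python
from typing import Dict


def auth_headers(api_key: str, style: str = "bearer") -> Dict[str, str]:
    """Return the single header that carries the key, in the requested style."""
    if not api_key:
        raise ValueError("API key is empty")
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        return {"x-api-key": api_key}
    if style == "x-goog-api-key":
        return {"x-goog-api-key": api_key}
    raise ValueError(f"unsupported header style: {style}")
```

Sending exactly one of these headers keeps it unambiguous which key the request is using.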

## Security rules

- Never commit API keys.
- Never place API keys in frontend JavaScript.
- Use different keys per customer, environment, or agent group.
- Rotate a key immediately if it appears in logs, screenshots, crash reports, or support tickets.

## Common authentication errors

```json
{
  "error": "Missing API key",
  "message": "Please provide an API key in the x-api-key, x-goog-api-key, or Authorization header"
}
```

This means no key reached the API.

```json
{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error"
  }
}
```

This means a key reached the API but was rejected: it is unknown, disabled, expired, deleted, or from a different environment.

<!-- Source: get-an-api-key.mdx -->

# Get an API key

## When to use this page

Use this page when onboarding a new customer, workspace, or agent runtime.

## Customer flow

1. Ask your QuotaFlow admin for a production API key.
2. Confirm the target endpoint is `https://api.quotaflow.ai/openai/v1`.
3. Store the key as `QUOTAFLOW_API_KEY` or your platform's equivalent secret.
4. Run the model list smoke test.
5. Run one small non-streaming Responses request.
6. Switch your client or agent to the QuotaFlow base URL.

## Admin flow

1. Create or select a customer workspace.
2. Create a production API key for that workspace.
3. Assign the right package or pool.
4. Enable OpenAI permission for Codex/OpenAI-compatible usage.
5. Enable Codex payload adaptation when the key is intended for Codex-like clients.
6. Send the customer the key through a secure channel.

## Smoke test

```bash
curl https://api.quotaflow.ai/openai/v1/models \
  -H "Authorization: Bearer $QUOTAFLOW_API_KEY"
```

A `200` response confirms the key is recognized.
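Where curl is not available, the same smoke test can be sketched with Python's standard library. `build_models_request` and `key_is_recognized` are hypothetical helper names; the sketch only builds the request so it can be inspected before anything is sent.

```python
import urllib.request

BASE_URL = "https://api.quotaflow.ai/openai/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build the GET /models smoke-test request without sending it."""
    if not api_key:
        raise ValueError("QUOTAFLOW_API_KEY is empty")
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def key_is_recognized(status_code: int) -> bool:
    """A 200 from /models is the signal that the key is recognized."""
    return status_code == 200
```

To send it, pass the request to `urllib.request.urlopen(...)` with a timeout and read `.status` from the result. Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so a rejected key surfaces as an exception rather than a return value.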

## Environment separation

Use production keys only with production endpoints. Old development keys are not guaranteed to work on production endpoints unless a migration was explicitly planned.

<!-- Source: openai-compatible/base-url.mdx -->

# Base URL

## When to use this page

Use this page when replacing direct OpenAI calls with QuotaFlow while keeping OpenAI-compatible SDK code.

## Base URL

```text
https://api.quotaflow.ai/openai/v1
```

## JavaScript SDK pattern

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.QUOTAFLOW_API_KEY,
  baseURL: "https://api.quotaflow.ai/openai/v1"
});
```

## Python SDK pattern

```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["QUOTAFLOW_API_KEY"],
    base_url="https://api.quotaflow.ai/openai/v1",
)
```

## Endpoint mapping

| OpenAI-compatible path | QuotaFlow URL |
| --- | --- |
| `/models` | `https://api.quotaflow.ai/openai/v1/models` |
| `/responses` | `https://api.quotaflow.ai/openai/v1/responses` |
| `/chat/completions` | `https://api.quotaflow.ai/openai/v1/chat/completions` |

## Sticky sessions

For coding agents and long conversations, pass one of these values consistently:

- `x-session-id` header
- `session_id` header
- `session_id` body field
- `conversation_id` body field
- `prompt_cache_key` body field

QuotaFlow uses this to prefer the same upstream account and improve cache locality.
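One way to keep the value consistent is to derive it from something that does not change during the task, such as a repository path and a task name. The sketch below is an assumption about how a client might do this; `stable_session_id` and `with_session` are hypothetical helpers, and QuotaFlow does not require any particular id format.

```python
import hashlib
from typing import Dict


def stable_session_id(*parts: str, prefix: str = "agent") -> str:
    """Derive the same id from the same inputs on every call."""
    joined = "\x1f".join(parts)  # unit separator avoids accidental collisions
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return f"{prefix}-{digest[:16]}"


def with_session(headers: Dict[str, str], session_id: str) -> Dict[str, str]:
    """Return a copy of the headers with the x-session-id header added."""
    merged = dict(headers)
    merged["x-session-id"] = session_id
    return merged
```

Hashing keeps local paths and customer names out of request headers while still giving every call in the same task the same value.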

<!-- Source: openai-compatible/responses.mdx -->

# Responses API

## When to use this page

Use this page for Codex-style agent calls, tool-capable workloads, and modern OpenAI-compatible clients.

## Endpoint

```http
POST https://api.quotaflow.ai/openai/v1/responses
```

## Minimal request

```json
{
  "model": "gpt-5.5",
  "input": "Write a concise deployment checklist.",
  "stream": false
}
```

## Streaming request

```json
{
  "model": "gpt-5.5",
  "input": "Explain this repository structure.",
  "stream": true
}
```

## Curl example

```bash
curl https://api.quotaflow.ai/openai/v1/responses \
  -H "Authorization: Bearer $QUOTAFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-session-id: repo-agent-123" \
  -d '{
    "model": "gpt-5.5",
    "input": "Summarize the change in one paragraph.",
    "stream": false
  }'
```

## Codex adaptation

If enabled for your API key, QuotaFlow normalizes compatible requests for Codex-style upstream execution. You can usually keep the OpenAI-compatible request shape.

## Response shape

QuotaFlow returns OpenAI-compatible JSON for non-streaming calls and server-sent events for streaming calls.
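If you consume the stream without an SDK, the server-sent events framing can be parsed with a few lines. This is a minimal sketch of generic SSE parsing, assuming each event carries a JSON `data:` payload; the `[DONE]` sentinel is used by OpenAI-style Chat Completions streams, and whether a given QuotaFlow stream emits it is an assumption.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the decoded JSON payload of each SSE event in a stream of lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):  # SSE strips one optional leading space
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:  # a blank line ends the current event
            payload = "\n".join(buffer)
            buffer = []
            if payload == "[DONE]":
                return
            yield json.loads(payload)
        # "event:", "id:" and ":" comment lines are ignored in this sketch
    if buffer and "\n".join(buffer) != "[DONE]":
        yield json.loads("\n".join(buffer))  # stream ended without a blank line
```

Feed it any iterable of decoded text lines, for example the lines of an HTTP response body opened in streaming mode.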

<!-- Source: openai-compatible/chat-completions.mdx -->

# Chat Completions

## When to use this page

Use this page if your existing client still calls `/chat/completions`.

## Endpoint

```http
POST https://api.quotaflow.ai/openai/v1/chat/completions
```

## Request

```json
{
  "model": "gpt-5.5",
  "messages": [
    { "role": "user", "content": "Return only the word ready." }
  ],
  "stream": false
}
```

## Curl

```bash
curl https://api.quotaflow.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer $QUOTAFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role":"user","content":"Return only the word ready."}],
    "stream": false
  }'
```

## Recommendation

Use `/responses` for new agent integrations. Keep `/chat/completions` for compatibility with older clients.

<!-- Source: openai-compatible/models.mdx -->

# Models

## When to use this page

Use this page to verify available model ids before configuring a client.

## Endpoint

```http
GET https://api.quotaflow.ai/openai/v1/models
```

## Curl

```bash
curl https://api.quotaflow.ai/openai/v1/models \
  -H "Authorization: Bearer $QUOTAFLOW_API_KEY"
```

## Model ids

The `/models` endpoint is the source of truth. QuotaFlow only lists model ids that are exposed for the authenticated key. Do not hard-code models that are not returned by `/models`.

Models currently verified in production with `200` responses:

- `gpt-5.2` — Responses and Chat Completions
- `gpt-5.3-codex` — Responses and Chat Completions
- `gpt-5.4` — Responses and Chat Completions
- `gpt-5.5` — Responses and Chat Completions
- `gpt-4o-mini-transcribe` — Audio transcription only
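To avoid hard-coding ids, a client can choose from whatever `/models` returns for its key. The sketch below works on the already-parsed JSON body; `exposed_model_ids` and `pick_model` are hypothetical helper names, and the preference order is the caller's own choice.

```python
from typing import Any, Dict, List, Optional, Sequence


def exposed_model_ids(models_response: Dict[str, Any]) -> List[str]:
    """Extract model ids from a parsed /models response body."""
    return [item["id"] for item in models_response.get("data", []) if "id" in item]


def pick_model(models_response: Dict[str, Any], preferred: Sequence[str]) -> Optional[str]:
    """Return the first preferred id that the key actually exposes, else None."""
    available = set(exposed_model_ids(models_response))
    for model_id in preferred:
        if model_id in available:
            return model_id
    return None
```

Returning `None` instead of falling back to a guess forces the caller to handle the case where none of its preferred models is exposed.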

## Downgrade behavior

Under high pressure, QuotaFlow may route simple or standard work from a higher model to a lower compatible model if the key and route policy allow it. Complex work is preserved more conservatively.

<!-- Source: openai-compatible/audio-transcriptions.mdx -->

# Audio transcriptions

## When to use this page

Use this page when you need speech-to-text through the OpenAI-compatible QuotaFlow endpoint.

## Endpoint

```http
POST https://api.quotaflow.ai/openai/v1/audio/transcriptions
```

## Supported model

Current prod audio transcription support is verified for:

- `gpt-4o-mini-transcribe`

Do not assume speech generation or audio translation support from this page. Those are separate account capabilities and are not publicly supported until they pass production smoke tests with `200` responses.

## Curl

```bash
curl https://api.quotaflow.ai/openai/v1/audio/transcriptions \
  -H "Authorization: Bearer $QUOTAFLOW_API_KEY" \
  -F "model=gpt-4o-mini-transcribe" \
  -F "file=@./sample.wav"
```

## Routing behavior

QuotaFlow routes this endpoint only to upstream accounts marked with the `audioTranscription` capability. If no capable account is available, the request fails rather than being sent to an account that lacks the capability, such as a Codex-only account.

<!-- Source: codex/overview.mdx -->

# Codex overview

## When to use this page

Use this page when connecting a coding agent, local CLI, desktop agent, or CI agent to QuotaFlow.

## Recommended settings

- Base URL: `https://api.quotaflow.ai/openai/v1`
- API key environment variable: `QUOTAFLOW_API_KEY`
- Preferred endpoint: `/responses`
- Preferred session signal: `x-session-id` or `prompt_cache_key`
- Streaming: enabled for interactive agents

## Why session ids matter

QuotaFlow tries to keep related agent calls on the same upstream account. This helps with cache locality and avoids unnecessary account switching. If an account is rate limited or unhealthy, QuotaFlow can move the session to another account.

## Capacity behavior

QuotaFlow uses dynamic account pool capacity. It does not rely on a single fixed total bucket. If accounts cool down, hit usage limits, or become unavailable, capacity estimates decrease. When accounts recover, capacity returns.

<!-- Source: codex/cli.mdx -->

# Codex CLI

## When to use this page

Use this page for a local command-line coding agent that supports an OpenAI-compatible base URL.

## Environment variables

```bash
export QUOTAFLOW_API_KEY="qf_your_key_here"
export OPENAI_API_KEY="$QUOTAFLOW_API_KEY"
export OPENAI_BASE_URL="https://api.quotaflow.ai/openai/v1"
```

Some clients use `OPENAI_BASE_URL`; others use `OPENAI_API_BASE`, `OPENAI_HOST`, or a config file. Use the variable your client supports.

## Smoke test

```bash
curl "$OPENAI_BASE_URL/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

## Agent session guidance

If your client supports custom headers, set a stable `x-session-id` per workspace or task. If it supports `prompt_cache_key`, use a stable value per repository or long-running task.

## Failure handling

- Retry transient `429` with backoff.
- Retry `503` and network timeouts.
- Do not retry invalid API key errors without rotating the key.

<!-- Source: codex/desktop.mdx -->

# Desktop agents

## When to use this page

Use this page for desktop agents, IDE extensions, or local GUI coding tools that support OpenAI-compatible endpoints.

## Configuration values

| Setting | Value |
| --- | --- |
| API key | Your QuotaFlow production API key |
| Base URL | `https://api.quotaflow.ai/openai/v1` |
| Responses URL | `https://api.quotaflow.ai/openai/v1/responses` |
| Models URL | `https://api.quotaflow.ai/openai/v1/models` |

## Setup steps

1. Open your agent settings.
2. Choose OpenAI-compatible provider if available.
3. Paste your QuotaFlow API key.
4. Set the base URL to `https://api.quotaflow.ai/openai/v1`.
5. Save and run a small test prompt.

## Recommended test prompt

```text
Reply with only: connected
```

## Notes

Do not paste production API keys into screenshots, issue trackers, or shared demo recordings.

<!-- Source: codex/capacity-queue-and-failover.mdx -->

# Capacity, queue, and failover

## When to use this page

Use this page to understand throttling, sticky sessions, failover, and model downgrade behavior.

## Dynamic capacity

QuotaFlow estimates usable capacity from currently healthy upstream accounts. It excludes accounts that are inactive, unauthorized, disabled, temporarily unavailable, or in active cooldown.
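The exclusion rule can be illustrated with a toy model. Everything here is an assumption made for the example: the `Account` fields, the status strings, and the idea of a per-account `slots` budget are hypothetical and do not describe QuotaFlow's internal data model.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Account:
    name: str
    status: str  # hypothetical: "active", "inactive", "unauthorized", "disabled", "unavailable"
    slots: int  # hypothetical per-account budget
    cooldown_until: Optional[float] = None  # epoch seconds; None means no cooldown


def is_usable(account: Account, now: float) -> bool:
    """An account counts only if it is active and not in an active cooldown."""
    if account.status != "active":
        return False
    if account.cooldown_until is not None and account.cooldown_until > now:
        return False
    return True


def estimate_capacity(accounts: Iterable[Account], now: float) -> int:
    """Sum the budgets of the accounts that are usable right now."""
    return sum(a.slots for a in accounts if is_usable(a, now))
```

Because the estimate is recomputed from current account state, capacity drops as accounts enter cooldown and returns once their cooldown expires.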

## Sticky routing

QuotaFlow prefers the same upstream account for a stable session id. Sticky routing can be bypassed when the selected account is under pressure or unavailable.

## Account failover

If an upstream account returns a transient `429`, QuotaFlow cools that account briefly and retries the request with another account when available.

If the upstream error is `usage_limit_reached` and includes a reset time, QuotaFlow cools the account until the reset window expires.
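The two cooldown cases can be sketched as a single function. This is an illustration only: the 30-second default for a transient `429` is an assumed placeholder (the actual duration is not documented here), and `cooldown_deadline` is a hypothetical name. The `resets_in_seconds` field is taken from the error shape shown in the Errors reference.

```python
from typing import Any, Optional

DEFAULT_COOLDOWN_SECONDS = 30.0  # assumption: the real "brief" cooldown is not documented


def cooldown_deadline(
    status_code: int,
    body: Any,
    now: float,
    default_seconds: float = DEFAULT_COOLDOWN_SECONDS,
) -> Optional[float]:
    """Return the time until which an account should rest, or None for no cooldown."""
    error = body.get("error") if isinstance(body, dict) else None
    if isinstance(error, dict) and error.get("type") == "usage_limit_reached":
        resets = error.get("resets_in_seconds")
        if isinstance(resets, (int, float)) and resets > 0:
            return now + resets  # rest until the reset window expires
    if status_code == 429:
        return now + default_seconds  # transient rate limit: rest briefly
    return None
```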

## Queueing

When configured capacity is exhausted, QuotaFlow may queue requests for a bounded time instead of immediately failing. Queue behavior depends on the API key and pool configuration.

## Model downgrade

Under high pressure, simple or standard tasks may be downgraded from `gpt-5.5` to `gpt-5.4`. Complex tasks are preserved more conservatively and may only reduce reasoning effort.

<!-- Source: examples/curl.mdx -->

# Curl examples

## When to use this page

Use this page to test a key or debug connectivity without an SDK.

## Set a key

```bash
export QUOTAFLOW_API_KEY="qf_your_key_here"
```

## List models

```bash
curl https://api.quotaflow.ai/openai/v1/models \
  -H "Authorization: Bearer $QUOTAFLOW_API_KEY"
```

## Non-streaming response

```bash
curl https://api.quotaflow.ai/openai/v1/responses \
  -H "Authorization: Bearer $QUOTAFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-session-id: curl-test" \
  -d '{
    "model": "gpt-5.5",
    "input": "Return a JSON object with status connected.",
    "stream": false
  }'
```

## Streaming response

```bash
curl -N https://api.quotaflow.ai/openai/v1/responses \
  -H "Authorization: Bearer $QUOTAFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-session-id: curl-stream-test" \
  -d '{
    "model": "gpt-5.5",
    "input": "Count from one to five slowly.",
    "stream": true
  }'
```

<!-- Source: examples/node.mdx -->

# Node.js examples

## When to use this page

Use this page for server-side JavaScript or TypeScript integrations.

## Install

```bash
npm install openai
```

## Create a response

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.QUOTAFLOW_API_KEY,
  baseURL: "https://api.quotaflow.ai/openai/v1"
});

const response = await client.responses.create({
  model: "gpt-5.5",
  input: "Return only: connected",
  stream: false
});

console.log(response);
```

## Chat completions compatibility

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.QUOTAFLOW_API_KEY,
  baseURL: "https://api.quotaflow.ai/openai/v1"
});

const completion = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [{ role: "user", content: "Return only: ready" }]
});

console.log(completion.choices[0]?.message?.content);
```

<!-- Source: examples/python.mdx -->

# Python examples

## When to use this page

Use this page for Python services, scripts, notebooks, or agents.

## Install

```bash
pip install openai
```

## Create a response

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["QUOTAFLOW_API_KEY"],
    base_url="https://api.quotaflow.ai/openai/v1",
)

response = client.responses.create(
    model="gpt-5.5",
    input="Return only: connected",
    stream=False,
)

print(response)
```

## Chat completions compatibility

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["QUOTAFLOW_API_KEY"],
    base_url="https://api.quotaflow.ai/openai/v1",
)

completion = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Return only: ready"}],
)

print(completion.choices[0].message.content)
```

<!-- Source: reference/errors.mdx -->

# Errors

## When to use this page

Use this page when a request fails and you need to decide whether to retry, rotate a key, or contact support.

## Missing API key

```json
{
  "error": "Missing API key",
  "message": "Please provide an API key in the x-api-key, x-goog-api-key, or Authorization header"
}
```

Fix: send `Authorization: Bearer $QUOTAFLOW_API_KEY`.

## Invalid API key

```json
{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error"
  }
}
```

Fix: verify environment, key value, and whether the key is active.

## Permission denied

```json
{
  "error": {
    "message": "This API key does not have permission to access OpenAI",
    "type": "permission_denied",
    "code": "permission_denied"
  }
}
```

Fix: ask an admin to enable OpenAI permission for the key.

## Capacity queue full

```json
{
  "error": {
    "message": "Capacity queue full",
    "code": "capacity_queue_full"
  }
}
```

Fix: retry with backoff or ask for a higher package/pool.

## Upstream rate limit

```json
{
  "error": {
    "type": "usage_limit_reached",
    "message": "The usage limit has been reached",
    "resets_in_seconds": 3600
  }
}
```

Fix: wait for the interval given in `resets_in_seconds`, then retry. QuotaFlow also cools the affected upstream account and uses alternate accounts when available.
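The error shapes on this page can be mapped to a next step in client code. The sketch below is one possible mapping, not an official client: the action labels and the function name `classify_error` are assumptions, while the field names and values come from the examples above. Note that the missing-key error uses a flat shape where `error` is a string, unlike the others.

```python
from typing import Any, Dict, Optional, Tuple


def classify_error(body: Dict[str, Any]) -> Tuple[str, Optional[float]]:
    """Map a parsed error body to (action, wait_seconds)."""
    error = body.get("error")
    if isinstance(error, str):  # flat shape used by "Missing API key"
        if error == "Missing API key":
            return ("add_auth_header", None)
        return ("inspect", None)
    if not isinstance(error, dict):
        return ("inspect", None)
    kind, code = error.get("type"), error.get("code")
    if kind == "authentication_error":
        return ("verify_key", None)  # do not retry until the key is fixed
    if "permission_denied" in (kind, code):
        return ("ask_admin", None)
    if code == "capacity_queue_full":
        return ("retry_with_backoff", None)
    if kind == "usage_limit_reached":
        resets = error.get("resets_in_seconds")
        wait = float(resets) if isinstance(resets, (int, float)) else None
        return ("retry_after_reset", wait)
    return ("inspect", None)
```

Unknown shapes fall through to `inspect` so that new error types are surfaced to a human instead of being retried blindly.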

<!-- Source: reference/rate-limits.mdx -->

# Rate limits

## When to use this page

Use this page to understand capacity protection and customer package behavior.

## Limit layers

QuotaFlow can enforce limits at multiple layers:

- Global Codex capacity
- Pool capacity
- API key capacity
- Upstream account pressure
- Upstream provider usage limits

## Dynamic pool capacity

QuotaFlow estimates capacity from accounts that are currently usable. Accounts in cooldown or usage-limit reset windows do not count toward active capacity until they recover.

## Queue behavior

When capacity is exhausted, eligible requests may wait in a bounded queue. If the queue is full or the wait exceeds the configured safe window, the request fails with a capacity error.
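A bounded queue with a maximum wait can be pictured as follows. This is a conceptual sketch, not QuotaFlow's implementation: the class `BoundedWaitQueue`, its parameters, and the FIFO ordering are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class BoundedWaitQueue:
    """FIFO queue that rejects new entries when full and expires old ones."""

    def __init__(self, max_size: int, max_wait_seconds: float) -> None:
        self.max_size = max_size
        self.max_wait_seconds = max_wait_seconds
        self._items: Deque[Tuple[str, float]] = deque()  # (request_id, enqueued_at)

    def enqueue(self, request_id: str, now: float) -> bool:
        """Return False when the queue is full (the caller fails the request)."""
        if len(self._items) >= self.max_size:
            return False
        self._items.append((request_id, now))
        return True

    def expire(self, now: float) -> List[str]:
        """Remove and return ids that have waited longer than the safe window."""
        expired: List[str] = []
        while self._items and now - self._items[0][1] > self.max_wait_seconds:
            expired.append(self._items.popleft()[0])
        return expired

    def next_request(self) -> Optional[str]:
        """Pop the oldest waiting request once capacity frees up."""
        return self._items.popleft()[0] if self._items else None
```

Both rejection paths, a full queue and an expired wait, correspond to the capacity error a client would then retry with backoff.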

## Retry guidance

- Retry `429` with exponential backoff and jitter.
- Retry `503` and timeouts.
- Do not retry authentication errors until the key is fixed.
- Keep session ids stable across retries.
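The guidance above can be sketched as a small retry loop. It assumes a caller-supplied `send` function that returns `(status, body)`, with `status` set to `None` to represent a network timeout; that signature, the helper names, and the default base and cap values are illustrative assumptions. The delay uses the common "full jitter" variant of exponential backoff.

```python
import random
import time
from typing import Any, Callable, Optional, Tuple

RETRYABLE_STATUS = {429, 503}


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return rng() * min(cap, base * (2 ** attempt))


def should_retry(status: Optional[int]) -> bool:
    """Retry timeouts (status None), 429 and 503; never retry auth errors."""
    return status is None or status in RETRYABLE_STATUS


def call_with_retry(send: Callable[[], Tuple[Optional[int], Any]],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Callable[[], float] = random.random) -> Tuple[Optional[int], Any]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result: Tuple[Optional[int], Any] = (None, None)
    for attempt in range(max_attempts):
        result = send()
        if not should_retry(result[0]) or attempt == max_attempts - 1:
            break
        sleep(backoff_delay(attempt, rng=rng))
    return result
```

Build `send` as a closure over one fixed session id so that every attempt carries the same value.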

<!-- Source: troubleshooting/common-issues.mdx -->

# Common issues

## When to use this page

Use this page when connecting a new client or agent fails.

## `401 Missing API key`

Your request did not include a key in a supported header. Add:

```http
Authorization: Bearer qf_your_key_here
```

## `403 permission_denied`

The key exists but does not have OpenAI permission. Ask an admin to update the key.

## Models endpoint works but responses fail

Check:

- The request body is valid JSON.
- The model id is supported by your package.
- The key has OpenAI permission.
- The endpoint is `/openai/v1/responses`, not `/v1/responses`.

## Agent loses cache benefits

Set a stable `x-session-id`, `session_id`, `conversation_id`, or `prompt_cache_key`.

## Old development key fails on production

Use a production-issued key with `https://api.quotaflow.ai`. Development keys are not automatically valid in production.

<!-- Source: troubleshooting/agent-notes.mdx -->

# Notes for AI agents

## When to use this page

Use this page if you are an AI assistant configuring QuotaFlow for a user.

## Required values

- Base URL: `https://api.quotaflow.ai/openai/v1`
- Auth header: `Authorization: Bearer <QUOTAFLOW_API_KEY>`
- Preferred endpoint: `POST /responses`
- Model discovery: `GET /models`

## Safe setup algorithm

1. Ask the user for the environment variable name where their key is stored.
2. Do not ask the user to paste a production key into chat unless unavoidable.
3. Configure the OpenAI-compatible base URL.
4. Run `GET /models` to validate the key.
5. Run a one-line non-streaming `POST /responses` smoke test.
6. Add a stable session id for long-running coding tasks.
7. If `429` appears, retry with backoff and preserve the same session id.
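The validation steps of this algorithm (reading the key from the environment, then steps 4 and 5) can be sketched with an injected transport so the logic is testable without network access. The `Transport` signature, the result labels, and the function name `safe_setup` are hypothetical; the paths and the non-streaming request body follow the examples in this documentation.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

# Hypothetical transport: (method, path, api_key, json_body) -> (status, parsed_body)
Transport = Callable[[str, str, str, Optional[Dict[str, Any]]], Tuple[int, Any]]


def safe_setup(env: Mapping[str, str], env_var: str, model: str, send: Transport) -> str:
    """Validate a key read from the environment and report where setup stopped."""
    api_key = env.get(env_var, "")
    if not api_key:
        return "key_not_found"  # fix the environment; do not ask for the key in chat
    status, models = send("GET", "/models", api_key, None)
    if status != 200:
        return "models_check_failed"
    ids = [m.get("id") for m in models.get("data", [])]
    if model not in ids:
        return "model_not_exposed"
    body = {"model": model, "input": "Reply with only: connected", "stream": False}
    status, _ = send("POST", "/responses", api_key, body)
    return "ok" if status == 200 else "responses_check_failed"
```

The key is only ever read from the environment mapping and passed to the transport, so it never needs to appear in logs or chat output.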

## Do not do this

- Do not commit keys.
- Do not print keys in logs.
- Do not expose keys to frontend code.
- Do not assume development keys work in production.