
Rate limits

Per-user concurrency plus per-account token / Credits caps.

Orux AI keeps rate limiting simple: concurrency is enforced per user (not per key), and account-wide token / Credits caps gate long-term spend. Hitting any cap returns a 429 with an error code (concurrency_exceeded or quota_exceeded) that points you to the knob to turn.

The limits

Per-user concurrency

A single user is capped at a fixed number of in-flight requests across every key they own (default: 3). Streaming calls hold one slot until the stream closes. A 429 with concurrency_exceeded means you have too many open calls right now.
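Because the cap counts requests across every key a user owns, it helps to enforce the same limit on the client so you queue locally instead of collecting concurrency_exceeded errors. The sketch below assumes the default cap of 3 and uses a hypothetical `fake_request` in place of a real gateway call; one `UserSlotLimiter` instance must be shared by every thread that calls the gateway as that user, and it only coordinates threads inside one process.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class UserSlotLimiter:
    """Client-side mirror of the per-user in-flight cap (assumed default: 3).

    Share ONE instance across all threads that use any of the user's keys,
    because the gateway counts slots per user, not per key. This only
    coordinates threads within a single process.
    """

    def __init__(self, cap: int = 3) -> None:
        self._slots = threading.BoundedSemaphore(cap)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed so far

    def call(self, fn, *args, **kwargs):
        with self._slots:  # blocks until one of the cap slots is free
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return fn(*args, **kwargs)
            finally:
                with self._lock:
                    self.in_flight -= 1


def fake_request(i: int) -> int:
    """Hypothetical stand-in for a real (non-streaming) gateway call."""
    time.sleep(0.05)
    return i


limiter = UserSlotLimiter(cap=3)
with ThreadPoolExecutor(max_workers=10) as pool:
    # Ten workers compete, but at most three calls are ever in flight.
    results = list(pool.map(lambda i: limiter.call(fake_request, i), range(10)))
```

Extra work simply waits on the semaphore rather than reaching the gateway and bouncing off the cap.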

Monthly token quota

Account-level ceiling on input+output tokens per calendar month. A 429 with quota_exceeded means you have hit it; ask an admin to raise it.
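If you want a rough local view of how close you are to the monthly ceiling between usage checks, you can tally tokens per calendar month yourself. This is a sketch under two assumptions not stated on this page: that responses carry an OpenAI-style `usage` object with `prompt_tokens` and `completion_tokens`, and that months are counted in UTC. The class name `MonthlyTokenTally` is hypothetical, and the gateway's own count remains authoritative.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Optional


class MonthlyTokenTally:
    """Client-side running total of input+output tokens per calendar month.

    Assumptions: each response has an OpenAI-style usage object with
    prompt_tokens / completion_tokens, and month boundaries are in UTC.
    """

    def __init__(self) -> None:
        self._totals = defaultdict(int)  # (year, month) -> tokens

    def record(self, usage: dict, when: Optional[datetime] = None) -> int:
        """Add one response's input+output tokens; return the month's total."""
        when = when or datetime.now(timezone.utc)
        key = (when.year, when.month)
        self._totals[key] += usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
        return self._totals[key]

    def total(self, year: int, month: int) -> int:
        return self._totals.get((year, month), 0)
```

A response dated 31 January with 120 input and 30 output tokens adds 150 to January's total; the next day's traffic starts a fresh February total.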

Credits balance

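Orux AI charges in Credits at request time. If the next charge would drive the balance below zero, the request is rejected with quota_exceeded; top up to resume. Because the Credits balance and the monthly token quota share the quota_exceeded code, check GET /api/user/v1/usage to see which of the two you hit.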
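The two 429 codes call for different reactions: concurrency_exceeded clears as soon as one of your calls finishes, while quota_exceeded (monthly quota or Credits) needs a top-up or an admin. A minimal sketch of that decision follows. It assumes an OpenAI-style error body where the code sits at `body["error"]["code"]`; this page does not document the error body shape, so adjust the lookup to what your responses actually contain. `next_action` and its return labels are hypothetical.

```python
from typing import Any


def next_action(status: int, body: Any) -> str:
    """Map a gateway error to a client action: "retry", "escalate", or "fail".

    Assumption: the error code is at body["error"]["code"].
    """
    err = body.get("error") if isinstance(body, dict) else None
    code = err.get("code") if isinstance(err, dict) else None
    if code == "quota_exceeded":
        # Monthly token quota or Credits balance: retrying will not help;
        # check the usage endpoint, then top up or ask an admin.
        return "escalate"
    if code == "concurrency_exceeded" or status == 429 or status >= 500:
        # A slot frees up when one of your calls finishes: back off and retry.
        return "retry"
    return "fail"
```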

Checking your limits

Today the gateway sets only standard HTTP response headers (Content-Type, Server, etc.); it does not yet expose x-ratelimit-*, x-credits-balance, or retry-after headers. To pace yourself, poll GET /api/user/v1/usage for your current concurrency, monthly token spend, and Credits balance. Per-call rate-limit headers are on the roadmap; we will document them here once shipped.
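A polling helper might look like the sketch below, written with the standard library only. Several details are assumptions rather than documented behavior: that the usage endpoint lives on the same host as the chat API (`https://orux.top`), that it accepts the same Bearer API key, and especially the field name `"concurrency"` for the current in-flight count, which is a hypothetical placeholder. Inspect a real response and rename the field before relying on it.

```python
import json
import os
import urllib.request

# Assumption: same host and Bearer-key auth as the chat completions endpoint.
USAGE_URL = "https://orux.top/api/user/v1/usage"


def fetch_usage(timeout: float = 10.0) -> dict:
    """GET the usage snapshot as parsed JSON (urlopen raises HTTPError on non-2xx)."""
    req = urllib.request.Request(
        USAGE_URL,
        headers={"Authorization": "Bearer " + os.environ["ORUX_API_KEY"]},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def free_slots(usage: dict, cap: int = 3) -> int:
    """How many more calls you can open before hitting concurrency_exceeded.

    HYPOTHETICAL field name: "concurrency" stands in for whatever key the
    real response uses for the current in-flight count.
    """
    return max(cap - int(usage.get("concurrency", 0)), 0)
```

In a worker loop you would call `free_slots(fetch_usage())` and wait briefly whenever it returns 0, rather than sending a request that is certain to be rejected.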

Exponential backoff with jitter

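For 429 and 5xx responses, retry up to about 5 times with delay = base × 2^attempt + random(0, base); base = 0.2s is a good starting point. Stop on auth_error and invalid_request_error, since they will not improve with retry. The same applies to a 429 with quota_exceeded: the monthly quota or Credits balance will not recover within a few seconds of backoff.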
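The schedule and stop rules above can be isolated into two small functions, which makes them easy to unit-test apart from any HTTP client. This is a sketch: the 10s cap on the exponential term is an addition not stated in the formula, and treating quota_exceeded as final follows from the quota and Credits sections, which say it clears only after an admin raise or a top-up. The names `backoff_delay`, `should_retry`, and `FINAL_CODES` are hypothetical.

```python
import random
from typing import Optional

# Error codes that will not clear on their own within a retry window.
FINAL_CODES = {"auth_error", "invalid_request_error", "quota_exceeded"}


def backoff_delay(attempt: int, base: float = 0.2, cap: float = 10.0) -> float:
    """delay = base * 2^attempt + random(0, base); the exponential term is capped."""
    return min(base * 2 ** attempt, cap) + random.uniform(0, base)


def should_retry(status: int, code: Optional[str], attempt: int, max_retries: int = 5) -> bool:
    """Retry 429 and 5xx until the retry budget runs out, unless the code is final."""
    if attempt >= max_retries or code in FINAL_CODES:
        return False
    return status == 429 or 500 <= status <= 599
```

With base = 0.2s the un-jittered delays run 0.2, 0.4, 0.8, 1.6, 3.2s; the jitter term spreads clients that were rejected at the same moment so they do not all come back together.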

Recommended retry policy

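```python
import os
import random
import time

import requests

API_URL = "https://orux.top/api/v1/chat/completions"


def call_with_retry(payload, max_retries=5, base=0.2, cap=10.0):
    # API key comes from the ORUX_API_KEY environment variable.
    headers = {"Authorization": f"Bearer {os.environ['ORUX_API_KEY']}"}
    for attempt in range(max_retries + 1):
        r = requests.post(API_URL, headers=headers, json=payload, timeout=60)
        retryable = r.status_code == 429 or r.status_code >= 500
        if not retryable or attempt == max_retries:
            # Success returns the JSON body; other 4xx errors (auth_error,
            # invalid_request_error) or an exhausted retry budget raise HTTPError.
            r.raise_for_status()
            return r.json()
        # Exponential backoff with jitter: base * 2^attempt + random(0, base), capped.
        time.sleep(min(base * 2 ** attempt, cap) + random.uniform(0, base))
```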

Back off on 429

When you get a 429, retry with exponential backoff plus jitter (for example 0.5s, 1s, 2s, 4s) instead of retrying in a tight loop. That gives the channel a chance to recover and avoids piling onto the same circuit breaker.

Streaming + concurrency

A streaming call holds one user-level concurrency slot from the first byte until the stream closes, and it counts against the same cap as non-streaming calls. With the default cap of 3, a user can have at most 3 calls open at once; to run more in parallel, split work across users or ask an admin to raise the cap for the account.
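The practical consequence is that a client-side limiter must keep its slot for the whole life of the stream, not just until the response headers arrive. The asyncio sketch below shows that pattern with a hypothetical `fake_stream` generator standing in for a real streaming response; the cap of 3 assumes the default. The slot is acquired before the call starts (slightly earlier than the gateway's "first byte", which is the conservative side) and released only after the last chunk is consumed or the stream fails.

```python
import asyncio

USER_CONCURRENCY_CAP = 3  # assumed default per-user cap


async def fake_stream(n_chunks: int):
    """Hypothetical stand-in for a streamed response from the gateway."""
    for i in range(n_chunks):
        await asyncio.sleep(0.01)
        yield f"chunk-{i}"


async def consume(slots: asyncio.Semaphore, state: dict, job_id: int):
    # Hold the slot until the stream is fully drained, as the gateway does.
    async with slots:
        state["open"] += 1
        state["peak"] = max(state["peak"], state["open"])
        try:
            chunks = [c async for c in fake_stream(5)]
        finally:
            state["open"] -= 1
    return job_id, len(chunks)


async def main(jobs: int = 8):
    slots = asyncio.Semaphore(USER_CONCURRENCY_CAP)
    state = {"open": 0, "peak": 0}
    results = await asyncio.gather(*(consume(slots, state, j) for j in range(jobs)))
    return results, state["peak"]


results, peak = asyncio.run(main())
```

Eight streams are requested, but no more than three are ever open; the rest wait their turn locally instead of triggering concurrency_exceeded.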

Raising your limits

Per-key concurrency and QPS limits are not exposed on this gateway: every key shares its owner's user-level concurrency cap, and per-key QPS is reserved for future use and not currently enforced. To raise the per-user concurrency cap, the monthly token quota, or your Credits balance, ask an admin in the dashboard.