Rate limits
Per-user concurrency plus per-account token / Credits caps.
Orux AI keeps rate limiting simple: concurrency is enforced per user (not per key), and account-wide token / Credits caps gate long-term spend. Hitting any cap returns a 429 with a specific code so you know which knob to turn.
The limits
A single user is capped at a fixed number of in-flight requests across every key they own (default: 3). Streaming calls hold one slot until the stream closes. A 429 with concurrency_exceeded means you have too many open calls right now.
Account-level ceiling on input+output tokens per calendar month. A 429 with quota_exceeded means you have hit it; ask an admin to raise it.
Orux AI charges in Credits at request time. If the next charge would drive the balance below zero, the request is rejected with quota_exceeded — top up to resume.
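The two codes above call for different reactions, which a small helper can make explicit. This is a minimal sketch: it assumes you have already pulled the code string out of the 429 response (where it sits in the body is not specified here), and the function name and return labels are hypothetical.

```python
def action_for_429(code: str) -> str:
    """Suggest a next step for a 429, given its error code.

    Assumption: `code` was already extracted from the response by the caller.
    """
    if code == "concurrency_exceeded":
        # Slots free up as in-flight calls finish, so waiting can help.
        return "retry"
    if code == "quota_exceeded":
        # Monthly token quota or Credits balance: needs an admin or a top-up.
        return "escalate"
    # Any other code is not described in this section.
    return "unknown"
```

For example, action_for_429("quota_exceeded") returns "escalate", signalling that retrying alone will not clear the error.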
Checking your limits
Today the gateway sets only standard HTTP response headers (Content-Type, Server, etc.); it does not yet expose x-ratelimit-*, x-credits-balance, or retry-after headers. To pace yourself, poll GET /api/user/v1/usage for current concurrency, monthly token spend, and Credits balance. Per-call rate-limit headers are on the roadmap; we will document them here once shipped.
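Polling the usage endpoint might look like the sketch below. Several details are assumptions rather than documented behavior: that the endpoint lives on the same host as the chat endpoint (https://orux.top), that it accepts the same Bearer key, and that it returns JSON. The response fields are not listed on this page, so the sketch returns the parsed body untouched. The names USAGE_URL, build_usage_request, and fetch_usage are hypothetical.

```python
import json
import urllib.request

# Assumption: same host as the chat completions endpoint.
USAGE_URL = "https://orux.top/api/user/v1/usage"


def build_usage_request(api_key):
    """Build the GET request; Bearer auth is assumed, not confirmed."""
    return urllib.request.Request(
        USAGE_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_usage(api_key, opener=urllib.request.urlopen):
    """Return the parsed JSON body as-is (its schema is not documented here)."""
    with opener(build_usage_request(api_key), timeout=10) as resp:
        return json.load(resp)
```

The opener parameter exists only so the call can be exercised without touching the network; in normal use, fetch_usage(os.environ["ORUX_API_KEY"]) would be enough.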
Exponential backoff with jitter
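For 429 and 5xx responses, retry up to about 5 times with delay = base × 2^attempt + random(0, base), where attempt counts from 0; base = 0.2s is a good starting point. Do not retry auth_error or invalid_request_error: they will not improve with retry. A 429 with quota_exceeded likewise persists until the quota is raised or Credits are topped up, so backoff mainly helps with concurrency_exceeded.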
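To make the formula concrete, the sketch below computes the delay for each attempt. The helper names (backoff_delay, schedule) and the injectable rng parameter are illustrative, not part of any Orux AI client.

```python
import random


def backoff_delay(attempt, base=0.2, rng=random.random):
    """delay = base * 2^attempt + random(0, base), with attempt counted from 0."""
    return base * 2 ** attempt + rng() * base


def schedule(max_retries=5, base=0.2):
    """Deterministic part of each delay, with jitter left out."""
    return [base * 2 ** attempt for attempt in range(max_retries)]
```

With base = 0.2, schedule() gives 0.2, 0.4, 0.8, 1.6, and 3.2 seconds, so five retries wait about 6.2 seconds in total before jitter, and each wait gains up to a further 0.2 seconds of jitter.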
Recommended retry policy
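    import os
    import random
    import time

    import requests


    def call_with_retry(payload, max_retries=5):
        base = 0.2  # seconds
        # Read the key from the environment; "$ORUX_API_KEY" is not expanded inside a Python string.
        headers = {"Authorization": f"Bearer {os.environ['ORUX_API_KEY']}"}
        for attempt in range(max_retries):
            r = requests.post(
                "https://orux.top/api/v1/chat/completions",
                headers=headers,
                json=payload,
                timeout=60,
            )
            # Success and non-retryable client errors (auth, invalid request) return immediately.
            if r.status_code < 500 and r.status_code != 429:
                return r.json()
            if attempt < max_retries - 1:
                # Exponential backoff with jitter: base * 2^attempt + random(0, base), capped at 10 s.
                time.sleep(min(base * 2 ** attempt, 10) + random.uniform(0, base))
        # Retries exhausted: surface the last 429 / 5xx as an exception.
        r.raise_for_status()

Streaming + concurrency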
A streaming call holds one user-level concurrency slot from the first byte until the stream closes. With the default cap of 3, plan for at most 3 active streams per user at any moment, and note that no slots remain for other calls while all three are open. To go beyond that, split work across users or ask an admin to raise the cap for an account.
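One way to stay under the cap on the client side is to gate every stream behind a local semaphore sized to the cap. The sketch below is an illustration only: MAX_IN_FLIGHT and stream_slot are hypothetical names, and the guard covers a single process, whereas the gateway counts in-flight requests per user across every key and process.

```python
import threading
from contextlib import contextmanager

# Default per-user cap from this page; change it if an admin raised yours.
MAX_IN_FLIGHT = 3

_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)


@contextmanager
def stream_slot(timeout=None):
    """Hold one local slot for as long as the stream is open."""
    if not _slots.acquire(timeout=timeout):
        raise TimeoutError("no free concurrency slot")
    try:
        yield
    finally:
        # Release only when the caller has finished reading the stream.
        _slots.release()
```

Open and fully consume each stream inside a `with stream_slot():` block so the slot is returned when the stream closes, mirroring how the gateway holds the slot.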
Raising your limits
Per-key concurrency and QPS limits are not exposed on this gateway: every key shares the user-level concurrency cap. To raise the per-user concurrency cap, the monthly token quota, or the Credits balance, ask an admin in the dashboard. Per-key QPS is reserved for future use and is not currently enforced.