Orux AI
Documentation

Chat completions

POST /api/v1/chat/completions — generate text from a conversation.

POST /api/v1/chat/completions, authenticated with Authorization: Bearer sk-app-…

This is the most-used endpoint. The schema matches OpenAI’s chat.completions exactly. What Orux AI adds: each model id is automatically routed to the healthiest upstream channel.

Pick a chat model

Every chat model is served through the same /api/v1/chat/completions endpoint; Orux AI automatically routes each model id to the best upstream for you.

| Model ID | Model | Description | Spec | Capabilities | Top params |
|---|---|---|---|---|---|
| gpt-5-2 | GPT-5.2 | OpenAI flagship; large-context reasoning and tool use. | 400K ctx | Tools, Vision, Stream | messages, temperature, top_p |
| gpt-5-pro | GPT-5 Pro | Highest-tier OpenAI reasoning model, deeper chain-of-thought, longer answers. | 400K ctx | Tools, Vision, Stream | messages, temperature, top_p |
| gpt-5-codex | GPT-5 Codex | Code-specialised GPT-5; better instruction following on programming tasks. | 400K ctx | Tools, Stream | messages, temperature, top_p |
| gpt-codex | GPT Codex | Legacy GPT code model retained for some callers. | 128K ctx | Tools, Stream | messages, temperature, top_p |
| claude-opus-4-5 | Claude Opus 4.5 | Anthropic top-tier model: best at long-form reasoning, code review and agentic tool use. Supports prompt caching. | 200K ctx | Tools, Vision, Cache, Stream | messages, temperature, top_p |
| claude-sonnet-4-5 | Claude Sonnet 4.5 | Balanced Claude tier: fast, cheaper, still tool/vision capable. | 200K ctx | Tools, Vision, Cache, Stream | messages, temperature, top_p |
| claude-haiku-4-5 | Claude Haiku 4.5 | Smallest Claude: sub-second latency, good for chatbots and routing. | 200K ctx | Tools, Cache, Stream | messages, temperature, top_p |
| gemini-3-pro | Gemini 3 Pro | Google flagship; 2M context, native multimodal. | 2000K ctx | Tools, Vision, Stream | messages, temperature, top_p |
| gemini-3-flash | Gemini 3 Flash | Fast tier of Gemini 3. | 1000K ctx | Tools, Vision, Stream | messages, temperature, top_p |
| grok-3 | Grok 3 | xAI conversational model with web tools. | 256K ctx | Tools, Stream | messages, temperature, top_p |

Showing 10 of 38 models.

Request body

| Field | Type | Default | Description |
|---|---|---|---|
| model (required) | string | | A model id from /docs/models, e.g. "claude-opus-4.7" or "gpt-5.5". |
| messages (required) | array<Message> | | Conversation so far. See the Message roles table. |
| temperature | number | 1.0 | Sampling temperature, 0–2. Lower = more deterministic. |
| top_p | number | 1.0 | Nucleus sampling. Use this OR temperature, not both. |
| max_tokens | int | | Maximum completion tokens. Defaults to the model’s context budget minus prompt. |
| stream | boolean | false | If true, response is an SSE stream of "chat.completion.chunk" deltas terminated by data: [DONE]. |
| tools | array<Tool> | | Function definitions the model may call. See "Tool calling". |
| tool_choice | string or object | "auto" | "none", "auto", "required", or {"type":"function","function":{"name":...}}. |
| response_format | object | | {"type":"json_object"} forces the assistant to emit a JSON document. Some models also accept JSON-schema mode. |
| stop | string or array<string> | | Up to 4 stop sequences. Generation halts when any is produced. |
| seed | int | | Best-effort determinism for sampling. Same seed + same prompt + same model = same output (provider permitting). |
| user | string | | Stable end-user identifier you control. Surfaces in your usage logs. |
| metadata | object | | Free-form key/value tags (max 16 keys, 64 chars). Indexed in the dashboard. |
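
The constraints in the table can be checked before a request is sent. The sketch below is one way to do that; build_chat_request is a hypothetical helper (not part of any SDK), and raising an error when both temperature and top_p are set is a client-side choice made here, since the table only recommends against combining them.

```python
from typing import Any, Optional, Sequence, Union


def build_chat_request(
    model: str,
    messages: Sequence[dict],
    *,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    stop: Optional[Union[str, Sequence[str]]] = None,
    metadata: Optional[dict] = None,
    **extra: Any,
) -> dict:
    """Assemble a request body, applying the limits listed in the table."""
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be within 0-2")
    if isinstance(stop, str):
        stop = [stop]  # a single stop string is allowed; normalise to a list
    if stop is not None and len(stop) > 4:
        raise ValueError("at most 4 stop sequences")
    if metadata is not None and len(metadata) > 16:
        raise ValueError("metadata allows at most 16 keys")

    body: dict = {"model": model, "messages": list(messages)}
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "stop": list(stop) if stop is not None else None,
        "metadata": metadata,
    }
    # Omit unset fields so the server-side defaults apply.
    body.update({k: v for k, v in optional.items() if v is not None})
    body.update(extra)  # e.g. max_tokens, stream, seed, user
    return body
```

The returned dict can be passed to an OpenAI-compatible client as keyword arguments, for example client.chat.completions.create(**body).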

Message roles

| Role | Content type | Description |
|---|---|---|
| system | string | High-level instructions. One per conversation, at the start. |
| user | string or array<Content> | A user turn. Text or multimodal (text + image_url). |
| assistant | string or null | A prior model turn, included when continuing a conversation. May also carry tool_calls. |
| tool | string | Output of a tool the model called. Must include tool_call_id. |
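
As an illustration of the role rules above, the sketch below checks a message list locally. validate_messages is a hypothetical helper and only covers the rules stated in the table; the server may enforce more.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages: list) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        elif role == "system" and i != 0:
            problems.append(f"message {i}: system message must come first")
        elif role == "tool" and not msg.get("tool_call_id"):
            problems.append(f"message {i}: tool message needs tool_call_id")
    return problems


# A conversation that uses all four roles.
conversation = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Weather in Tokyo?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 21}"},
]
```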

Tool calling

Provide a list of function definitions; the model may emit one or more tool_calls instead of (or alongside) a normal assistant message. You execute each call locally, then send the result back in a follow-up request with role:"tool".

| Field | Type | Description |
|---|---|---|
| type | string | Always "function" today. |
| function.name | string | Identifier you will receive back in tool_calls[i].function.name. |
| function.description | string | Plain-English purpose; the model uses this to decide when to call. |
| function.parameters | JSON Schema | Standard JSON Schema describing the function arguments. |
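
The "execute each call locally, then send the result back" step can be sketched as a small dispatcher. This is an illustration under assumptions: get_weather, REGISTRY and run_tool_calls are hypothetical names, and tool calls are handled as plain dicts in the OpenAI shape (id, function.name, function.arguments as a JSON string).

```python
import json


def get_weather(city: str, unit: str = "c") -> dict:
    """Hypothetical local tool; a real one would query a weather service."""
    return {"city": city, "unit": unit, "temp": 21}


# Maps function.name values to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list) -> list:
    """Execute each tool call and build the role:"tool" follow-up messages."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        args = json.loads(call["function"].get("arguments") or "{}")
        fn = REGISTRY.get(name)
        if fn is None:
            output = {"error": f"unknown tool: {name}"}
        else:
            output = fn(**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the call
            "content": json.dumps(output),
        })
    return results
```

The returned messages are appended after the assistant message that carried the tool_calls, and the whole list is sent in the follow-up request.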

Multimodal input

Send images, videos, audio or PDFs in any user message using the OpenAI image_url content block. The url field accepts http(s) URLs or base64 data URIs (data:<mime>;base64,…). Vision-capable Gemini 3 Pro and Claude 4.x models will ingest non-image media as well.

Python
from openai import OpenAI
client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

# image_url accepts http(s) URLs OR base64 data URIs, for image / video / audio / PDF
# (Gemini 3 Pro and Claude 4.x ingest the non-image kinds too).
resp = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[{
        "role":"user",
        "content":[
            {"type":"text","text":"Describe what happens in this clip."},
            {"type":"image_url","image_url":{"url":"data:video/mp4;base64,AAAA..."}},
        ],
    }],
)
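
The data:<mime>;base64,… form can be produced from a local file with the standard library. A minimal sketch follows; to_data_uri and file_to_content_block are hypothetical helper names, and the MIME type is guessed from the file extension, which may need overriding for unusual formats.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def file_to_content_block(path: str) -> dict:
    """Build an image_url content block from a local file."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot guess MIME type for {path!r}")
    url = to_data_uri(Path(path).read_bytes(), mime)
    return {"type": "image_url", "image_url": {"url": url}}
```

The resulting block goes into the content array of a user message, next to a text block.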

Anthropic-compatible path

In addition to /api/v1/chat/completions, Orux AI exposes Claude models on a native Anthropic Messages path: POST /anthropic/v1/messages. Use it from @anthropic-ai/sdk or any client that speaks the Messages protocol — system role, tool_use / tool_result blocks, and cache_control are all preserved. The same sk-app-… key authenticates both paths.

curl
curl https://orux.top/anthropic/v1/messages \
  -H "x-api-key: $ORUX_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model":"claude-opus-4-5",
    "max_tokens": 1024,
    "messages":[{"role":"user","content":"Hi Claude."}]
  }'
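
The same request can be assembled in Python without an SDK. The sketch below mirrors the curl call above using only the standard library; build_messages_request is a hypothetical helper, and it only constructs the request object rather than sending it.

```python
import json
import urllib.request


def build_messages_request(api_key: str, model: str, prompt: str,
                           max_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) a POST to the Anthropic-compatible path."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://orux.top/anthropic/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,  # the same sk-app-… key as the main endpoint
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it; the response body is JSON.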

Gemini-compatible path

Gemini models are also reachable at POST /google/v1beta/models/{model}:generateContent — passthrough of contents / generationConfig / tools is supported. Pass the Orux AI key as the ?key= query parameter to mirror Google’s convention.

curl
curl "https://orux.top/google/v1beta/models/gemini-3-pro:generateContent?key=$ORUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents":[{"role":"user","parts":[{"text":"Hello Gemini."}]}]
  }'
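
For callers building the URL in code, the model id and key should be escaped rather than concatenated raw. A minimal standard-library sketch, mirroring the curl call above; build_generate_content_request is a hypothetical helper that constructs the request without sending it.

```python
import json
import urllib.parse
import urllib.request


def build_generate_content_request(api_key: str, model: str,
                                   text: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request."""
    url = (
        "https://orux.top/google/v1beta/models/"
        f"{urllib.parse.quote(model, safe='')}:generateContent?"
        + urllib.parse.urlencode({"key": api_key})  # key goes in the query string
    )
    body = {"contents": [{"role": "user", "parts": [{"text": text}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```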

Prompt caching

For Claude models (claude-opus-4-5, claude-sonnet-4-5, claude-haiku-4-5), Orux AI honours Anthropic cache_control hints inside message content blocks. The cached portion is billed at the multipliers below; the discount flows through to your Credits charge.

| Field | Multiplier | Description |
|---|---|---|
| cache_write_5m | 1.25x | Tokens written to a 5-minute cache slot. Charged once per write. |
| cache_write_1h | 2.0x | Tokens written to a 1-hour cache slot. Charged once per write. |
| cache_hit | 0.10x | Tokens served from cache on subsequent calls. |
Python
from openai import OpenAI
client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

# Claude models honour Anthropic cache_control hints — Orux AI passes them through.
resp = client.chat.completions.create(
    model="claude-opus-4-5",
    messages=[
        {
            "role": "system",
            "content": [
                {"type":"text","text": LONG_DOC,
                 "cache_control":{"type":"ephemeral","ttl":"1h"}},
            ],
        },
        {"role": "user", "content": "Summarise it in one paragraph."},
    ],
)
# resp.usage.prompt_tokens_details.cached_tokens > 0 on the second call.
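
To see when caching pays off, the multipliers from the table can be turned into a quick calculation. The sketch below assumes the multipliers apply to the normal input-token price, that the first call writes the cache, and that every later call hits it before the slot expires; relative_prefix_cost is a hypothetical helper, not part of any API.

```python
CACHE_WRITE_5M = 1.25  # multipliers from the table above
CACHE_WRITE_1H = 2.0
CACHE_HIT = 0.10


def relative_prefix_cost(calls: int, write_multiplier: float) -> float:
    """Cost of a cached prefix over `calls` requests.

    The unit is the price of sending that prefix once without caching,
    so the uncached cost of the same traffic is simply `calls`.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    # One cache write, then a cache hit on every subsequent call.
    return write_multiplier + CACHE_HIT * (calls - 1)
```

Under these assumptions a 5-minute slot is already cheaper on the second call (1.35 versus 2.0 uncached), while a 1-hour slot needs a third call to come out ahead (2.2 versus 3.0).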

Response

Top-level

| Field | Type | Description |
|---|---|---|
| id | string | Unique completion id, e.g. "chatcmpl-abc123". |
| object | string | Either "chat.completion" (non-streaming) or "chat.completion.chunk" (streaming). |
| created | int | Unix timestamp (seconds). |
| model | string | The model id served (may differ from the model id requested if a fallback fired). |
| choices | array<Choice> | Usually one element. See below. |
| usage | Usage | Token accounting. Present on non-streaming responses, and on the final chunk of a stream. |

Choice

| Field | Type | Description |
|---|---|---|
| index | int | 0-based position. |
| message | Message | The assistant turn: content and/or tool_calls. |
| finish_reason | string | "stop", "length", "tool_calls", "content_filter". |
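
Callers usually branch on finish_reason before using the message. The sketch below shows one possible mapping for a choice held as a plain dict; next_step and the action labels it returns are hypothetical, chosen only for illustration.

```python
def next_step(choice: dict) -> str:
    """Map a choice's finish_reason to an action label for the caller."""
    reason = choice.get("finish_reason")
    if reason == "stop":
        return "use_content"        # the model finished normally
    if reason == "length":
        return "truncated"          # hit max_tokens; output is incomplete
    if reason == "tool_calls":
        return "run_tools"          # execute message.tool_calls, then follow up
    if reason == "content_filter":
        return "filtered"           # content was withheld by a filter
    return "unknown"                # a value this sketch does not recognise
```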

Usage and billing

Orux AI bills strictly on the upstream token count, marked up by the per-app pricing strategy configured on your account. Cached input tokens (when the upstream provides them) are billed at the discounted cached rate.

| Field | Type | Description |
|---|---|---|
| prompt_tokens | int | Total input tokens, including all messages and tool definitions. |
| completion_tokens | int | Tokens produced by the model. |
| total_tokens | int | Sum of the two above. |
| prompt_tokens_details.cached_tokens | int | Tokens served from the upstream prompt cache, billed at the cache rate. |
Cached input is cheaper
Repeated prompt prefixes hit our cache and are billed at the cache_hit_price (when available). No code change required — just send the same prefix.

Examples

Non-streaming

The simplest possible chat call.

curl
curl https://orux.top/api/v1/chat/completions \
  -H "Authorization: Bearer $ORUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.7",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user",   "content": "Explain quantum entanglement in one sentence."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Streaming (SSE)

Set stream:true and consume an event-stream of delta chunks.

curl
curl https://orux.top/api/v1/chat/completions \
  -H "Authorization: Bearer $ORUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Write a haiku about latency."}],
    "stream": true
  }'

# Server-Sent Events stream:
# data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}
# data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Soft "}}]}
# data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"packets"}}]}
# ...
# data: [DONE]
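
On the client side, the stream shown above can be reassembled by reading the data: lines until [DONE]. A minimal sketch, assuming the lines have already been read from the HTTP response as text; collect_stream_text is a hypothetical helper.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from SSE lines until data: [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # the first chunk carries only the role
                parts.append(content)
    return "".join(parts)
```

With the OpenAI Python SDK, passing stream=True yields parsed chunk objects instead, so manual parsing like this is only needed with a raw HTTP client.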

Tool calling

Let the model decide when to call your function, then return the tool result.

Python
from openai import OpenAI
import json

client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["c", "f"]},
            },
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="claude-opus-4.7",
    messages=[{"role": "user", "content": "Weather in Tokyo in celsius?"}],
    tools=tools,
    tool_choice="auto",
)

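# With tool_choice="auto" the model may answer directly, so check before indexing.
message = resp.choices[0].message
if not message.tool_calls:
    raise SystemExit(message.content)
call = message.tool_calls[0]
args = json.loads(call.function.arguments)
# -> call.function.name == "get_weather"
# -> args == {"city": "Tokyo", "unit": "c"}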

# Send the tool result back:
follow = client.chat.completions.create(
    model="claude-opus-4.7",
    messages=[
        {"role": "user", "content": "Weather in Tokyo in celsius?"},
        resp.choices[0].message,
        {"role": "tool", "tool_call_id": call.id, "content": "{\"temp_c\": 21, \"sky\": \"clear\"}"},
    ],
    tools=tools,
)
print(follow.choices[0].message.content)

JSON mode

Force the assistant to emit valid JSON.

Python
import json

from openai import OpenAI

client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

resp = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You output JSON only."},
        {"role": "user",   "content": "Give me 3 colors with hex codes."},
    ],
    response_format={"type": "json_object"},
)
# resp.choices[0].message.content -> a valid JSON object string
data = json.loads(resp.choices[0].message.content)

Vision (image input)

Send an image as part of a user message. Supported by models that list the Vision capability in the model table, for example gpt-5-2, claude-opus-4-5 and gemini-3-pro.

Python
from openai import OpenAI
client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

resp = client.chat.completions.create(
    model="claude-opus-4.7",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {
                "url": "https://example.com/cat.jpg",
            }},
        ],
    }],
)
print(resp.choices[0].message.content)