Chat completions
POST /api/v1/chat/completions — generate text from a conversation.
Authentication: Bearer sk-app-…
This is the most-used endpoint. The schema matches OpenAI’s chat.completions exactly. Orux AI's addition: each model id is automatically routed to the healthiest upstream channel.
Pick a chat model#
Every chat model is served through the same /api/v1/chat/completions endpoint — Orux AI automatically routes each model id to the best upstream for you. The table is searchable; click View for the full per-model parameter sheet.
| Model ID | Model | Spec | Capabilities | Top params | Doc |
|---|---|---|---|---|---|
| gpt-5-2 | GPT-5.2: OpenAI flagship; large-context reasoning and tool use. | 400K ctx | Tools, Vision, Stream | messages, temperature, top_p | View → |
| gpt-5-pro | GPT-5 Pro: highest-tier OpenAI reasoning model, deeper chain-of-thought, longer answers. | 400K ctx | Tools, Vision, Stream | messages, temperature, top_p | View → |
| gpt-5-codex | GPT-5 Codex: code-specialised GPT-5; better instruction following on programming tasks. | 400K ctx | Tools, Stream | messages, temperature, top_p | View → |
| gpt-codex | GPT Codex: legacy GPT code model retained for some callers. | 128K ctx | Tools, Stream | messages, temperature, top_p | View → |
| claude-opus-4-5 | Claude Opus 4.5: Anthropic top-tier model; best at long-form reasoning, code review and agentic tool use. Supports prompt caching. | 200K ctx | Tools, Vision, Cache, Stream | messages, temperature, top_p | View → |
| claude-sonnet-4-5 | Claude Sonnet 4.5: balanced Claude tier; fast, cheaper, still tool/vision capable. | 200K ctx | Tools, Vision, Cache, Stream | messages, temperature, top_p | View → |
| claude-haiku-4-5 | Claude Haiku 4.5: smallest Claude, with sub-second latency; good for chatbots and routing. | 200K ctx | Tools, Cache, Stream | messages, temperature, top_p | View → |
| gemini-3-pro | Gemini 3 Pro: Google flagship; 2M context, native multimodal. | 2000K ctx | Tools, Vision, Stream | messages, temperature, top_p | View → |
| gemini-3-flash | Gemini 3 Flash: fast tier of Gemini 3. | 1000K ctx | Tools, Vision, Stream | messages, temperature, top_p | View → |
| grok-3 | Grok 3: xAI conversational model with web tools. | 256K ctx | Tools, Stream | messages, temperature, top_p | View → |
Request body#
| Field | Type | Default | Description |
|---|---|---|---|
| model (required) | string | — | A model id from /docs/models, e.g. "claude-opus-4.7" or "gpt-5.5". |
| messages (required) | array<Message> | — | Conversation so far. See the Message roles table. |
| temperature | number | 1.0 | Sampling temperature, 0–2. Lower = more deterministic. |
| top_p | number | 1.0 | Nucleus sampling. Use this OR temperature, not both. |
| max_tokens | int | — | Maximum completion tokens. Defaults to the model’s context budget minus the prompt. |
| stream | boolean | false | If true, the response is an SSE stream of "chat.completion.chunk" deltas terminated by data: [DONE]. |
| tools | array<Tool> | — | Function definitions the model may call. See "Tool calling". |
| tool_choice | string or object | "auto" | "none", "auto", "required", or {"type":"function","function":{"name":...}}. |
| response_format | object | — | {"type":"json_object"} forces the assistant to emit a JSON document. Some models also accept JSON-schema mode. |
| stop | string or array<string> | — | Up to 4 stop sequences. Generation halts when any is produced. |
| seed | int | — | Best-effort determinism for sampling. Same seed + same prompt + same model = same output (provider permitting). |
| user | string | — | Stable end-user identifier you control. Surfaces in your usage logs. |
| metadata | object | — | Free-form key/value tags (max 16 keys, 64 chars). Indexed in the dashboard. |
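The limits in the table can be checked on the client before a request is sent. The sketch below is a hypothetical helper (`validate_body` is not part of any SDK or of Orux AI) that enforces only the constraints stated above. It assumes the 64-character metadata limit applies to both keys and values, which the table does not spell out.

```python
def validate_body(body: dict) -> list[str]:
    """Return a list of problems found in a chat-completions request body."""
    problems = []
    # model and messages are the only required fields.
    for field in ("model", "messages"):
        if not body.get(field):
            problems.append(f"missing required field: {field}")
    # temperature must lie in the documented 0-2 range.
    temperature = body.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")
    # The table recommends setting temperature or top_p, not both.
    if "temperature" in body and "top_p" in body:
        problems.append("set temperature or top_p, not both")
    # stop accepts a single string or up to 4 sequences.
    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    # metadata: at most 16 keys; the 64-char limit is ASSUMED to cover keys and values.
    metadata = body.get("metadata") or {}
    if len(metadata) > 16:
        problems.append("metadata accepts at most 16 keys")
    for key, value in metadata.items():
        if len(str(key)) > 64 or len(str(value)) > 64:
            problems.append(f"metadata entry too long: {key!r}")
    return problems
```

An empty list means the body passes these local checks; the server remains the final authority on what it accepts.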
Message roles#
| Role | Content type | Default | Description |
|---|---|---|---|
| system | string | — | High-level instructions. One per conversation, at the start. |
| user | string or array<Content> | — | A user turn. Text or multimodal (text + image_url). |
| assistant | string or null | — | A prior model turn, included when continuing a conversation. May also carry tool_calls. |
| tool | string | — | Output of a tool the model called. Must include tool_call_id. |
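The role rules above can be verified before a conversation is sent. This is a minimal sketch; `check_conversation` is a hypothetical helper name, and it checks only the rules the table states (system message first, tool messages carrying a tool_call_id).

```python
def check_conversation(messages: list[dict]) -> None:
    """Raise ValueError if the message list breaks the role rules above."""
    allowed = {"system", "user", "assistant", "tool"}
    for position, message in enumerate(messages):
        role = message.get("role")
        if role not in allowed:
            raise ValueError(f"message {position}: unknown role {role!r}")
        # A system message is only valid as the first entry, so there is at most one.
        if role == "system" and position != 0:
            raise ValueError("the system message must come first")
        # Tool output must name the call it answers.
        if role == "tool" and "tool_call_id" not in message:
            raise ValueError(f"message {position}: tool message needs tool_call_id")


conversation = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is an SSE stream?"},
]
check_conversation(conversation)  # passes silently
```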
Tool calling#
Provide a list of function definitions; the model may emit one or more tool_calls instead of (or alongside) a normal assistant message. You execute each call locally, then send the result back in a follow-up request with role:"tool".
| Field | Type | Default | Description |
|---|---|---|---|
| type | string | — | Always "function" today. |
| function.name | string | — | Identifier you will receive back in tool_calls[i].function.name. |
| function.description | string | — | Plain-English purpose; the model uses this to decide when to call. |
| function.parameters | JSON Schema | — | Standard JSON Schema describing the function arguments. |
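The "execute each call locally" step could be organised as a small dispatcher. The sketch below works on tool calls in their raw JSON (dict) form. `TOOL_REGISTRY`, `run_tool_calls` and the stand-in `get_weather` are illustrative names, not part of the API.

```python
import json


def get_weather(city: str, unit: str = "c") -> dict:
    # Stand-in implementation; a real one would query a weather service.
    return {"city": city, "unit": unit, "temp": 21}


# Maps function.name values from the tool definitions to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute each tool call and build the role:"tool" messages to send back."""
    results = []
    for call in tool_calls:
        function = TOOL_REGISTRY[call["function"]["name"]]
        # function.arguments arrives as a JSON-encoded string.
        arguments = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(function(**arguments)),
        })
    return results
```

The returned messages are appended after the assistant turn that carried the tool_calls, and the extended list is sent in the follow-up request.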
Multimodal input#
Send images, videos, audio or PDFs in any user message using the OpenAI image_url content block. The url field accepts http(s) URLs or base64 data URIs (data:<mime>;base64,…). Vision-capable Gemini 3 Pro and Claude 4.x models will ingest non-image media as well.
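For local files, the data URI has to be built by hand. The helper below is a sketch using only the Python standard library. `to_data_uri` and `media_block` are hypothetical names, and the MIME type is guessed from the file extension.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as data:<mime>;base64,... with the MIME type guessed from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        raise ValueError(f"cannot infer MIME type from {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def media_block(data: bytes, filename: str) -> dict:
    """Wrap the data URI in an image_url content block."""
    return {"type": "image_url", "image_url": {"url": to_data_uri(data, filename)}}
```

The returned dict can be placed in the content array of a user message next to a text block.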
from openai import OpenAI

client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

# image_url accepts http(s) URLs OR base64 data URIs, for image / video / audio / PDF
# (Gemini 3 Pro and Claude 4.x ingest the non-image kinds too).
resp = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what happens in this clip."},
            {"type": "image_url", "image_url": {"url": "data:video/mp4;base64,AAAA..."}},
        ],
    }],
)
Anthropic-compatible path#
In addition to /api/v1/chat/completions, Orux AI exposes Claude models on a native Anthropic Messages path: POST /anthropic/v1/messages. Use it from @anthropic-ai/sdk or any client that speaks the Messages protocol — system role, tool_use / tool_result blocks, and cache_control are all preserved. The same sk-app-… key authenticates both paths.
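Callers who already hold OpenAI-style message lists may want to reuse them on this path. The sketch below shows one way to reshape them, based on two properties of the Messages protocol: the system prompt is a top-level `system` field and `max_tokens` is required. `to_anthropic_payload` is a hypothetical helper; it handles plain-text system, user and assistant turns only and silently drops tool turns.

```python
def to_anthropic_payload(model: str, messages: list[dict], max_tokens: int = 1024) -> dict:
    """Convert OpenAI-style text messages into a Messages-protocol request body."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    payload = {
        "model": model,
        "max_tokens": max_tokens,  # required by the Messages protocol
        "messages": [m for m in messages if m["role"] in ("user", "assistant")],
    }
    if system_parts:
        # The system prompt travels as a top-level field, not as a message.
        payload["system"] = "\n".join(system_parts)
    return payload
```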
curl https://orux.top/anthropic/v1/messages \
  -H "x-api-key: $ORUX_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hi Claude."}]
  }'
Gemini-compatible path#
Gemini models are also reachable at POST /google/v1beta/models/{model}:generateContent — passthrough of contents / generationConfig / tools is supported. Pass the Orux AI key as the ?key= query parameter to mirror Google’s convention.
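The `contents` array uses a different turn shape from the chat-completions `messages` list: each turn holds a list of `parts`, and the assistant role is named "model". A minimal conversion sketch for plain-text turns follows. `to_gemini_contents` is a hypothetical helper, and system and tool turns are skipped because this page does not say how they are passed through.

```python
def to_gemini_contents(messages: list[dict]) -> list[dict]:
    """Convert OpenAI-style text turns into Gemini `contents` entries."""
    role_map = {"user": "user", "assistant": "model"}  # Gemini calls the assistant role "model"
    return [
        {"role": role_map[m["role"]], "parts": [{"text": m["content"]}]}
        for m in messages
        if m["role"] in role_map  # system and tool turns are not converted here
    ]
```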
curl "https://orux.top/google/v1beta/models/gemini-3-pro:generateContent?key=$ORUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"role": "user", "parts": [{"text": "Hello Gemini."}]}]
  }'
Prompt caching#
For Claude models (claude-opus-4-5, claude-sonnet-4-5, claude-haiku-4-5), Orux AI honours Anthropic cache_control hints inside message content blocks. The cached portion is billed at the multipliers below; the discount flows through to your Credits charge.
| Field | Type | Default | Description |
|---|---|---|---|
| cache_write_5m | multiplier | 1.25x | Tokens written to a 5-minute cache slot. Charged once per write. |
| cache_write_1h | multiplier | 2.0x | Tokens written to a 1-hour cache slot. Charged once per write. |
| cache_hit | multiplier | 0.10x | Tokens served from cache on subsequent calls. |
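A worked example makes the trade-off concrete. Assume the multipliers apply to the base input-token price, the write multiplier replaces the normal input charge on the first call, and the cache stays warm for every later call. Then a 10,000-token document written to a 1-hour slot and read five more times costs 10,000 × 2.0 + 5 × 10,000 × 0.10 = 25,000 base-rate token units, against 60,000 for six uncached calls. The function names below are illustrative only.

```python
CACHE_MULTIPLIERS = {"cache_write_5m": 1.25, "cache_write_1h": 2.0, "cache_hit": 0.10}


def cached_input_units(tokens: int, hits: int, write_kind: str = "cache_write_5m") -> float:
    """Input cost, in base-rate token units, of one cache write followed by `hits` cached reads."""
    write_cost = tokens * CACHE_MULTIPLIERS[write_kind]  # charged once per write
    hit_cost = tokens * CACHE_MULTIPLIERS["cache_hit"] * hits
    return write_cost + hit_cost


def uncached_input_units(tokens: int, calls: int) -> float:
    """Input cost of sending the same tokens at the full rate on every call."""
    return float(tokens * calls)
```

Under these assumptions the 5-minute slot pays for itself on the first hit (1.25 + 0.10 = 1.35 units against 2.0 uncached), while the 1-hour slot needs two hits (2.0 + 0.20 = 2.2 against 3.0).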
from openai import OpenAI

client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

LONG_DOC = "..."  # placeholder: the long document text you want cached

# Claude models honour Anthropic cache_control hints; Orux AI passes them through.
resp = client.chat.completions.create(
    model="claude-opus-4-5",
    messages=[
        {
            "role": "system",
            "content": [
                {"type": "text", "text": LONG_DOC,
                 "cache_control": {"type": "ephemeral", "ttl": "1h"}},
            ],
        },
        {"role": "user", "content": "Summarise it in one paragraph."},
    ],
)
# resp.usage.prompt_tokens_details.cached_tokens > 0 on the second call.
Response#
Top-level
| Field | Type | Default | Description |
|---|---|---|---|
| id | string | — | Unique completion id, e.g. "chatcmpl-abc123". |
| object | string | — | "chat.completion" for non-streaming responses, "chat.completion.chunk" for streamed chunks. |
| created | int | — | Unix timestamp (seconds). |
| model | string | — | The model id served (may differ from the model id requested if a fallback fired). |
| choices | array<Choice> | — | Usually one element. See below. |
| usage | Usage | — | Token accounting. Present on non-streaming responses, and on the final chunk of a stream. |
Choice
| Field | Type | Default | Description |
|---|---|---|---|
| index | int | — | 0-based position. |
| message | Message | — | The assistant turn: content and/or tool_calls. |
| finish_reason | string | — | "stop", "length", "tool_calls", "content_filter". |
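Callers usually branch on finish_reason to decide what happens next. The mapping below is a sketch of one such policy. `interpret_choice` and its return labels are hypothetical, and the comments describe the conventional meaning of each value in the OpenAI-compatible schema.

```python
def interpret_choice(choice: dict) -> str:
    """Map a Choice's finish_reason to the next step a caller would typically take."""
    reason = choice.get("finish_reason")
    if reason == "stop":
        return "done"        # natural end of the turn, or a stop sequence was produced
    if reason == "length":
        return "truncated"   # max_tokens was reached; raise the limit or continue the turn
    if reason == "tool_calls":
        return "run_tools"   # execute message.tool_calls and send the results back
    if reason == "content_filter":
        return "filtered"    # output was withheld by a content filter
    return "unknown"         # defensive default for values not listed in the table
```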
Usage and billing#
Orux AI bills strictly on the upstream token count, marked up by the per-app pricing strategy configured on your account. Cached input tokens (when the upstream provides them) are billed at the discounted cached rate.
| Field | Type | Default | Description |
|---|---|---|---|
| prompt_tokens | int | — | Total input tokens, including all messages and tool definitions. |
| completion_tokens | int | — | Tokens produced by the model. |
| total_tokens | int | — | Sum of the two above. |
| prompt_tokens_details.cached_tokens | int | — | Tokens served from the upstream prompt cache, billed at the cache rate. |
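The usage object can be summarised for logging or cost dashboards. This sketch assumes cached_tokens is a subset of prompt_tokens (the usual convention in the OpenAI-compatible schema, not confirmed on this page), and `summarize_usage` is a hypothetical helper.

```python
def summarize_usage(usage: dict) -> dict:
    """Split prompt tokens into cached and uncached parts and sanity-check the total."""
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    prompt = usage["prompt_tokens"]
    return {
        "cached_prompt_tokens": cached,
        "uncached_prompt_tokens": prompt - cached,  # ASSUMES cached tokens are counted in prompt_tokens
        "cache_hit_ratio": cached / prompt if prompt else 0.0,
        "total_is_consistent": usage["total_tokens"] == prompt + usage["completion_tokens"],
    }
```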
Examples#
Non-streaming
The simplest possible chat call.
curl https://orux.top/api/v1/chat/completions \
  -H "Authorization: Bearer $ORUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.7",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Explain quantum entanglement in one sentence."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
Streaming (SSE)
Set stream:true and consume an event-stream of delta chunks.
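On the consuming side, the delta chunks have to be reassembled into the final text. The sketch below parses already-received SSE lines using only the standard library. `collect_stream_text` is a hypothetical helper, and the sample lines mirror the chunk shape this endpoint emits.

```python
import json


def collect_stream_text(lines) -> str:
    """Concatenate delta.content from SSE lines until the data: [DONE] sentinel."""
    pieces = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # A chunk may have no choices (for example a final usage-only chunk).
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                pieces.append(content)
    return "".join(pieces)


sample = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    '',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Soft "}}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"packets"}}]}',
    'data: [DONE]',
]
print(collect_stream_text(sample))  # prints "Soft packets"
```

The same function accepts any iterable of text lines, so it can be fed from whichever HTTP client is used to read the response body line by line.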
curl https://orux.top/api/v1/chat/completions \
  -H "Authorization: Bearer $ORUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Write a haiku about latency."}],
    "stream": true
  }'
# Server-Sent Events stream:
# data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}
# data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Soft "}}]}
# data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"packets"}}]}
# ...
# data: [DONE]
Tool calling
Let the model decide when to call your function, then return the tool result.
import json

from openai import OpenAI

client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["c", "f"]},
            },
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="claude-opus-4.7",
    messages=[{"role": "user", "content": "Weather in Tokyo in celsius?"}],
    tools=tools,
    tool_choice="auto",
)
call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
# -> call.function.name == "get_weather"
# -> args == {"city": "Tokyo", "unit": "c"}

# Send the tool result back:
follow = client.chat.completions.create(
    model="claude-opus-4.7",
    messages=[
        {"role": "user", "content": "Weather in Tokyo in celsius?"},
        resp.choices[0].message,
        {"role": "tool", "tool_call_id": call.id, "content": "{\"temp_c\": 21, \"sky\": \"clear\"}"},
    ],
    tools=tools,
)
print(follow.choices[0].message.content)
JSON mode
Force the assistant to emit valid JSON.
import json

from openai import OpenAI

client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

resp = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You output JSON only."},
        {"role": "user", "content": "Give me 3 colors with hex codes."},
    ],
    response_format={"type": "json_object"},
)
# resp.choices[0].message.content -> a valid JSON object string
data = json.loads(resp.choices[0].message.content)
Vision (image input)
Send an image as part of a user message. Supported by GPT-4o-class, Claude 4.x and Gemini Pro models.
from openai import OpenAI

client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

resp = client.chat.completions.create(
    model="claude-opus-4.7",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {
                "url": "https://example.com/cat.jpg",
            }},
        ],
    }],
)
print(resp.choices[0].message.content)