Chat completions

POST /api/v1/chat/completions — generate text from a conversation.

Endpoint: POST /api/v1/chat/completions, authenticated with Authorization: Bearer sk-app-…

This is the most-used endpoint. The schema matches OpenAI’s chat.completions exactly. What Orux AI adds: each model id is automatically routed to the healthiest upstream channel.

Pick a chat model

Every chat model is served through the same /api/v1/chat/completions endpoint; Orux AI automatically routes each model id to the best upstream for you. Each model also has a full per-model parameter sheet.

Model ID	Model	Description	Spec	Capabilities	Top params
`gpt-5-2`	GPT-5.2	OpenAI flagship; large-context reasoning and tool use.	400K ctx	Tools, Vision, Stream	`messages`, `temperature`, `top_p`
`gpt-5-pro`	GPT-5 Pro	Highest-tier OpenAI reasoning model, deeper chain-of-thought, longer answers.	400K ctx	Tools, Vision, Stream	`messages`, `temperature`, `top_p`
`gpt-5-codex`	GPT-5 Codex	Code-specialised GPT-5; better instruction following on programming tasks.	400K ctx	Tools, Stream	`messages`, `temperature`, `top_p`
`gpt-codex`	GPT Codex	Legacy GPT code model retained for some callers.	128K ctx	Tools, Stream	`messages`, `temperature`, `top_p`
`claude-opus-4-5`	Claude Opus 4.5	Anthropic top-tier model: best at long-form reasoning, code review and agentic tool use. Supports prompt caching.	200K ctx	Tools, Vision, Cache, Stream	`messages`, `temperature`, `top_p`
`claude-sonnet-4-5`	Claude Sonnet 4.5	Balanced Claude tier: fast, cheaper, still tool/vision capable.	200K ctx	Tools, Vision, Cache, Stream	`messages`, `temperature`, `top_p`
`claude-haiku-4-5`	Claude Haiku 4.5	Smallest Claude: sub-second latency, good for chatbots and routing.	200K ctx	Tools, Cache, Stream	`messages`, `temperature`, `top_p`
`gemini-3-pro`	Gemini 3 Pro	Google flagship; 2M context, native multimodal.	2000K ctx	Tools, Vision, Stream	`messages`, `temperature`, `top_p`
`gemini-3-flash`	Gemini 3 Flash	Fast tier of Gemini 3.	1000K ctx	Tools, Vision, Stream	`messages`, `temperature`, `top_p`
`grok-3`	Grok 3	xAI conversational model with web tools.	256K ctx	Tools, Stream	`messages`, `temperature`, `top_p`

Showing 10 of 38 models.

Request body

Field	Type	Default	Description
`model` (required)	`string`	—	A model id from /docs/models, e.g. "claude-opus-4-5" or "gpt-5-2".
`messages` (required)	`array<Message>`	—	Conversation so far. See the Message roles table.
`temperature`	`number`	`1.0`	Sampling temperature, 0–2. Lower = more deterministic.
`top_p`	`number`	`1.0`	Nucleus sampling. Use this OR temperature, not both.
`max_tokens`	`int`	—	Maximum completion tokens. If omitted, defaults to the model’s context budget minus the prompt length.
`stream`	`boolean`	`false`	If true, response is an SSE stream of "chat.completion.chunk" deltas terminated by data: [DONE].
`tools`	`array<Tool>`	—	Function definitions the model may call. See "Tool calling".
`tool_choice`	`string | object`	`"auto"`	"none", "auto", "required", or {"type":"function","function":{"name":...}}.
`response_format`	`object`	—	{"type":"json_object"} forces the assistant to emit a JSON document. Some models also accept JSON-schema mode.
`stop`	`string | array<string>`	—	Up to 4 stop sequences. Generation halts when any is produced.
`seed`	`int`	—	Best-effort determinism for sampling. Same seed + same prompt + same model = same output (provider permitting).
`user`	`string`	—	Stable end-user identifier you control. Surfaces in your usage logs.
`metadata`	`object`	—	Free-form key/value tags (max 16 keys, 64 chars). Indexed in the dashboard.
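
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Orux AI SDK: `build_chat_request` is a hypothetical helper, and it enforces only the limits stated above (temperature between 0 and 2, at most 4 stop sequences, at most 16 metadata keys). Everything else is passed through unchanged.

```python
from typing import Any, Optional, Union


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    temperature: Optional[float] = None,
    stop: Union[str, list[str], None] = None,
    metadata: Optional[dict[str, str]] = None,
    **extra: Any,
) -> dict[str, Any]:
    """Assemble a JSON body for POST /api/v1/chat/completions (hypothetical helper)."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    body: dict[str, Any] = {"model": model, "messages": messages}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if stop is not None:
        stops = [stop] if isinstance(stop, str) else list(stop)
        if len(stops) > 4:
            raise ValueError("at most 4 stop sequences are allowed")
        body["stop"] = stop
    if metadata is not None:
        if len(metadata) > 16:
            raise ValueError("metadata allows at most 16 keys")
        body["metadata"] = metadata
    # Pass-through for the remaining fields: max_tokens, stream, seed, user, ...
    body.update(extra)
    return body
```

The returned dict can be serialised with `json.dumps` and sent as the request body, or unpacked into `client.chat.completions.create(**body)` when using the OpenAI SDK.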

Message roles

Role	Content type	Default	Description
`system`	`string`	—	High-level instructions. One per conversation, at the start.
`user`	`string | array<Content>`	—	A user turn. Text or multimodal (text + image_url).
`assistant`	`string | null`	—	A prior model turn, included when continuing a conversation. May also carry tool_calls.
`tool`	`string`	—	Output of a tool the model called. Must include tool_call_id.
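
The role rules above (a single system message at the start, a tool_call_id on every tool message) are easy to get wrong when a conversation is assembled from several sources. As an illustration only, the hypothetical `validate_messages` helper below checks just those two rules plus the role name; it is a sketch, not the validation the server performs.

```python
from typing import Any

VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        elif role == "system" and i != 0:
            # Covers both "more than one" and "not at the start".
            problems.append(f"message {i}: system message must come first")
        elif role == "tool" and not msg.get("tool_call_id"):
            problems.append(f"message {i}: tool message needs tool_call_id")
    return problems
```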

Tool calling

Provide a list of function definitions; the model may emit one or more tool_calls instead of (or alongside) a normal assistant message. You execute each call locally, then send the result back in a follow-up request with role:"tool".

Field	Type	Default	Description
`type`	`string`	—	Always "function" today.
`function.name`	`string`	—	Identifier you will receive back in tool_calls[i].function.name.
`function.description`	`string`	—	Plain-English purpose; the model uses this to decide when to call.
`function.parameters`	`JSON Schema`	—	Standard JSON Schema describing the function arguments.
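
The "execute each call locally, then send the result back" step can be sketched as a small dispatcher. This assumes the OpenAI-style tool_calls shape (an `id`, plus `function.name` and a JSON-encoded `function.arguments` string); `run_tool_calls`, the registry and the stand-in `get_weather` function are hypothetical names used for illustration.

```python
import json
from typing import Any, Callable


def run_tool_calls(
    tool_calls: list[dict[str, Any]],
    registry: dict[str, Callable[..., Any]],
) -> list[dict[str, Any]]:
    """Execute each tool call locally and return role:"tool" messages."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        # function.arguments arrives as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"] or "{}")
        if name in registry:
            output = registry[name](**args)
        else:
            output = {"error": f"unknown tool {name}"}
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the call
            "content": json.dumps(output),
        })
    return results


def get_weather(city: str, unit: str = "c") -> dict[str, Any]:
    # Stand-in implementation with fixed data, for illustration only.
    return {"city": city, "temp": 21, "unit": unit}
```

The returned messages are appended after the assistant turn that carried the tool_calls, and the whole list is sent in the follow-up request.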

Multimodal input

Send images, videos, audio or PDFs in any user message using the OpenAI image_url content block. The url field accepts http(s) URLs or base64 data URIs (data:<mime>;base64,…). Vision-capable Gemini 3 Pro and Claude 4.x models will ingest non-image media as well.

Python

from openai import OpenAI
client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

# image_url accepts http(s) URLs OR base64 data URIs, for image / video / audio / PDF
# (Gemini 3 Pro and Claude 4.x ingest the non-image kinds too).
resp = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[{
        "role":"user",
        "content":[
            {"type":"text","text":"Describe what happens in this clip."},
            {"type":"image_url","image_url":{"url":"data:video/mp4;base64,AAAA..."}},
        ],
    }],
)
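
The base64 data URI used above can be produced from a local file with the standard library alone. The helper names `to_data_uri` and `file_to_content_block` are hypothetical; the sketch guesses the MIME type from the file extension, which is an assumption that may not hold for unusual file names.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as data:<mime>;base64,<payload>."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


def file_to_content_block(path: str) -> dict:
    """Build an image_url content block for a local file (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot guess MIME type for {path}")
    uri = to_data_uri(Path(path).read_bytes(), mime)
    return {"type": "image_url", "image_url": {"url": uri}}
```

The resulting block goes into the `content` array of a user message, next to a `{"type": "text", ...}` block.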

Anthropic-compatible path

In addition to /api/v1/chat/completions, Orux AI exposes Claude models on a native Anthropic Messages path: POST /anthropic/v1/messages. Use it from @anthropic-ai/sdk or any client that speaks the Messages protocol — system role, tool_use / tool_result blocks, and cache_control are all preserved. The same sk-app-… key authenticates both paths.

curl

curl https://orux.top/anthropic/v1/messages \
  -H "x-api-key: $ORUX_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model":"claude-opus-4-5",
    "max_tokens": 1024,
    "messages":[{"role":"user","content":"Hi Claude."}]
  }'
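
When moving an existing OpenAI-style conversation to this path, the main structural difference is that the Messages protocol takes the system prompt as a top-level `system` field and requires `max_tokens`. The sketch below shows that reshaping under simplifying assumptions: `to_anthropic_body` is a hypothetical helper, it handles only plain-string system, user and assistant messages, and it does not cover tool turns or content blocks.

```python
from typing import Any


def to_anthropic_body(
    model: str,
    messages: list[dict[str, Any]],
    max_tokens: int = 1024,
) -> dict[str, Any]:
    """Reshape OpenAI-style messages into a Messages-protocol request body."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    turns = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if m["role"] in ("user", "assistant")
    ]
    body: dict[str, Any] = {"model": model, "max_tokens": max_tokens, "messages": turns}
    if system_parts:
        # The system prompt is a top-level field, not a message.
        body["system"] = "\n\n".join(system_parts)
    return body
```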

Gemini-compatible path

Gemini models are also reachable at POST /google/v1beta/models/{model}:generateContent — passthrough of contents / generationConfig / tools is supported. Pass the Orux AI key as the ?key= query parameter to mirror Google’s convention.

curl

curl "https://orux.top/google/v1beta/models/gemini-3-pro:generateContent?key=$ORUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents":[{"role":"user","parts":[{"text":"Hello Gemini."}]}]
  }'
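
The same request can be prepared in Python. The sketch below builds the URL and a `contents` / `generationConfig` body from OpenAI-style messages; `gemini_request` is a hypothetical helper, it maps the assistant role to Gemini's `model` role, and it ignores system and tool messages, which would need separate handling.

```python
from typing import Any, Optional
from urllib.parse import quote

BASE_URL = "https://orux.top/google/v1beta"


def gemini_request(
    model: str,
    messages: list[dict[str, Any]],
    api_key: str,
    temperature: Optional[float] = None,
) -> tuple[str, dict[str, Any]]:
    """Return (url, json_body) for a generateContent call (hypothetical helper)."""
    role_map = {"user": "user", "assistant": "model"}
    contents = [
        {"role": role_map[m["role"]], "parts": [{"text": m["content"]}]}
        for m in messages
        if m["role"] in role_map
    ]
    body: dict[str, Any] = {"contents": contents}
    if temperature is not None:
        body["generationConfig"] = {"temperature": temperature}
    # The key travels as the ?key= query parameter on this path.
    url = f"{BASE_URL}/models/{model}:generateContent?key={quote(api_key, safe='')}"
    return url, body
```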

Prompt caching

For Claude models (claude-opus-4-5, claude-sonnet-4-5, claude-haiku-4-5), Orux AI honours Anthropic cache_control hints inside message content blocks. The cached portion is billed at the multipliers below; the discount flows through to your Credits charge.

Field	Type	Default	Description
`cache_write_5m`	`multiplier`	`1.25x`	Tokens written to a 5-minute cache slot. Charged once per write.
`cache_write_1h`	`multiplier`	`2.0x`	Tokens written to a 1-hour cache slot. Charged once per write.
`cache_hit`	`multiplier`	`0.10x`	Tokens served from cache on subsequent calls.
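
To see when caching pays off, the multipliers can be applied to a shared prompt prefix. This is a worked sketch under stated assumptions: the multipliers scale the model's normal input-token price, the first call writes the cache, and every later call arrives within the TTL and hits it. The function names are hypothetical, and costs are expressed in "base input token" equivalents rather than currency.

```python
CACHE_WRITE_5M = 1.25
CACHE_WRITE_1H = 2.0
CACHE_HIT = 0.10


def cached_prefix_cost(prefix_tokens: int, calls: int, write_multiplier: float) -> float:
    """Cost of a shared prefix over `calls` requests, in base input token equivalents.

    Assumes one cache write on the first call and a cache hit on every later call.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    write_cost = prefix_tokens * write_multiplier
    hit_cost = prefix_tokens * CACHE_HIT * (calls - 1)
    return write_cost + hit_cost


def uncached_prefix_cost(prefix_tokens: int, calls: int) -> float:
    """Cost of resending the same prefix at the full input rate on every call."""
    return float(prefix_tokens * calls)
```

For a 10,000-token prefix reused across 5 calls with the 1-hour slot, this gives 10,000 × 2.0 + 4 × 10,000 × 0.10 = 24,000 token equivalents, against 50,000 without caching. Under the same assumptions, the 5-minute slot is cheaper than no caching from the second call onward, and the 1-hour slot from the third.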

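Python

from openai import OpenAI
client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

# Placeholder: replace with the long document text you want cached.
LONG_DOC = "<paste the long reference document here>"

# Claude models honour Anthropic cache_control hints; Orux AI passes them through.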
resp = client.chat.completions.create(
    model="claude-opus-4-5",
    messages=[
        {
            "role": "system",
            "content": [
                {"type":"text","text": LONG_DOC,
                 "cache_control":{"type":"ephemeral","ttl":"1h"}},
            ],
        },
        {"role": "user", "content": "Summarise it in one paragraph."},
    ],
)
# resp.usage.prompt_tokens_details.cached_tokens > 0 on the second call.

Response

Top-level

Field	Type	Default	Description
`id`	`string`	—	Unique completion id, e.g. "chatcmpl-abc123".
`object`	`string`	—	Either "chat.completion" (non-streaming) or "chat.completion.chunk" (streaming).
`created`	`int`	—	Unix timestamp (seconds).
`model`	`string`	—	The model id served (may differ from the model id requested if a fallback fired).
`choices`	`array<Choice>`	—	Usually one element. See below.
`usage`	`Usage`	—	Token accounting. Present on non-streaming responses, and on the final chunk of a stream.

Choice

Field	Type	Default	Description
`index`	`int`	—	0-based position.
`message`	`Message`	—	The assistant turn — content and/or tool_calls.
`finish_reason`	`string`	—	One of "stop", "length", "tool_calls", "content_filter".
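
A caller usually branches on finish_reason before using the message. The sketch below works on the decoded JSON response as a plain dict and uses only the fields listed in the tables above; `read_choice` is a hypothetical helper, and it looks at the first choice only.

```python
import json
from typing import Any


def read_choice(response: dict[str, Any]) -> dict[str, Any]:
    """Summarise the first choice of a non-streaming chat.completion response."""
    choice = response["choices"][0]
    message = choice["message"]
    reason = choice["finish_reason"]
    result: dict[str, Any] = {
        "text": message.get("content"),
        "tool_calls": [],
        "truncated": reason == "length",          # hit max_tokens or the context limit
        "filtered": reason == "content_filter",
    }
    if reason == "tool_calls":
        for call in message.get("tool_calls") or []:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"])
            result["tool_calls"].append((name, args))
    return result
```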

Usage and billing

Orux AI bills strictly on the upstream token count, marked up by the per-app pricing strategy configured on your account. Cached input tokens (when the upstream provides them) are billed at the discounted cached rate.

Field	Type	Default	Description
`prompt_tokens`	`int`	—	Total input tokens, including all messages and tool definitions.
`completion_tokens`	`int`	—	Tokens produced by the model.
`total_tokens`	`int`	—	Sum of the two above.
`prompt_tokens_details.cached_tokens`	`int`	—	Tokens served from the upstream prompt cache, billed at the cache rate.

Cached input is cheaper

Repeated prompt prefixes hit our cache and are billed at the cache_hit_price (when available). No code change required — just send the same prefix.
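
To check how much of a request was served from cache, the usage object can be split into cached and uncached input. The sketch below assumes, as in OpenAI's usage object, that cached_tokens is counted inside prompt_tokens; that is an assumption about this API rather than something stated above, and `usage_breakdown` is a hypothetical helper.

```python
from typing import Any


def usage_breakdown(usage: dict[str, Any]) -> dict[str, Any]:
    """Split a usage object into cached input, uncached input and output tokens."""
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens") or 0
    prompt = usage["prompt_tokens"]
    return {
        # Assumes cached tokens are a subset of prompt_tokens.
        "uncached_input": prompt - cached,
        "cached_input": cached,
        "output": usage["completion_tokens"],
        "cache_hit_ratio": cached / prompt if prompt else 0.0,
    }
```

A cache_hit_ratio that stays at zero across repeated calls with the same prefix suggests the prefix is changing between requests or the upstream is not caching it.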

Examples

Non-streaming

The simplest possible chat call.

curl

curl https://orux.top/api/v1/chat/completions \
  -H "Authorization: Bearer $ORUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-5",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user",   "content": "Explain quantum entanglement in one sentence."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Streaming (SSE)

Set stream:true and consume an event-stream of delta chunks.

curl

curl https://orux.top/api/v1/chat/completions \
  -H "Authorization: Bearer $ORUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5-2",
    "messages": [{"role": "user", "content": "Write a haiku about latency."}],
    "stream": true
  }'

# Server-Sent Events stream:
# data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}
# data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Soft "}}]}
# data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"packets"}}]}
# ...
# data: [DONE]
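
On the client side, the stream is read line by line: each `data:` line carries one JSON chunk, and the literal `[DONE]` marks the end. The sketch below reassembles the text from such lines; `collect_stream_text` is a hypothetical helper that takes any iterable of already-decoded lines, so it makes no assumption about which HTTP library delivered them.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from SSE lines until data: [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)
```

The first chunk typically carries only the role, so it contributes no text; the helper skips it because its delta has no content.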

Tool calling

Let the model decide when to call your function, then return the tool result.

Python

from openai import OpenAI
import json

client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["c", "f"]},
            },
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="claude-opus-4-5",
    messages=[{"role": "user", "content": "Weather in Tokyo in celsius?"}],
    tools=tools,
    tool_choice="auto",
)

message = resp.choices[0].message
if not message.tool_calls:
    # With tool_choice="auto" the model may answer directly instead of calling a tool.
    print(message.content)
else:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    # -> call.function.name == "get_weather"
    # -> args == {"city": "Tokyo", "unit": "c"}

    # Send the tool result back:
    follow = client.chat.completions.create(
        model="claude-opus-4-5",
        messages=[
            {"role": "user", "content": "Weather in Tokyo in celsius?"},
            message,  # the assistant turn that carries tool_calls
            {"role": "tool", "tool_call_id": call.id,
             "content": json.dumps({"temp_c": 21, "sky": "clear"})},
        ],
        tools=tools,
    )
    print(follow.choices[0].message.content)

JSON mode

Force the assistant to emit valid JSON.

Python

import json

from openai import OpenAI

client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

resp = client.chat.completions.create(
    model="gpt-5-2",
    messages=[
        {"role": "system", "content": "You output JSON only."},
        {"role": "user",   "content": "Give me 3 colors with hex codes."},
    ],
    response_format={"type": "json_object"},
)
# resp.choices[0].message.content -> a valid JSON object string
data = json.loads(resp.choices[0].message.content)

Vision (image input)

Send an image as part of a user message. Supported by the models marked Vision in the model table, for example gpt-5-2, claude-opus-4-5 and gemini-3-pro.

Python

from openai import OpenAI
client = OpenAI(api_key="sk-app-...", base_url="https://orux.top/api/v1")

resp = client.chat.completions.create(
    model="claude-opus-4-5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {
                "url": "https://example.com/cat.jpg",
            }},
        ],
    }],
)
print(resp.choices[0].message.content)