Migration
Drop-in for OpenAI, Anthropic, and DeepSeek-style aggregators.
From OpenAI
Minimal diff:

```diff
  import os
  from openai import OpenAI

  client = OpenAI(
-     api_key=os.environ["OPENAI_API_KEY"],
+     api_key=os.environ["QGRE_API_KEY"],
+     base_url="https://api.qgre.com/v1",
  )

  stream = client.chat.completions.create(
-     model="gpt-4o-mini",
+     model="qwen3-1.7b",
      messages=[{"role": "user", "content": "..."}],
      stream=True,
  )
```

Tool calls, function calling, and JSON mode are not exposed in v1. Plain chat completions and streaming work identically.
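After the switch, the stream is still an iterator of chunks whose `choices[0].delta.content` pieces concatenate into the full reply. A minimal offline sketch of that assembly loop (mock objects stand in for the SDK's chunk types; no network call):

```python
from types import SimpleNamespace

def assemble(chunks):
    """Concatenate the delta.content pieces of a chat-completions stream."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # role-only / final chunks carry no text
            parts.append(delta.content)
    return "".join(parts)

def mock(text):
    """Mock chunk shaped like the OpenAI SDK's streaming objects."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

stream = [mock("Hel"), mock("lo, "), mock(None), mock("world")]
print(assemble(stream))  # Hello, world
```

The same loop works unchanged against a live QGRE stream, since the chunk shape is the OpenAI one.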
From Anthropic
Minimal diff:

```diff
  import os
  from anthropic import Anthropic

  client = Anthropic(
-     api_key=os.environ["ANTHROPIC_API_KEY"],
+     api_key=os.environ["QGRE_API_KEY"],
+     base_url="https://api.qgre.com/anthropic",
  )

  with client.messages.stream(
-     model="claude-haiku-4",
+     model="qwen3-1.7b",
      max_tokens=512,
      messages=[{"role": "user", "content": "..."}],
  ) as stream:
      for text in stream.text_stream:
          print(text, end="", flush=True)
```

Vision, tool use, and structured outputs are not translated in v1. Plain text and streaming work via the SSE event mapping (message_start → content_block_delta → message_stop).
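To make the event mapping concrete, here is a hypothetical sketch of a translator that wraps a stream of text deltas in the three Anthropic event types named above. The field shapes are illustrative only, not QGRE's actual wire format:

```python
def to_anthropic_events(deltas, model="qwen3-1.7b"):
    """Illustrative only: emit (event_name, payload) pairs in the
    message_start -> content_block_delta -> message_stop order."""
    yield "message_start", {"type": "message_start", "message": {"model": model}}
    for text in deltas:
        yield "content_block_delta", {
            "type": "content_block_delta",
            "delta": {"type": "text_delta", "text": text},
        }
    yield "message_stop", {"type": "message_stop"}

events = list(to_anthropic_events(["Hi", " there"]))
print([name for name, _ in events])
# ['message_start', 'content_block_delta', 'content_block_delta', 'message_stop']
```

Because the mapping is this thin, anything outside plain text (tool use, vision blocks) has no upstream counterpart to translate, which is why those features are excluded from v1.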
From DeepSeek / Qwen / Moonshot aggregators
If you're already using a Chinese-provider aggregator (AIPower, SiliconFlow, OpenRouter), you're already on the OpenAI-compatible path. Just change the base_url and api_key.
Cost-tier routing
QGRE adds delegated model choice on top of the OpenAI surface. Set model to auto and pass an X-QGRE-Cost-Tier header; the router then picks the cheapest (or, for higher tiers, the most expensive) healthy upstream that has pricing data:
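One way such a router could make that choice, sketched with a hypothetical price table (the names, prices, and health flags below are invented for illustration, not QGRE internals):

```python
# Hypothetical upstream table: (name, usd_per_1M_tokens, healthy)
UPSTREAMS = [
    ("provider-a", 0.07, True),
    ("provider-b", 0.05, False),  # cheapest, but unhealthy: skipped
    ("provider-c", 0.22, True),
]

def pick_upstream(tier):
    """Cheapest healthy upstream for 'cheap', priciest healthy otherwise."""
    healthy = [(name, price) for name, price, ok in UPSTREAMS if ok]
    if not healthy:
        raise RuntimeError("no healthy priced upstream")
    choose = min if tier == "cheap" else max
    name, _ = choose(healthy, key=lambda item: item[1])
    return name

print(pick_upstream("cheap"))    # provider-a
print(pick_upstream("premium"))  # provider-c
```

The key property is that health filtering happens before price ordering, so a cheap-but-down upstream never wins the tie.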
```python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "summary in 3 bullets"}],
    extra_headers={"X-QGRE-Cost-Tier": "cheap"},
)
```

Conversation pinning
Pass X-Conversation-Id: <your-stable-id> to pin a multi-turn session to the same upstream replica. KV-cache prefix reuse means each follow-up turn starts much faster — the backend's lcp_reused_tokens counter on /qgre/v1/stats shows the win.
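A common way to implement this kind of pinning is a stable hash of the conversation id over the replica pool; the sketch below is illustrative (the replica names are invented), not QGRE's actual scheme:

```python
import hashlib

REPLICAS = ["replica-0", "replica-1", "replica-2"]  # hypothetical pool

def pin_replica(conversation_id, replicas=REPLICAS):
    """Map a conversation id to a stable replica index so every turn of a
    session lands on the same KV cache. sha256 keeps the mapping stable
    across processes, unlike Python's per-process salted hash()."""
    digest = hashlib.sha256(conversation_id.encode()).digest()
    return replicas[int.from_bytes(digest[:8], "big") % len(replicas)]

# Same id -> same replica, every call, in every process.
assert pin_replica("user-42-session") == pin_replica("user-42-session")
```

On the client side, the OpenAI SDK forwards the header the same way as the cost-tier one: extra_headers={"X-Conversation-Id": "user-42-session"}.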