Migration
Drop-in for OpenAI, Anthropic, and DeepSeek-style aggregators.
From OpenAI
Minimal diff:

```diff
  import os
  from openai import OpenAI

  client = OpenAI(
-     api_key=os.environ["OPENAI_API_KEY"],
+     api_key=os.environ["QGRE_API_KEY"],
+     base_url="https://api.qgre.com/v1",
  )

  stream = client.chat.completions.create(
-     model="gpt-4o-mini",
+     model="qwen3-1.7b",
      messages=[{"role": "user", "content": "..."}],
      stream=True,
  )
```

Tool calls, function calling, and JSON mode are not exposed in v1. Plain chat completions and streaming work identically.
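After the switch, the stream is still an iterator of chunks whose `choices[0].delta.content` pieces concatenate into the full reply. A minimal offline sketch of that assembly loop (mock objects stand in for the SDK's chunk types; no network call):

```python
from types import SimpleNamespace

def assemble(chunks):
    """Concatenate the delta.content pieces of a chat-completions stream."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # role-only / final chunks carry no text
            parts.append(delta.content)
    return "".join(parts)

def mock(text):
    """Mock chunk shaped like the OpenAI SDK's streaming objects."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

stream = [mock("Hel"), mock("lo, "), mock(None), mock("world")]
print(assemble(stream))  # Hello, world
```

The same loop works unchanged against a live QGRE stream, since the chunk shape is the OpenAI one.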
From Anthropic
Minimal diff:

```diff
  import os
  from anthropic import Anthropic

  client = Anthropic(
-     api_key=os.environ["ANTHROPIC_API_KEY"],
+     api_key=os.environ["QGRE_API_KEY"],
+     base_url="https://api.qgre.com/anthropic",
  )

  with client.messages.stream(
-     model="claude-haiku-4",
+     model="qwen3-1.7b",
      max_tokens=512,
      messages=[{"role": "user", "content": "..."}],
  ) as stream:
      for text in stream.text_stream:
          print(text, end="", flush=True)
```

Vision, tool use, and structured outputs are not translated in v1. Plain text and streaming work via the SSE event mapping (message_start → content_block_delta → message_stop).
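To make the event mapping concrete, here is a hypothetical sketch of a translator that wraps a stream of text deltas in the three Anthropic event types named above. The field shapes are illustrative only, not QGRE's actual wire format:

```python
def to_anthropic_events(deltas, model="qwen3-1.7b"):
    """Illustrative only: emit (event_name, payload) pairs in the
    message_start -> content_block_delta -> message_stop order."""
    yield "message_start", {"type": "message_start", "message": {"model": model}}
    for text in deltas:
        yield "content_block_delta", {
            "type": "content_block_delta",
            "delta": {"type": "text_delta", "text": text},
        }
    yield "message_stop", {"type": "message_stop"}

events = list(to_anthropic_events(["Hi", " there"]))
print([name for name, _ in events])
# ['message_start', 'content_block_delta', 'content_block_delta', 'message_stop']
```

Because the mapping is this thin, anything outside plain text (tool use, vision blocks) has no upstream counterpart to translate, which is why those features are excluded from v1.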
From DeepSeek / Qwen / Moonshot aggregators
If you're already using a Chinese-provider aggregator (AIPower, SiliconFlow, OpenRouter), you're already on the OpenAI-compatible path. Just change the base_url and api_key.
Cost-tier routing
QGRE adds delegated model choice on top of the OpenAI surface. Set model to auto and pass an X-QGRE-Cost-Tier header; the router then picks the cheapest (or, for higher tiers, the most expensive) healthy upstream that has pricing data:
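One way such a router could make that choice, sketched with a hypothetical price table (the names, prices, and health flags below are invented for illustration, not QGRE internals):

```python
# Hypothetical upstream table: (name, usd_per_1M_tokens, healthy)
UPSTREAMS = [
    ("provider-a", 0.07, True),
    ("provider-b", 0.05, False),  # cheapest, but unhealthy: skipped
    ("provider-c", 0.22, True),
]

def pick_upstream(tier):
    """Cheapest healthy upstream for 'cheap', priciest healthy otherwise."""
    healthy = [(name, price) for name, price, ok in UPSTREAMS if ok]
    if not healthy:
        raise RuntimeError("no healthy priced upstream")
    choose = min if tier == "cheap" else max
    name, _ = choose(healthy, key=lambda item: item[1])
    return name

print(pick_upstream("cheap"))    # provider-a
print(pick_upstream("premium"))  # provider-c
```

The key property is that health filtering happens before price ordering, so a cheap-but-down upstream never wins the tie.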
```python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "summary in 3 bullets"}],
    extra_headers={"X-QGRE-Cost-Tier": "cheap"},
)
```

Conversation pinning
Pass X-Conversation-Id: <your-stable-id> to pin a multi-turn session to the same upstream replica. KV-cache prefix reuse means each follow-up turn starts much faster — the backend's lcp_reused_tokens counter on /qgre/v1/stats shows the win.
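A common way to implement this kind of pinning is a stable hash of the conversation id over the replica pool; the sketch below is illustrative (the replica names are invented), not QGRE's actual scheme:

```python
import hashlib

REPLICAS = ["replica-0", "replica-1", "replica-2"]  # hypothetical pool

def pin_replica(conversation_id, replicas=REPLICAS):
    """Map a conversation id to a stable replica index so every turn of a
    session lands on the same KV cache. sha256 keeps the mapping stable
    across processes, unlike Python's per-process salted hash()."""
    digest = hashlib.sha256(conversation_id.encode()).digest()
    return replicas[int.from_bytes(digest[:8], "big") % len(replicas)]

# Same id -> same replica, every call, in every process.
assert pin_replica("user-42-session") == pin_replica("user-42-session")
```

On the client side, the OpenAI SDK forwards the header the same way as the cost-tier one: extra_headers={"X-Conversation-Id": "user-42-session"}.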