Inference, but actually cheap.
QGRE serves Qwen3 models on consumer Blackwell hardware. Drop-in replacement for OpenAI Chat Completions and Anthropic Messages — point your existing SDK at a different base URL and you're done. Pay with PIX, OXXO, or any major card.
Set base_url to https://api.qgre.com/v1. Streaming, max_tokens, temperature, stop sequences — everything works.
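A minimal stdlib sketch of hitting the OpenAI-compatible endpoint, no SDK needed. The model id and the placeholder key are assumptions for illustration, not confirmed QGRE values:

```python
import json
import urllib.request

BASE_URL = "https://api.qgre.com/v1"  # QGRE's OpenAI-compatible endpoint
API_KEY = "qgre-xxxx"                 # hypothetical placeholder, replace with your key

def build_request(messages, model="qwen3-32b", **params):
    """Assemble a Chat Completions request in OpenAI wire format.
    The model id here is illustrative, not a confirmed QGRE model name."""
    payload = {"model": model, "messages": messages, **params}
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    return f"{BASE_URL}/chat/completions", headers, json.dumps(payload).encode()

def chat(messages, **params):
    """POST the request. Needs a live key, so it is not executed here."""
    url, headers, body = build_request(messages, **params)
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the official `openai` Python SDK, the equivalent is `OpenAI(base_url="https://api.qgre.com/v1", api_key=...)`; the rest of your code stays unchanged.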
Same trick at api.qgre.com/anthropic. The router translates Messages JSON ↔ Chat Completions internally; SSE event shapes match.
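A rough sketch of what that translation looks like in one direction, Anthropic Messages request in, Chat Completions request out. This is an illustration of the mapping under stated assumptions, not QGRE's actual router code:

```python
def messages_to_chat(req: dict) -> dict:
    """Translate an Anthropic Messages request into OpenAI Chat
    Completions shape, roughly as a router would internally."""
    msgs = []
    if "system" in req:  # Anthropic keeps the system prompt top-level
        msgs.append({"role": "system", "content": req["system"]})
    for m in req["messages"]:
        content = m["content"]
        if isinstance(content, list):  # Anthropic allows content blocks
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        msgs.append({"role": m["role"], "content": content})
    out = {"model": req["model"], "messages": msgs}
    if "max_tokens" in req:
        out["max_tokens"] = req["max_tokens"]
    if "stop_sequences" in req:  # Anthropic name -> OpenAI name
        out["stop"] = req["stop_sequences"]
    for k in ("temperature", "top_p", "stream"):
        if k in req:
            out[k] = req[k]
    return out
```

The reverse direction (responses and SSE events) follows the same field-renaming logic.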
Send model: "auto" in the body plus an X-QGRE-Cost-Tier: cheap header. The router picks the cheapest healthy upstream — your bill follows the workload shape, not a flat rate.
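What the client sends, and a toy version of the routing rule. The upstream field names are assumptions for illustration, not QGRE's real schema:

```python
# Client side: request automatic routing via body field + header.
payload = {"model": "auto", "messages": [{"role": "user", "content": "hi"}]}
headers = {"X-QGRE-Cost-Tier": "cheap"}

# Router side (illustrative sketch): cheapest upstream that is healthy.
def pick_upstream(upstreams: list[dict]) -> dict:
    """Return the lowest-priced upstream whose health check passes."""
    healthy = [u for u in upstreams if u["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy upstream")
    return min(healthy, key=lambda u: u["usd_per_mtok"])
```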
No surprise invoices. Auto-topup when balance drops below a threshold; cancel any time via the Stripe portal.
Subscribe to usage.completed events. Standard-Webhooks-signed (HMAC-SHA256), retries, replay, audit log — all of it managed via Svix.
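Verifying a delivery follows the Standard Webhooks scheme: HMAC-SHA256 over `{id}.{timestamp}.{body}` with the base64-decoded secret, compared against the `v1,...` entries in the signature header. A stdlib sketch:

```python
import base64
import hashlib
import hmac

def verify(secret: str, msg_id: str, timestamp: str, payload: bytes,
           signature_header: str) -> bool:
    """Verify a Standard-Webhooks-style signature (HMAC-SHA256).

    secret: the "whsec_..." endpoint secret; msg_id/timestamp come from
    the webhook-id / webhook-timestamp headers.
    """
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{msg_id}.{timestamp}.".encode() + payload
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()
    ).decode()
    # The header may carry several space-separated "version,sig" entries.
    for part in signature_header.split():
        version, _, sig = part.partition(",")
        if version == "v1" and hmac.compare_digest(sig, expected):
            return True
    return False
```

In production you should also reject timestamps outside a small tolerance window to block replays.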
The chat demo shows real-time tok/s and KV-cache prefix reuse. The numbers don't lie — that's what makes the price possible.
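Why prefix reuse cuts cost, in one toy function: requests that share a leading span of tokens (system prompt, few-shot examples) only pay for that span's KV-cache computation once. A sketch, not the serving engine's actual implementation:

```python
def shared_prefix_tokens(a: list[int], b: list[int]) -> int:
    """Count the leading tokens two requests share. With prefix caching,
    the KV entries for these positions are computed once and reused."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

The longer the shared system prompt relative to the user turn, the larger the fraction of prefill work that is skipped.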
Free demo plan. $20/mo dev tier, $99/mo pro with 99.5% SLA, plus a coding-pro flat rate for predictable cost in currency-volatile markets.