QGRE — pure-Rust + CUDA TQ4 inference
OpenAI- and Anthropic-compatible API. 4× cost reduction vs. mainstream tier-1 providers, running on consumer Blackwell hardware.
QGRE serves Qwen3-family models behind two wire-compatible base URLs. Point your existing SDK at the right endpoint and it just works:
- `https://api.qgre.com/v1` — OpenAI Chat Completions
- `https://api.qgre.com/anthropic` — Anthropic Messages
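Because the endpoint is wire-compatible with OpenAI Chat Completions, a request can be built with nothing but the standard library. A minimal sketch; the model identifier `qwen3-4b` and the `QGRE_API_KEY` environment variable are illustrative assumptions, not confirmed names — check the QGRE model list for the exact identifiers:

```python
import json
import os
import urllib.request


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style Chat Completions request against QGRE.

    NOTE: "qwen3-4b" and QGRE_API_KEY are hypothetical placeholders.
    """
    body = json.dumps({
        "model": "qwen3-4b",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://api.qgre.com/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('QGRE_API_KEY', '')}",
        },
        method="POST",
    )


req = build_chat_request("Hello!")
# Send with: urllib.request.urlopen(req)
```

In practice you would point an existing OpenAI or Anthropic SDK at the matching base URL instead of hand-rolling HTTP; the sketch only shows what travels over the wire.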
Start at the quickstart for a copy-pasteable example, or jump straight to the migration guide if you're moving from OpenAI or Anthropic.
What makes QGRE different
- 4× cheaper per token vs OpenAI / Anthropic at comparable quality on the smaller-model band (1.7B / 4B / 8B).
- LATAM-first billing. Pay with PIX (Brazil), OXXO (Mexico), Boleto, or any major card. Prepaid credits with a $5 minimum top-up sidestep currency-volatility friction.
- Drop-in replacement. Two base URLs. Your code doesn't change.
- Outbound webhooks. Subscribe to `usage.completed` events for real-time spend tracking via Standard Webhooks-signed deliveries.
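Standard Webhooks deliveries are signed with HMAC-SHA256 over `{msg-id}.{timestamp}.{payload}`, using the base64 key embedded in a `whsec_`-prefixed endpoint secret, with the signature carried as `v1,<base64>` entries in the `webhook-signature` header. A minimal verification sketch, assuming QGRE follows the spec's default symmetric scheme (the function name is ours, not a QGRE SDK API):

```python
import base64
import hashlib
import hmac


def verify_webhook(secret: str, msg_id: str, timestamp: str,
                   payload: bytes, signature_header: str) -> bool:
    """Verify a Standard Webhooks HMAC-SHA256 signature.

    secret:           endpoint secret, e.g. "whsec_<base64-key>"
    msg_id/timestamp: the "webhook-id" / "webhook-timestamp" headers
    signature_header: the "webhook-signature" header, space-separated
                      "v1,<base64>" entries
    """
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{msg_id}.{timestamp}.".encode() + payload
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()
    ).decode()
    # Constant-time compare against every v1 entry in the header.
    return any(
        hmac.compare_digest(part.split(",", 1)[1], expected)
        for part in signature_header.split()
        if part.startswith("v1,")
    )
```

A production handler should additionally reject deliveries whose timestamp is outside a small tolerance window, to blunt replay attacks.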