QGRE — pure-Rust + CUDA TQ4 inference

OpenAI- and Anthropic-compatible API. Roughly 4× lower per-token cost than mainstream tier-1 providers, running on consumer Blackwell hardware.

QGRE serves Qwen3-family models behind two wire-compatible base URLs. Point your existing SDK at the right endpoint and it just works:

  • https://api.qgre.com/v1 — OpenAI Chat Completions
  • https://api.qgre.com/anthropic — Anthropic Messages
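As a minimal sketch of what "point your SDK at the endpoint" means, the request below targets the OpenAI-compatible base URL using only the Python standard library. The model identifier "qwen3-8b" and the API-key placeholder are assumptions; substitute the values from your QGRE dashboard.

```python
import json
import urllib.request

# Assumed values: check your QGRE dashboard for the real key and
# the models endpoint for exact model identifiers.
BASE_URL = "https://api.qgre.com/v1"
API_KEY = "YOUR_QGRE_API_KEY"  # placeholder, not a real key

payload = {
    "model": "qwen3-8b",  # assumed identifier
    "messages": [{"role": "user", "content": "Hello from QGRE"}],
}

# Standard OpenAI Chat Completions shape: POST /chat/completions
# with a Bearer token and a JSON body.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)

# resp = urllib.request.urlopen(req)  # uncomment to actually send
# print(json.load(resp)["choices"][0]["message"]["content"])
```

The official OpenAI or Anthropic SDKs work the same way: set their base-URL option to the matching QGRE endpoint and keep the rest of your code unchanged.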

Start at the quickstart for a copy-pasteable example, or jump straight to the migration guide if you're moving from OpenAI or Anthropic.

What makes QGRE different

  • About 4× cheaper per token than OpenAI or Anthropic at comparable quality in the small-model band (1.7B / 4B / 8B).
  • LATAM-first billing. Pay with PIX (Brazil), OXXO (Mexico), Boleto, or any major card. Prepaid credits with a $5 minimum top-up sidestep currency-volatility friction.
  • Drop-in replacement. Two base URLs. Your code doesn't change.
  • Outbound webhooks. Subscribe to usage.completed events for real-time spend tracking, with deliveries signed per the Standard Webhooks spec.
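Because deliveries follow the Standard Webhooks spec, verification is plain HMAC-SHA256 over "{id}.{timestamp}.{body}" with a base64-decoded endpoint secret. Below is a sketch assuming the usual "whsec_"-prefixed secret format and the webhook-id / webhook-timestamp / webhook-signature headers from that spec; where QGRE exposes the secret is an assumption (check your dashboard).

```python
import base64
import hashlib
import hmac

def verify_webhook(secret: str, msg_id: str, timestamp: str,
                   body: bytes, signature_header: str) -> bool:
    """Verify a Standard Webhooks HMAC-SHA256 signature.

    `secret` is the endpoint secret, e.g. "whsec_..." (assumed to be
    shown in the QGRE dashboard). `msg_id` and `timestamp` come from
    the webhook-id and webhook-timestamp headers.
    """
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    expected = base64.b64encode(
        hmac.new(key, signed_content, hashlib.sha256).digest()
    ).decode()
    # The signature header may carry several space-separated
    # "v1,<base64>" entries (e.g. after a secret rotation).
    for candidate in signature_header.split():
        version, _, sig = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(sig, expected):
            return True
    return False
```

Reject deliveries that fail this check, and consider also rejecting timestamps far outside the current time to limit replay.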