About
We're building inference for LATAM, in LATAM.
QGRE was built on a simple bet: most LATAM teams pay tier-1 prices for tier-2 needs. A 1.7B / 4B / 8B model covers the vast majority of production tasks — chat surfaces, code review, summarization, classification — at quality indistinguishable from frontier models for those workloads. What you pay to ship those workloads should reflect that.
The substrate is pure-Rust + CUDA. We quantize Qwen3 to TQ4 (4-bit nibbles + per-block absmax + Lloyd-Max codebook + WHT rotation). The kernels are hand-optimized for consumer Blackwell — RTX 5070 Ti class — not data-center H100s. The BOM is consumer parts; the bill is consumer prices.
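The core of a scheme like TQ4 is per-block absmax scaling into signed 4-bit values, packed two nibbles per byte. The sketch below illustrates only that step; the Lloyd-Max codebook and WHT rotation are omitted, and the block size and packing layout here are illustrative, not the actual TQ4 format.

```python
def quantize_block(xs):
    """Quantize a block of floats to signed 4-bit ints plus one scale."""
    absmax = max(abs(x) for x in xs) or 1.0
    scale = absmax / 7.0  # map [-absmax, absmax] onto the [-7, 7] grid
    q = [max(-8, min(7, round(x / scale))) for x in xs]
    return q, scale

def pack_nibbles(q):
    """Pack two signed 4-bit values per byte (low nibble first)."""
    out = bytearray()
    for i in range(0, len(q), 2):
        lo = q[i] & 0xF
        hi = (q[i + 1] & 0xF) if i + 1 < len(q) else 0
        out.append(lo | (hi << 4))
    return bytes(out)

def dequantize_block(q, scale):
    return [v * scale for v in q]

block = [0.12, -0.83, 0.45, 0.91, -0.07, 0.33, -0.59, 0.02]
q, scale = quantize_block(block)
packed = pack_nibbles(q)        # 8 weights -> 4 bytes + 1 float scale
restored = dequantize_block(q, scale)
```

The storage win is the whole point: 8 fp32 weights (32 bytes) become 4 bytes of nibbles plus one scale per block, and the rounding error stays within half a quantization step.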
We expose two wire-compatible surfaces (OpenAI ChatCompletions + Anthropic Messages) so you don't have to rewrite anything. Drop-in replacement is the wedge.
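Because the surface is wire-compatible with OpenAI ChatCompletions, migrating is just pointing an existing client at a different base URL. A stdlib-only sketch of what that request looks like on the wire; the endpoint URL, API key, and model id below are hypothetical placeholders, not documented QGRE values.

```python
import json
import urllib.request

BASE_URL = "https://api.qgre.example/v1"  # hypothetical; substitute the real endpoint
API_KEY = "sk-..."                        # placeholder credit-backed key

payload = {
    "model": "qwen3-8b-tq4",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "Summarize this PR in two sentences."}
    ],
}

# Standard ChatCompletions-shaped POST; nothing QGRE-specific in the body.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; any OpenAI-compatible SDK
# works the same way once its base URL points here.
```

The same drop-in logic applies to the Anthropic Messages surface: keep your existing client library, swap the base URL and key.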
And we accept your money the way you actually pay for things: PIX in Brazil, OXXO in Mexico, Boleto, or any major card. Prepaid credits with $5 minimum top-up because nobody wants another monthly invoice in USD that gets eaten by an FX spread.
Open about the tradeoffs
- We're not GPT-5. For frontier reasoning, function calling, vision, and agentic multi-tool workflows, you should use the model that fits. QGRE shines on the long tail of inference where 1.7B–14B is enough and cost matters.
- Single-region for now. We run from a São Paulo deployment for v1. Latency outside LATAM will run 100–200 ms higher than providers with a global edge. We'll add regions when traffic justifies it.
- Tool calls / vision not yet exposed. The substrate supports them; the wire surface is text-only in v1 to keep the cost story honest. Coming when there's customer pull.
Get in touch
Questions or partnership pitches: hello@qgre.com. See the Quickstart if you want to try it before talking.