About
We're building inference for LATAM, in LATAM.
QGRE was built on a simple bet: most LATAM teams pay tier-1 prices for tier-2 needs. A 1.7B / 4B / 8B model covers the vast majority of production tasks — chat surfaces, code review, summarization, classification — at quality indistinguishable from frontier models for those workloads. What you pay to ship those workloads should reflect that.
The substrate is pure-Rust + CUDA. We quantize Qwen3 to TQ4 (4-bit nibbles + per-block absmax + Lloyd-Max codebook + WHT rotation). The kernels are hand-optimized for consumer Blackwell — RTX 5070 Ti class — not data-center H100s. The BOM is consumer parts; the bill is consumer prices.
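The core of a scheme like TQ4 is per-block absmax scaling into signed 4-bit values, packed two nibbles per byte. The sketch below illustrates only that step; the Lloyd-Max codebook and WHT rotation are omitted, and the block size and packing layout here are illustrative, not the actual TQ4 format.

```python
def quantize_block(xs):
    """Quantize a block of floats to signed 4-bit ints plus one scale."""
    absmax = max(abs(x) for x in xs) or 1.0
    scale = absmax / 7.0  # map [-absmax, absmax] onto the [-7, 7] grid
    q = [max(-8, min(7, round(x / scale))) for x in xs]
    return q, scale

def pack_nibbles(q):
    """Pack two signed 4-bit values per byte (low nibble first)."""
    out = bytearray()
    for i in range(0, len(q), 2):
        lo = q[i] & 0xF
        hi = (q[i + 1] & 0xF) if i + 1 < len(q) else 0
        out.append(lo | (hi << 4))
    return bytes(out)

def dequantize_block(q, scale):
    return [v * scale for v in q]

block = [0.12, -0.83, 0.45, 0.91, -0.07, 0.33, -0.59, 0.02]
q, scale = quantize_block(block)
packed = pack_nibbles(q)        # 8 weights -> 4 bytes + 1 float scale
restored = dequantize_block(q, scale)
```

The storage win is the whole point: 8 fp32 weights (32 bytes) become 4 bytes of nibbles plus one scale per block, and the rounding error stays within half a quantization step.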
We expose two wire-compatible surfaces (OpenAI ChatCompletions + Anthropic Messages) so you don't have to rewrite anything. Drop-in replacement is the wedge.
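Because the surface is wire-compatible with OpenAI ChatCompletions, migrating is just pointing an existing client at a different base URL. A stdlib-only sketch of what that request looks like on the wire; the endpoint URL, API key, and model id below are hypothetical placeholders, not documented QGRE values.

```python
import json
import urllib.request

BASE_URL = "https://api.qgre.example/v1"  # hypothetical; substitute the real endpoint
API_KEY = "sk-..."                        # placeholder credit-backed key

payload = {
    "model": "qwen3-8b-tq4",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "Summarize this PR in two sentences."}
    ],
}

# Standard ChatCompletions-shaped POST; nothing QGRE-specific in the body.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; any OpenAI-compatible SDK
# works the same way once its base URL points here.
```

The same drop-in logic applies to the Anthropic Messages surface: keep your existing client library, swap the base URL and key.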
And we accept your money the way you actually pay for things: PIX in Brazil, OXXO in Mexico, Boleto, or any major card. Prepaid credits with $5 minimum top-up because nobody wants another monthly invoice in USD that gets eaten by an FX spread.
Open about the tradeoffs
- We're not GPT-5. For frontier reasoning, function calling, vision, and agentic multi-tool workflows, you should use the model that fits. QGRE shines on the long tail of inference where 1.7B–14B is enough and cost matters.
- Single-region for now. We run from a São Paulo deployment for v1. Latency outside LATAM will run 100–200 ms higher than providers with a global edge. We'll add regions when traffic justifies it.
- Tool calls / vision not yet exposed. The substrate supports them; the wire surface is text-only in v1 to keep the cost story honest. Coming when there's customer pull.
Get in touch
Questions or partnership pitches: hello@qgre.com. See the Quickstart if you want to try it before talking.