Plans + rate limits

Five plans. Switch any time via the Stripe portal.

PlanPriceDefault RPSDaily capModelsNotes
demo$0/mo2100qwen3-1.7b onlyFree tier — kick the tires
dev$20/mo1010,000allHobby + small projects
pro$99/mo50100,000all + priority routing99.5% SLA
paygPay-as-you-go50unlimited*all$5+ prepaid credits; no monthly fee
coding-pro$19/mo2050,000allPredictable cost wrapper for currency-volatile markets

* Pay-as-you-go has no daily cap; spend is gated by your prepaid balance.

Multi-layer rate limits

  • Per-IP bucket (outermost) — defends the auth path from anonymous flooders. 50 RPS by default.
  • Per-key bucket — set by your plan + the per-key override on the issued key (override always wins over the plan default).
  • Per-(key, model) bucket — same RPS, but isolated per model. Lets a paid customer hammer one model without starving themselves on the other.
  • Global concurrency cap — protects the GPU pool from a single tenant filling every slot. 503 with Retry-After: 1 when reached.

Cost-tier routing

Pass model: "auto" + X-QGRE-Cost-Tier: cheap to delegate model choice to the router. We route to the cheapest priced upstream that's healthy. Use expensive for the highest- quality option (typically the largest model).

See migration for a side-by-side example.