Plans + rate limits
Five plans. Switch any time via the Stripe portal.
| Plan | Price | Default RPS | Daily cap | Models | Notes |
|---|---|---|---|---|---|
| demo | $0/mo | 2 | 100 | qwen3-1.7b only | Free tier — kick the tires |
| dev | $20/mo | 10 | 10,000 | all | Hobby + small projects |
| pro | $99/mo | 50 | 100,000 | all + priority routing | 99.5% SLA |
| payg | Pay-as-you-go | 50 | unlimited* | all | $5+ prepaid credits; no monthly fee |
| coding-pro | $19/mo | 20 | 50,000 | all | Predictable cost wrapper for currency-volatile markets |
* Pay-as-you-go has no daily cap; spend is gated by your prepaid balance.
Multi-layer rate limits
- Per-IP bucket (outermost) — defends the auth path from anonymous flooders. 50 RPS by default.
- Per-key bucket — set by your plan + the per-key override on the issued key (override always wins over the plan default).
- Per-(key, model) bucket — same RPS as the per-key bucket, but tracked separately for each model. Lets a paid customer hammer one model without starving themselves on the others.
- Global concurrency cap — protects the GPU pool from a single tenant filling every slot. Returns 503 with Retry-After: 1 when reached.
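On the client side, the 503 from the global concurrency cap can be handled by waiting for the advertised Retry-After interval before trying again. The following is a minimal sketch, not an official client: `call_with_retry` and the `send` callable (assumed to return a `(status, headers, body)` tuple) are hypothetical names for illustration, and the one-second fallback simply mirrors the Retry-After: 1 value mentioned above.

```python
import time


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-503 status or attempts run out.

    send is a hypothetical zero-argument callable that performs one request
    and returns (status, headers, body), where headers is a dict.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        # Anything other than 503 (success or a different error) is returned as-is.
        if status != 503 or attempt == max_attempts:
            return status, headers, body
        # Honor the server's hint. Fall back to 1 second if the header is
        # missing or is not a plain number of seconds (for example an HTTP date).
        try:
            delay = float(headers.get("Retry-After", "1"))
        except ValueError:
            delay = 1.0
        sleep(max(0.0, delay))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. Whether to retry other status codes is a separate decision that this sketch leaves out.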
Cost-tier routing
Pass model: "auto" together with the header X-QGRE-Cost-Tier: cheap to delegate model choice to the router, which picks the cheapest healthy upstream. Use expensive for the highest-quality option (typically the largest model).
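As a rough illustration of cost-tier routing from the caller's side, the sketch below assembles the headers and JSON body for such a request. Only `model: "auto"` and the `X-QGRE-Cost-Tier` header come from this page. The helper name `build_auto_request` and the other payload fields are assumptions for illustration, and the endpoint and authentication are deliberately left out.

```python
import json


def build_auto_request(tier, payload):
    """Return (headers, body_bytes) for a router-selected model.

    tier is the cost tier to request, e.g. "cheap" or "expensive".
    payload holds the rest of the request body; its fields are hypothetical
    here and depend on the API you are calling.
    """
    # Force model to "auto" so the router, not the caller, picks the model.
    body = dict(payload, model="auto")
    headers = {
        "Content-Type": "application/json",
        "X-QGRE-Cost-Tier": tier,
    }
    return headers, json.dumps(body).encode("utf-8")


# Example: ask for the cheapest healthy upstream.
headers, body = build_auto_request("cheap", {"messages": []})
```

The returned pair can be handed to whichever HTTP client you already use, along with your API key.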
See migration for a side-by-side example.