Plans + rate limits

Five plans. Switch any time via the Stripe portal.

Plan	Price	Default RPS	Daily cap	Models	Notes
demo	$0/mo	2	100	qwen3-1.7b only	Free tier — kick the tires
dev	$20/mo	10	10,000	all	Hobby + small projects
pro	$99/mo	50	100,000	all + priority routing	99.5% SLA
payg	Pay-as-you-go	50	unlimited*	all	$5+ prepaid credits; no monthly fee
coding-pro	$19/mo	20	50,000	all	Predictable cost wrapper for currency-volatile markets

* Pay-as-you-go has no daily cap; spend is gated by your prepaid balance.

Multi-layer rate limits

Per-IP bucket (outermost) — defends the auth path from anonymous flooders. 50 RPS by default.
Per-key bucket — set by your plan + the per-key override on the issued key (override always wins over the plan default).
Per-(key, model) bucket — same RPS, but isolated per model. Lets a paid customer hammer one model without starving themselves on the other.
Global concurrency cap — protects the GPU pool from a single tenant filling every slot. 503 with Retry-After: 1 when reached.

Cost-tier routing

Pass model: "auto" + X-QGRE-Cost-Tier: cheap to delegate model choice to the router. We route to the cheapest priced upstream that's healthy. Use expensive for the highest- quality option (typically the largest model).

See migration for a side-by-side example.