How cost is calculated
- Each model has a per-token price (input and output).
- Cost for a request = tokens × the model's rate.
- Credits are charged the exact amount after inference (on real token usage). x402 charges at request time based on the model and your max_tokens.
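
As a rough illustration of the arithmetic above, the sketch below computes a request's cost from token counts and separate input and output rates. The function name, the per-1M-token rate unit, and the example rates are assumptions for illustration, not published prices. The second call shows one plausible upper bound for a request-time charge, assuming output usage is capped at max_tokens; how x402 actually derives its charge is not specified here.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)

def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Return the USD cost of one request.

    Rates are USD per 1M tokens (an assumed unit), passed as strings
    so Decimal keeps them exact.
    """
    input_cost = Decimal(input_tokens) * Decimal(input_rate)
    output_cost = Decimal(output_tokens) * Decimal(output_rate)
    return (input_cost + output_cost) / TOKENS_PER_MILLION

# Hypothetical rates: $0.50 per 1M input tokens, $1.50 per 1M output tokens.
# Cost on real usage: 1200 input tokens and 300 output tokens.
actual = request_cost(1200, 300, "0.50", "1.50")
# Upper bound if a max_tokens of 1000 were fully used.
upper_bound = request_cost(1200, 1000, "0.50", "1.50")
```

Lowering max_tokens lowers the upper bound, which is why it matters most for charges made before the real usage is known.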
Per-provider pricing
Most nodes serve at the platform-suggested rate, but operators may set their own per-token price within a band (suggested × 0.5 to × 5). You can compare the nodes serving a model, each listed with its effective price, cheapest first, and optionally pin one: pass its node_id as node on your chat request (or the node= argument in the SDKs) to use a specific provider. Omit it and the grid routes for you.
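
To make the band concrete, here is a minimal sketch that computes the allowed price range for a suggested rate and ranks a few nodes cheapest first. The node IDs, prices, and helper names are hypothetical; a real node list would come from the platform.

```python
def price_band(suggested_rate):
    """Return the (lowest, highest) price an operator may set."""
    return suggested_rate * 0.5, suggested_rate * 5

def cheapest_first(nodes):
    """Sort (node_id, effective_price) pairs by price, lowest first."""
    return sorted(nodes, key=lambda node: node[1])

# Hypothetical nodes serving one model whose suggested rate is 2.0.
nodes = [("node-a", 2.0), ("node-b", 1.0), ("node-c", 6.5)]

low, high = price_band(2.0)
ranked = cheapest_first(nodes)

# The ID of the cheapest node is the value you would pass as node.
pinned_node_id = ranked[0][0]
```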
Capping price (max_price)
Don’t want to pick manually? Pass max_price (blended USD per 1M tokens) on your chat/reserve request: the grid only routes to nodes at or under that rate, and the request is never billed above it. This works in every routing mode, and it is a typed option in the SDKs (max_price / maxPrice).
If no node can serve the request at or under the cap, you get price_cap_unmet instead of being overcharged.
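
The cap behaves like a filter over candidate nodes. The sketch below mimics that locally, assuming each node has a single blended price in USD per 1M tokens. The PriceCapUnmet class, the function name, and the node data are hypothetical stand-ins; the grid applies this rule server-side and its real selection logic is not shown here.

```python
class PriceCapUnmet(Exception):
    """Local stand-in for the price_cap_unmet outcome."""

def nodes_under_cap(nodes, max_price):
    """Keep only (node_id, price) pairs priced at or under max_price."""
    eligible = [node for node in nodes if node[1] <= max_price]
    if not eligible:
        # No node qualifies, so refuse rather than exceed the cap.
        raise PriceCapUnmet(f"no node at or under {max_price}")
    return eligible

# Hypothetical blended prices in USD per 1M tokens.
nodes = [("node-a", 2.0), ("node-b", 1.0), ("node-c", 6.5)]

eligible = nodes_under_cap(nodes, max_price=2.0)

try:
    nodes_under_cap(nodes, max_price=0.5)
    cap_unmet = False
except PriceCapUnmet:
    cap_unmet = True
```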
Where the money goes
Each paid inference is split across the people who make the network work:

| Share | Goes to |
|---|---|
| 80% | The node operator who served your request |
| 5% | Platform |
| 10% | SGL) |
| 5% | $SGL buy-and-burn |
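
For a sense of scale, this sketch applies the percentages in the table to a single payment. The dictionary keys and the function name are illustrative only, and the recipient of the 10% share is left generic because its label is truncated in the table. Rounding and settlement details are not covered.

```python
from decimal import Decimal

# Shares from the table above, as fractions of each paid inference.
SHARES = {
    "node_operator": Decimal("0.80"),
    "platform": Decimal("0.05"),
    "ten_percent_share": Decimal("0.10"),  # recipient label is truncated in the table
    "sgl_buy_and_burn": Decimal("0.05"),
}

def split_payment(amount):
    """Return each recipient's portion of a payment given in USD."""
    amount = Decimal(amount)
    return {recipient: amount * share for recipient, share in SHARES.items()}

# Example: a $1.00 inference.
parts = split_payment("1.00")
```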
Keeping costs down
- Pick the smallest model that does the job (see Models).
- Set a sensible max_tokens.
- Use credits for steady usage (exact billing) and x402 for occasional/agent use.
