# Pricing

> How SGL Grid inference is priced — per-token, per-model, pay only for what you use.

SGL Grid is **usage-based**: you pay per request, priced by the model and the tokens it processes. No subscription, no minimums.

## How cost is calculated

* Each model has a **per-token price** (input and output).
* Cost for a request = `input tokens × the model's input rate + output tokens × the model's output rate`.
* **Credits** are charged the **exact** amount after inference (on real token usage). **x402** charges at request time based on the model and your `max_tokens`.
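The difference between the two payment paths can be sketched in a few lines of Python. This is an illustration only: the `ModelRate` type, the example rates, and the per-1M-token unit are assumptions, and the exact x402 formula is not documented here, so the sketch assumes output is priced as if the full `max_tokens` budget were used.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical per-model rates in USD per 1M tokens (made-up values)."""
    input_usd_per_m: float
    output_usd_per_m: float


def credits_cost(rate: ModelRate, input_tokens: int, output_tokens: int) -> float:
    """Credits: billed after inference on the tokens actually used."""
    return (input_tokens * rate.input_usd_per_m
            + output_tokens * rate.output_usd_per_m) / TOKENS_PER_MILLION


def x402_upfront_cost(rate: ModelRate, input_tokens: int, max_tokens: int) -> float:
    """x402: charged at request time.

    ASSUMPTION: output is priced as if all `max_tokens` were generated.
    The real formula may differ.
    """
    return credits_cost(rate, input_tokens, max_tokens)


rate = ModelRate(input_usd_per_m=0.02, output_usd_per_m=0.04)
actual = credits_cost(rate, input_tokens=1_000, output_tokens=200)
upfront = x402_upfront_cost(rate, input_tokens=1_000, max_tokens=500)
print(f"credits: ${actual:.6f}  x402 (assumed): ${upfront:.6f}")
```

Under these assumptions, a lower `max_tokens` directly lowers the x402 charge, while the credits charge depends only on what the model actually produced.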

The live per-model price is shown in the [Models](/cloud/grid/models) view and the models API:

```bash
curl https://grid.x402compute.cc/grid/models
```

## Per-provider pricing

Most nodes serve at the platform's suggested rate, but operators may set their **own** per-token price within a band of 0.5× to 5× the suggested rate. You can list the nodes serving a model, each with its effective price and sorted cheapest first, and optionally pin one:

```bash
curl "https://grid.x402compute.cc/v1/providers?model=llama-3.2-3b"
```

Pass the chosen `node_id` as `node` on your chat request (or the `node=` argument in the SDKs) to use a specific provider; omit it and the grid routes for you.
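As a rough sketch of how you might compare providers client-side before pinning one, the snippet below checks each price against the 0.5× to 5× band and sorts cheapest first. The response shape is an assumption: the `price` field, the node names, and the `suggested` value are hypothetical and not taken from the actual `/v1/providers` payload.

```python
BAND_MIN, BAND_MAX = 0.5, 5.0  # band multipliers from the text above


def within_band(price: float, suggested: float) -> bool:
    """True if an operator's price sits inside the allowed band."""
    return BAND_MIN * suggested <= price <= BAND_MAX * suggested


def cheapest_first(providers: list[dict]) -> list[dict]:
    """Sort provider entries by ascending price."""
    return sorted(providers, key=lambda p: p["price"])


# Hypothetical provider entries; the real field names may differ.
providers = [
    {"node_id": "node-b", "price": 0.030},
    {"node_id": "node-a", "price": 0.012},
    {"node_id": "node-c", "price": 0.020},
]
suggested = 0.020  # hypothetical platform-suggested rate

ranked = cheapest_first(providers)
pinned = ranked[0]["node_id"]  # value you would pass as `node`
all_in_band = all(within_band(p["price"], suggested) for p in providers)
print(pinned, all_in_band)
```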

### Capping price (`max_price`)

Don't want to pick manually? Pass **`max_price`** (blended USD per 1M tokens) on your chat/reserve request. The grid then only routes to nodes at or under that rate, and the request is **never billed above it**. It works in every routing mode and is a typed option in the SDKs (`max_price` / `maxPrice`).

```bash
# raw API
curl -X POST https://grid.x402compute.cc/v1/reserve \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.2-3b","max_price":0.01}'
```

If no node qualifies under the cap, the request returns `price_cap_unmet` instead of overcharging you.
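The cap behaves like a filter applied before routing. The sketch below models that idea locally. It is not the grid's actual routing logic: the `PriceCapUnmet` exception, the provider fields, and the choice of the cheapest eligible node are all assumptions made for illustration.

```python
class PriceCapUnmet(Exception):
    """Local stand-in for the `price_cap_unmet` response named above."""


def route_under_cap(providers: list[dict], max_price: float) -> dict:
    """Keep only nodes at or under the cap, then pick one.

    ASSUMPTION: picks the cheapest eligible node. The grid's real
    routing modes may choose differently among eligible nodes.
    """
    eligible = [p for p in providers if p["price"] <= max_price]
    if not eligible:
        raise PriceCapUnmet("price_cap_unmet")
    return min(eligible, key=lambda p: p["price"])


# Hypothetical provider entries (blended USD per 1M tokens).
providers = [
    {"node_id": "node-a", "price": 0.012},
    {"node_id": "node-b", "price": 0.030},
]

chosen = route_under_cap(providers, max_price=0.015)
print(chosen["node_id"])

try:
    route_under_cap(providers, max_price=0.001)
    cap_error = None
except PriceCapUnmet as exc:
    cap_error = str(exc)
print(cap_error)
```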

## Where the money goes

Each paid inference is split across the people who make the network work:

| Share   | Goes to                                   |
| ------- | ----------------------------------------- |
| **80%** | The node operator who served your request |
| **5%**  | Platform                                  |
| **10%** | \$SGL stakers (paid as USDC + \$SGL)      |
| **5%**  | \$SGL buy-and-burn                        |

So usage directly rewards operators and stakers, and continuously buys and burns \$SGL.
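To see what the split in the table means for a single payment, here is a small worked example. The share names are labels chosen for this sketch, and `Decimal` is used only to keep the arithmetic exact. How and when the platform actually settles these amounts is not described here.

```python
from decimal import Decimal

# Percentages from the table above; the key names are illustrative labels.
SPLIT = {
    "node_operator": Decimal("0.80"),
    "platform": Decimal("0.05"),
    "stakers": Decimal("0.10"),
    "buy_and_burn": Decimal("0.05"),
}


def split_payment(amount_usd: Decimal) -> dict[str, Decimal]:
    """Divide one paid inference across the four recipients."""
    return {name: amount_usd * share for name, share in SPLIT.items()}


shares = split_payment(Decimal("1.00"))
for name, value in shares.items():
    print(f"{name}: ${value}")
```

For a \$1.00 inference, that is \$0.80 to the operator, \$0.05 to the platform, \$0.10 to stakers, and \$0.05 to buy-and-burn.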

## Keeping costs down

* Pick the **smallest model** that does the job (see [Models](/cloud/grid/models)).
* Set a sensible `max_tokens`.
* Use **credits** for steady usage (exact billing) and **x402** for occasional/agent use.

## Tracking spend

The dashboard's **API usage** view shows your requests, tokens, spend, and a per-model breakdown over time. See [Billing](/cloud/concepts/billing).
