sgl is the open-source node operator CLI. It generates your node’s keys, registers it on the grid, runs inference, manages the background service, and toggles maintenance mode. This is the complete command reference.
Install
Install sgl from the official open-source release (see Node setup). You also need a local inference runtime (llama-server) and a GGUF model file to serve.
Typical first run
sgl login # browser-approve with your staked wallet
sgl attest # produce + submit the hardware attestation
# run as a background service, serving a model
sgl service install \
--model-path ~/models/Llama-3.2-3B-Instruct-Q4_K_M.gguf \
--model-name llama-3.2-3b \
--resource-percent 50
Commands
sgl login
Log in via the browser and register this node (recommended — binds the node to the wallet that holds your stake).
| Flag | Default | Description |
|---|---|---|
| --tee-type | apple_se | TEE type on this machine (e.g. apple_se) |
| --models | — | Comma-separated models to advertise |
sgl init
Initialize the node without the browser flow: generate keys and register directly under a wallet.
| Flag | Default | Description |
|---|---|---|
| --wallet | (required) | Staked Solana wallet address this node operates under |
| --tee-type | apple_se | TEE type on this machine |
| --models | — | Comma-separated models to advertise |
sgl attest
Verify attestation: sign the orchestrator’s challenge and submit the hardware attestation proof. Only attestation-verified nodes receive jobs. Run it after sgl login or sgl init, and again after any binary update.
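The browser-free path (sgl init followed by sgl attest) can be sketched as a small shell function. This is an illustrative wrapper, not part of sgl: the function name provision and the SGL_BIN override variable are assumptions introduced here so the sequence can be dry-run against a stub. Only the --wallet and --models flags documented above are used.

```shell
# SGL_BIN is a hypothetical override (not an sgl feature); it defaults to the real binary.
SGL_BIN="${SGL_BIN:-sgl}"

# provision WALLET [MODELS]: register under WALLET without the browser flow,
# then submit the hardware attestation. Stops at the first failing step.
provision() {
  wallet="$1"
  models="${2:-}"
  if [ -z "$wallet" ]; then
    echo "usage: provision WALLET [MODELS]" >&2
    return 2
  fi
  if [ -n "$models" ]; then
    "$SGL_BIN" init --wallet "$wallet" --models "$models" || return 1
  else
    "$SGL_BIN" init --wallet "$wallet" || return 1
  fi
  "$SGL_BIN" attest
}
```

Calling the function with SGL_BIN=echo prints the two commands instead of running them, which is a cheap way to check the arguments before touching a real node.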
sgl start
Run the node in the foreground: start the local inference server, begin heartbeating, and process jobs.
| Flag | Default | Description |
|---|---|---|
| --model-path | — | Path to the GGUF model file |
| --model-name | — | Model name to advertise (e.g. llama-3.2-3b) |
| --inference-port | 8081 | Port for the local llama-server |
| --resource-percent | 100 | Quick preset (1–100): sets threads, GPU layers, and concurrency proportionally |
| --threads | auto | CPU threads for inference (overrides the preset) |
| --gpu-layers | auto | GPU layers to offload (0 = CPU only, 99 = all) |
| --context-size | 4096 | Context window in tokens |
| --max-jobs | 1 | Max concurrent jobs to accept |
| --batch-size | 512 | Prompt batch size |
| --heartbeat-interval | 5 | Heartbeat seconds (lower = faster pickup, more traffic) |
Use sgl start to test in the foreground; use sgl service install for production so the node survives reboots, logout, crashes, and idle sleep.
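For a foreground test, the preset and an explicit override can be combined. The sketch below wraps one such invocation in a function. The name test_run, the SGL_BIN variable, and the specific values (50 percent, 4 threads, a 2048-token context) are illustrative assumptions; the flags themselves are the ones listed in the table above.

```shell
# SGL_BIN is a hypothetical override (not an sgl feature); it defaults to the real binary.
SGL_BIN="${SGL_BIN:-sgl}"

# test_run MODEL_PATH MODEL_NAME: foreground test run at half resources,
# with the thread count pinned (--threads overrides the preset) and a
# smaller context window than the 4096 default.
test_run() {
  "$SGL_BIN" start \
    --model-path "$1" \
    --model-name "$2" \
    --resource-percent 50 \
    --threads 4 \
    --context-size 2048
}
```

A call would look like test_run ~/models/Llama-3.2-3B-Instruct-Q4_K_M.gguf llama-3.2-3b. Because sgl start begins heartbeating and processing jobs, this is a live run rather than an offline check.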
sgl status
Show node status, hardware capabilities, and orchestrator connection info.
sgl off-grid
Go off-grid (maintenance): stop receiving new jobs without penalty, for planned downtime. In-flight jobs finish cleanly. Tamper slashing is unaffected: off-grid only pauses job routing and does not shield the node from a tamper penalty.
sgl on-grid
Come back on-grid: resume receiving jobs.
sgl price
View or set your per-token price. By default your node bills at the platform's suggested rate (and you keep 80%). You may optionally set your own price within an allowed band: floor = suggested × 0.5, ceiling = suggested × 5. Prices are USD per 1M tokens, split into input (prompt) and output (completion). A model with no custom price simply bills at the suggested rate.
sgl price show # your prices + the suggested rate + the band
sgl price set --model llama-3.2-3b --input 0.004 --output 0.004 # undercut to win more jobs
sgl price reset --model llama-3.2-3b # back to the suggested price
| Subcommand | Flags | Description |
|---|---|---|
| show | — | Show each served model’s price, the suggested rate, and the band |
| set | --model, --input, --output | Set a custom price (USD per 1M tokens; must be in-band and for a model you advertise) |
| reset | --model | Revert a model to the platform suggested price |
The server enforces the band and a short cooldown between changes. You can also manage prices from the operator dashboard, and callers compare nodes at GET /v1/providers?model=….
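The band arithmetic (floor = suggested × 0.5, ceiling = suggested × 5) can be checked locally before calling sgl price set. The helpers below are a sketch, not part of sgl: price_band and in_band are hypothetical names, and the suggested rate of 0.006 USD per 1M tokens is a made-up figure for illustration. Read the real rate from sgl price show. The server remains the authority on what is accepted.

```shell
# price_band SUGGESTED: print the floor and ceiling (USD per 1M tokens)
# for a suggested rate, using floor = suggested * 0.5, ceiling = suggested * 5.
price_band() {
  awk -v s="$1" 'BEGIN { printf "floor=%.4f ceiling=%.4f\n", s * 0.5, s * 5 }'
}

# in_band SUGGESTED PRICE: exit 0 if PRICE lies inside the band, 1 otherwise.
in_band() {
  awk -v s="$1" -v p="$2" 'BEGIN { exit !(p >= s * 0.5 && p <= s * 5) }'
}

# Hypothetical suggested rate of 0.006; the real value comes from `sgl price show`.
price_band 0.006                     # prints: floor=0.0030 ceiling=0.0300
if in_band 0.006 0.004; then
  echo "0.004 is in band"
else
  echo "0.004 is out of band"
fi
```

With that assumed rate, a custom price of 0.004 sits inside the band, while anything below 0.003 or above 0.03 would be rejected.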
sgl service — background service
Runs the node as a managed OS service (launchd on macOS / systemd on Linux) so it keeps serving across reboots, logout, crashes, and idle sleep.
sgl service install
Install and start the background service.
| Flag | Default | Description |
|---|---|---|
| --model-path | — | Path to the GGUF model file |
| --model-name | — | Model name to advertise |
| --resource-percent | 50 | Percentage of system resources to dedicate (1–100) |
| --inference-port | 8081 | Port for the local llama-server |
| --max-jobs | 1 | Max concurrent jobs |
| --heartbeat-interval | 5 | Heartbeat seconds |
| --sandbox | off | macOS only. Run the node under a Seatbelt sandbox that walls off SSH keys, wallets, keychains, and browser data from the inference process. On Linux, equivalent systemd hardening is applied automatically. |
sgl service stop
Stop and remove the background service.
sgl service status
Show whether the service is installed and running.
Maintenance workflow
sgl off-grid # before planned downtime — stop new jobs
# ... do maintenance ...
sgl on-grid # resume serving
After updating the node binary, re-run sgl attest so the network re-verifies your enclave. See Node setup and Earnings.
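The maintenance steps can be chained so the node only returns to the grid when every step succeeds. The function below is a sketch under stated assumptions: maintain and SGL_BIN are hypothetical names, and it re-attests after every maintenance command, which is stricter than the documented requirement (re-attest after a binary update).

```shell
# SGL_BIN is a hypothetical override (not an sgl feature); it defaults to the real binary.
SGL_BIN="${SGL_BIN:-sgl}"

# maintain CMD [ARGS...]: go off-grid, run the maintenance command,
# re-attest, then come back on-grid. On any failure the node stays off-grid.
maintain() {
  "$SGL_BIN" off-grid || return 1
  "$@" || { echo "maintenance failed; node left off-grid" >&2; return 1; }
  "$SGL_BIN" attest || { echo "attestation failed; node left off-grid" >&2; return 1; }
  "$SGL_BIN" on-grid
}
```

Pass your own update or repair command as the arguments to maintain. Leaving the node off-grid on failure is a deliberate choice here: off-grid carries no penalty, so it is the safer state while you investigate.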