sgl is the open-source node operator CLI. It generates your node’s keys, registers it on the grid, runs inference, manages the background service, and toggles maintenance mode. This is the complete command reference.
Install sgl from the official open-source release (see Node setup). You also need a local inference runtime (llama-server) and a GGUF model file to serve.
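The two prerequisites above can be checked before the first run. The sketch below is a hypothetical helper, not part of sgl: `check_prereqs` is a made-up name, and it only assumes that the runtime binary is called llama-server and that model files end in .gguf, as stated above.

```shell
# Hypothetical preflight helper (not an sgl command).
# Usage: check_prereqs MODEL_PATH [RUNTIME_BINARY]
# Prints "ok" and returns 0 when the runtime is on PATH and the model file exists.
check_prereqs() {
  local model_path="$1"
  local runtime="${2:-llama-server}"   # binary name taken from the text above
  local failed=0

  # The inference runtime must be discoverable on PATH.
  if ! command -v "$runtime" >/dev/null 2>&1; then
    echo "missing: $runtime not found on PATH"
    failed=1
  fi

  # The model must be an existing file with a .gguf extension.
  if [ ! -f "$model_path" ]; then
    echo "missing: no model file at $model_path"
    failed=1
  elif [ "${model_path##*.}" != "gguf" ]; then
    echo "unexpected: $model_path does not end in .gguf"
    failed=1
  fi

  if [ "$failed" -eq 0 ]; then
    echo "ok"
  fi
  return "$failed"
}
```

For example, `check_prereqs "$HOME/models/Llama-3.2-3B-Instruct-Q4_K_M.gguf"` reports which prerequisite, if any, is missing. Note that a quoted `~` is not expanded by the shell, so `$HOME` is used here.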

Typical first run

sgl login                                  # browser-approve with your staked wallet
sgl attest                                 # produce + submit the hardware attestation
# run as a background service, serving a model
sgl service install \
  --model-path ~/models/Llama-3.2-3B-Instruct-Q4_K_M.gguf \
  --model-name llama-3.2-3b \
  --resource-percent 50

Commands

sgl login

Log in via the browser and register this node (recommended — binds the node to the wallet that holds your stake).
| Flag | Default | Description |
| --- | --- | --- |
| --tee-type | apple_se | TEE type on this machine (e.g. apple_se) |
| --models | | Comma-separated models to advertise |

sgl init

Initialize the node without the browser flow: generate keys and register directly under a wallet.
| Flag | Default | Description |
| --- | --- | --- |
| --wallet | (required) | Staked Solana wallet address this node operates under |
| --tee-type | apple_se | TEE type on this machine |
| --models | | Comma-separated models to advertise |

sgl attest

Verify attestation — sign the orchestrator’s challenge and submit the hardware attestation proof. Only attestation-verified nodes receive jobs. Run after login/init and after any binary update.

sgl start

Run the node in the foreground: start the local inference server, begin heartbeating, and process jobs.
| Flag | Default | Description |
| --- | --- | --- |
| --model-path | | Path to the GGUF model file |
| --model-name | | Model name to advertise (e.g. llama-3.2-3b) |
| --inference-port | 8081 | Port for the local llama-server |
| --resource-percent | 100 | Quick preset (1–100): sets threads, GPU layers, and concurrency proportionally |
| --threads | auto | CPU threads for inference (overrides the preset) |
| --gpu-layers | auto | GPU layers to offload (0 = CPU only, 99 = all) |
| --context-size | 4096 | Context window in tokens |
| --max-jobs | 1 | Max concurrent jobs to accept |
| --batch-size | 512 | Prompt batch size |
| --heartbeat-interval | 5 | Heartbeat seconds (lower = faster pickup, more traffic) |
Use sgl start to test in the foreground; use sgl service install for production so the node survives reboots, logout, crashes, and idle sleep.
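The interaction between --resource-percent and --threads can be pictured with a small sketch. This is an illustration only: the real preset formula is not documented here, so the linear scaling, the rounding down, and the minimum of one thread are all assumptions, and `effective_threads` is a hypothetical helper rather than an sgl command.

```shell
# Hypothetical illustration: NOT sgl's actual formula.
# Usage: effective_threads CORES PERCENT [THREADS_OVERRIDE]
# Assumes threads scale linearly with the percentage (rounded down, minimum 1)
# and that an explicit override always wins over the preset.
effective_threads() {
  local cores="$1"
  local percent="$2"
  local override="${3:-}"

  # An explicit --threads style value overrides the preset.
  if [ -n "$override" ]; then
    echo "$override"
    return 0
  fi

  # The preset accepts 1 to 100, as in the flag table.
  if [ "$percent" -lt 1 ] || [ "$percent" -gt 100 ]; then
    echo "percent must be between 1 and 100" >&2
    return 1
  fi

  local threads=$(( cores * percent / 100 ))
  if [ "$threads" -lt 1 ]; then
    threads=1
  fi
  echo "$threads"
}
```

Under these assumptions, `effective_threads 8 50` prints 4, while `effective_threads 8 50 6` prints 6 because the override takes precedence.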

sgl status

Show node status, hardware capabilities, and orchestrator connection info.

sgl off-grid

Go off-grid (maintenance): stop receiving new jobs without penalty, for planned downtime. In-flight jobs finish cleanly. Tamper slashing is unaffected — off-grid only pauses job routing, it can’t dodge a tamper penalty.

sgl on-grid

Come back on-grid: resume receiving jobs.

sgl price

View or set your per-token price. By default your node bills the platform-suggested rate (and you keep 80%). You may optionally set your own price within an allowed band: floor = suggested × 0.5, ceiling = suggested × 5. Prices are in USD per 1M tokens, split into input (prompt) and output (completion). A model with no custom price bills at the suggested rate.
sgl price show                                              # your prices + the suggested rate + the band
sgl price set --model llama-3.2-3b --input 0.004 --output 0.004   # undercut to win more jobs
sgl price reset --model llama-3.2-3b                        # back to the suggested price
| Subcommand | Flags | Description |
| --- | --- | --- |
| show | | Show each served model’s price, the suggested rate, and the band |
| set | --model, --input, --output | Set a custom price (USD per 1M tokens; must be in-band and for a model you advertise) |
| reset | --model | Revert a model to the platform suggested price |
The server enforces the band and a short cooldown between changes. You can also manage prices from the operator dashboard, and callers compare nodes at GET /v1/providers?model=….
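The band arithmetic (floor = suggested × 0.5, ceiling = suggested × 5) can be checked locally before calling sgl price set. The sketch below is not part of sgl: `price_band` and `in_band` are hypothetical helpers, and the suggested rate of 0.008 used in the usage note is a made-up value for illustration. The server remains the authority on what it accepts.

```shell
# Hypothetical helpers (not sgl commands). Prices are USD per 1M tokens.

# Usage: price_band SUGGESTED
# Prints "floor ceiling" for a given suggested rate.
price_band() {
  awk -v s="$1" 'BEGIN { printf "%.6f %.6f\n", s * 0.5, s * 5 }'
}

# Usage: in_band SUGGESTED PRICE
# Returns 0 when PRICE lies within [suggested * 0.5, suggested * 5].
in_band() {
  awk -v s="$1" -v p="$2" 'BEGIN { exit !(p >= s * 0.5 && p <= s * 5) }'
}
```

With an assumed suggested rate of 0.008, `price_band 0.008` prints `0.004000 0.040000`, and `in_band 0.008 0.005` succeeds while `in_band 0.008 0.003` fails because the price falls below the floor.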

sgl service — background service

Runs the node as a managed OS service (launchd on macOS / systemd on Linux) so it keeps serving across reboots, logout, crashes, and idle sleep.

sgl service install

Install and start the background service.
| Flag | Default | Description |
| --- | --- | --- |
| --model-path | | Path to the GGUF model file |
| --model-name | | Model name to advertise |
| --resource-percent | 50 | Percentage of system resources to dedicate (1–100) |
| --inference-port | 8081 | Port for the local llama-server |
| --max-jobs | 1 | Max concurrent jobs |
| --heartbeat-interval | 5 | Heartbeat seconds |
| --sandbox | off | macOS only. Run the node under a Seatbelt sandbox that walls off SSH keys, wallets, keychains, and browser data from the inference process. On Linux, equivalent systemd hardening is applied automatically. |
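Because --sandbox applies to macOS only, a provisioning script shared across machines may want to add it conditionally. The sketch below shows one way to do that. It assumes that --sandbox is a bare switch taking no value, which the table does not confirm, and `install_service` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper (not an sgl command).
# Usage: install_service MODEL_PATH MODEL_NAME [OS_NAME]
# OS_NAME defaults to the output of `uname -s` and exists mainly for testing.
install_service() {
  local model_path="$1"
  local model_name="$2"
  local os_name="${3:-$(uname -s)}"

  # Build the argument list in the positional parameters so that
  # paths containing spaces survive intact.
  set -- service install --model-path "$model_path" --model-name "$model_name"

  # Assumption: --sandbox is a bare switch. Only pass it on macOS (Darwin);
  # on Linux the systemd hardening is applied automatically.
  if [ "$os_name" = "Darwin" ]; then
    set -- "$@" --sandbox
  fi

  sgl "$@"
}
```

A call such as `install_service "$HOME/models/Llama-3.2-3B-Instruct-Q4_K_M.gguf" llama-3.2-3b` then runs the same install on either platform, with the sandbox requested only where it is supported.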

sgl service stop

Stop and remove the background service.

sgl service status

Show whether the service is installed and running.

Maintenance workflow

sgl off-grid          # before planned downtime — stop new jobs
# ... do maintenance ...
sgl on-grid           # resume serving
After updating the node binary, re-run sgl attest so the network re-verifies your enclave. See Node setup and Earnings.
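The maintenance workflow can be wrapped so that the node returns on-grid even when the maintenance step fails. The sketch below uses only the sgl off-grid and sgl on-grid commands documented above. `with_maintenance` is a hypothetical helper name, and the sketch does not wait for in-flight jobs, since no command for that is described here.

```shell
# Hypothetical wrapper (not an sgl command).
# Usage: with_maintenance COMMAND [ARGS...]
# Goes off-grid, runs the command, then always attempts to come back on-grid.
# Returns the exit status of the maintenance command.
with_maintenance() {
  # Do not start maintenance if the node could not be taken off-grid.
  sgl off-grid || return 1

  # Run the maintenance step and remember its exit status.
  local status=0
  "$@" || status=$?

  # Resume serving regardless of how the maintenance step ended.
  if ! sgl on-grid; then
    echo "warning: sgl on-grid failed; the node is still off-grid" >&2
  fi
  return "$status"
}
```

For example, `with_maintenance sh -c 'echo updating...'` brackets a placeholder maintenance command. If the maintenance step replaces the node binary, follow it with sgl attest as described above.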