Pool management

A pool is the routing primitive. Every runner belongs to exactly one pool; every job lands on exactly one pool. This page covers the operational surface around pools: creating and deleting them, registering and revoking runners, token rotation, and reading throughput.

Create a pool

  1. Open Settings → Runner Pools → New pool in your project.
  2. Name it (e.g., prod-gpu-farm). Names are unique per project.
  3. Pick a type: self-hosted or hosted.
  4. For hosted, pick a SKU (hosted-ros2-cpu, hosted-gazebo-gpu, hosted-webots-cpu, hosted-webots-gpu).
  5. Save. The pool is now eligible for routing.
Hosted pools are created without any runners — the platform provisions Fargate or EC2 capacity on demand. Self-hosted pools sit idle (and accept no jobs) until you register at least one runner.

Register a runner

Generate a registration token for the pool, then run rbtk-runner register on the runner host.
# On any machine with rbtk installed
rbtk pool register-runner --project warehouse --pool prod-gpu-farm
# Registration token: rbtk_pool_reg_xx... (valid 1h, single-use)

# On the runner host
rbtk-runner register \
  --project warehouse \
  --pool prod-gpu-farm \
  --token rbtk_pool_reg_xx... \
  --name gpu-host-04
Registration tokens are single-use and expire in 1 hour. You can mint as many as you need.
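The two token rules above (single-use, 1 hour lifetime) can be checked in a provisioning script before a host attempts to register. The sketch below is only a local model of those rules: the function name and its arguments are hypothetical, the caller is assumed to record when the token was minted, and treating a token as expired at exactly 3600 seconds is an assumption.

```shell
# reg_token_usable MINTED_EPOCH NOW_EPOCH USED
# Succeeds (exit 0) only if the token has not been used (USED=0)
# and is less than one hour old. Hypothetical helper, not part of rbtk.
reg_token_usable() {
  minted=$1
  now=$2
  used=$3
  [ "$used" -eq 0 ] && [ $((now - minted)) -lt 3600 ]
}

# Example: a token minted 10 minutes ago and not yet used.
# reg_token_usable "$minted_at" "$(date +%s)" 0 && echo "ok to register"
```

If the check fails, mint a fresh token rather than retrying; the platform remains the authority on whether a token is accepted.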

Inspect runners in a pool

rbtk pool runners --pool prod-gpu-farm

NAME           STATUS    JOBS      CAPABILITIES                          LAST HEARTBEAT
gpu-host-01    ONLINE    1/4       ros:humble,iron · sim:gazebo · gpu:T4 6s ago
gpu-host-02    ONLINE    3/4       ros:humble,iron · sim:gazebo · gpu:T4 9s ago
gpu-host-03    DRAINING  2/4       ros:humble · sim:gazebo · gpu:T4      11s ago
gpu-host-04    OFFLINE   0/0       (last seen 2h ago)                    —
Status      Meaning
ONLINE      Heartbeat ≤ 60 s old; eligible to receive jobs.
DRAINING    No new jobs accepted, but in-flight jobs are allowed to finish. Triggered by rbtk-runner drain or a pending upgrade.
OFFLINE     No heartbeat for > 60 s. Auto-reaped after 24 hours of silence.
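The status rules can be restated as a small decision function, which is handy when a monitoring script only has the heartbeat age to work with. This is a sketch under assumptions: the function name is hypothetical, a heartbeat that is exactly 60 s old is treated as ONLINE, and a stale heartbeat is assumed to take precedence over a drain request.

```shell
# runner_status AGE_SECONDS [DRAINING]
# AGE_SECONDS: seconds since the last heartbeat.
# DRAINING:    "true" if a drain was requested (default: false).
# Hypothetical helper that mirrors the status table; not part of rbtk.
runner_status() {
  age=$1
  draining=${2:-false}
  if [ "$age" -gt 60 ]; then
    echo OFFLINE        # heartbeat too old
  elif [ "$draining" = true ]; then
    echo DRAINING       # alive, but accepting no new jobs
  else
    echo ONLINE         # alive and eligible for jobs
  fi
}
```

For instance, `runner_status 11 true` prints DRAINING, matching gpu-host-03 in the listing above.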

Token rotation

Each runner has a long-lived runner token for authenticating its heartbeat. Rotation happens over the heartbeat itself: the platform may return a new token in any heartbeat response, and the runner writes it to runner.yaml atomically. There is no manual rotation step. If a runner’s local token is compromised, revoke the runner:
rbtk pool runner-revoke --pool prod-gpu-farm --runner gpu-host-04
This invalidates the token immediately: the runner’s next heartbeat fails with 401, the runner exits, and the host must re-register.
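Writing a file "atomically" usually means writing the new content to a temporary file in the same directory and then renaming it over the target, so a crash never leaves a half-written file behind. The sketch below illustrates that general technique only; it is not the runner's actual implementation, the function name is hypothetical, and the layout of runner.yaml is not assumed.

```shell
# atomic_write FILE: replace FILE with the content read from stdin.
# Hypothetical helper illustrating write-then-rename; not part of rbtk.
atomic_write() {
  target=$1
  # Create the temp file next to the target so the final rename
  # stays on one filesystem (mktemp creates it with mode 0600).
  tmp=$(mktemp "${target}.tmp.XXXXXX") || return 1
  if cat > "$tmp" && mv -f "$tmp" "$target"; then
    return 0
  fi
  # On any failure, leave the original file untouched.
  rm -f "$tmp"
  return 1
}
```

A reader of the target file sees either the old content or the new content, never a mix, because the rename replaces the file in a single step.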

Revoke a runner

rbtk pool runner-revoke --pool prod-gpu-farm --runner gpu-host-04
Or via the dashboard: Settings → Runner Pools → prod-gpu-farm → gpu-host-04 → Revoke. The runner token is voided; any in-flight job fails with runner_revoked and is requeued for routing.

Delete a pool

rbtk pool delete --project warehouse --pool prod-gpu-farm
Deletion is refused if the pool has any ONLINE runners or in-flight jobs. Drain first:
# Mark all runners draining
rbtk pool drain --pool prod-gpu-farm

# Wait for in-flight jobs to clear (use --watch)
rbtk pool runners --pool prod-gpu-farm --watch

# Then delete
rbtk pool delete --pool prod-gpu-farm
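For unattended teardown, a script can poll the runner listing instead of watching it interactively. The sketch below sums the left-hand side of the JOBS column; it assumes the column layout of the sample listing earlier on this page (name, status, then JOBS as running/capacity), which may not be a stable output format. The function name is hypothetical.

```shell
# inflight_jobs: read `rbtk pool runners` output on stdin and print the
# total number of running jobs (sum of the left side of each JOBS cell).
# Assumes JOBS is the third whitespace-separated field, e.g. "3/4".
inflight_jobs() {
  awk '$3 ~ /^[0-9]+\/[0-9]+$/ { split($3, jobs, "/"); total += jobs[1] }
       END { print total + 0 }'
}

# Possible use in a teardown script (commented out; needs a configured rbtk):
#   rbtk pool drain --pool prod-gpu-farm
#   until [ "$(rbtk pool runners --pool prod-gpu-farm | inflight_jobs)" -eq 0 ]; do
#     sleep 10
#   done
#   rbtk pool delete --pool prod-gpu-farm
```

Header and blank lines are skipped because their third field does not look like a running/capacity pair.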

Per-pool job stats

rbtk pool stats --pool prod-gpu-farm --since 24h

POOL               prod-gpu-farm (self-hosted, project: warehouse)
WINDOW             last 24h
JOBS               412 dispatched · 408 completed · 3 failed · 1 requeued
WALL TIME          17h 24m total · avg 152s
QUEUE WAIT         p50 1.2s · p95 7.4s · p99 22.1s
TOP REQUIREMENTS   ros:humble (231) · gazebo-harmonic (164) · gpu (164)
Hosted pools also report billed sim minutes for the window. See Billing.
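As a sanity check on the sample above, the average wall time appears to be the total wall time divided by the number of dispatched jobs. That relationship is inferred from the sample numbers, not documented behavior, and the helper name below is hypothetical.

```shell
# avg_wall_seconds HOURS MINUTES JOBS
# Prints total wall time divided by job count, in whole seconds
# (integer division, so the result is truncated).
avg_wall_seconds() {
  echo $(( ($1 * 3600 + $2 * 60) / $3 ))
}

# 17h 24m over 412 dispatched jobs:
avg_wall_seconds 17 24 412   # prints 152
```

The job outcomes in the sample also add up: 408 completed + 3 failed + 1 requeued = 412 dispatched.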

Tagging and isolation

A common pattern is to dedicate a pool to each environment and tag it through its name:
rbtk pool create --name prod-gpu-farm --type self-hosted
rbtk pool create --name staging-gpu-farm --type self-hosted
Jobs route either by explicit pool selection in the test config or by capability and routing rules. See Test configuration for pool: selectors and requires: { airgapped: true } predicates.

Audit trail

Every pool mutation (create, delete, runner-revoke, token mint) emits an audit-log row visible at Settings → Audit log. Filter by resource_type = pool to show only pool events.

Next steps

Air-gapped mode: Lock down a pool to the on-prem platform only.

GPU setup: Multi-GPU pools, nvidia-container-toolkit.