Troubleshooting
Most runner issues fall into one of five buckets. Start with rbtk-runner doctor, then walk the list below.
Runner never picks up a job
A runner that heartbeats ONLINE but never receives jobs is almost always a capability mismatch.
Diagnose
Common mismatches
| Job needs | Runner declares | Fix |
|---|---|---|
| ros: humble | ros_distros: [iron] only | Install Humble; add humble to runner.yaml |
| sim: gazebo-harmonic | sim_engines: [gazebo-classic] | Install Harmonic; switch the declaration |
| gpu: true | gpu.enabled: false | See GPU setup |
| label ldra-licensed | label not present | Add the label to the runner’s capabilities.labels |
runner.yaml or restart the service after changes.
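Each row in the mismatch table amounts to a membership check between what the job needs and what the runner declares. The sketch below illustrates that idea only: the function name find_mismatches, the flat dictionary layout, and the gpu_enabled key are assumptions made for the example, and the platform's actual matching logic is not shown in this guide.

```python
def find_mismatches(job: dict, runner: dict) -> list[str]:
    """Return human-readable reasons why `runner` cannot take `job`.

    Hypothetical helper: the dictionary keys mirror the table above,
    not a documented platform or runner.yaml schema.
    """
    problems = []

    ros = job.get("ros")
    if ros is not None and ros not in runner.get("ros_distros", []):
        problems.append(f"ros: job needs {ros}, runner declares {runner.get('ros_distros', [])}")

    sim = job.get("sim")
    if sim is not None and sim not in runner.get("sim_engines", []):
        problems.append(f"sim: job needs {sim}, runner declares {runner.get('sim_engines', [])}")

    if job.get("gpu") and not runner.get("gpu_enabled", False):
        problems.append("gpu: job needs a GPU, runner has it disabled")

    # Every label the job asks for must be present on the runner.
    missing = sorted(set(job.get("labels", [])) - set(runner.get("labels", [])))
    for label in missing:
        problems.append(f"label: {label} not present on runner")

    return problems


if __name__ == "__main__":
    job = {"ros": "humble", "sim": "gazebo-harmonic", "gpu": True, "labels": ["ldra-licensed"]}
    runner = {"ros_distros": ["iron"], "sim_engines": ["gazebo-classic"], "gpu_enabled": False, "labels": []}
    for reason in find_mismatches(job, runner):
        print(reason)
```

An empty result means the runner satisfies every requirement the job declares; any entry names the capability to fix.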
MCAP upload fails
The runner uploads MCAP files via S3 presigned URLs that the platform mints just-in-time.
Symptom
Causes and fixes
| Symptom | Cause | Fix |
|---|---|---|
| 403 SignatureDoesNotMatch immediately on first chunk | Host clock skew > 5 minutes | sudo timedatectl set-ntp true; verify with timedatectl status |
| 403 Request has expired mid-upload | MCAP > 5 GB and upload slower than the 1-hour presign TTL | Switch to multipart upload (enabled by default in v2.2+); upgrade if pinned older |
| connection refused to S3 host | Outbound firewall blocking S3 endpoint | Allow *.s3.amazonaws.com and *.s3.<region>.amazonaws.com. Air-gapped: allow your on-prem object store |
| 407 Proxy authentication required | Corporate proxy in the path | Set HTTPS_PROXY and NO_PROXY=api.roboticks.io,s3.internal in the systemd unit |
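To judge whether a single-part upload is likely to hit the Request has expired case, it can help to work out the throughput needed to finish inside the 1-hour presign TTL. The following is a rough sketch, assuming decimal gigabytes and ignoring protocol overhead and retries; the function names are illustrative and not part of the runner.

```python
PRESIGN_TTL_SECONDS = 60 * 60  # 1-hour presign TTL from the table above


def min_throughput_mbit_per_s(size_bytes: float, ttl_seconds: float = PRESIGN_TTL_SECONDS) -> float:
    """Lowest sustained rate (Mbit/s) at which the upload finishes before the URL expires."""
    return size_bytes * 8 / ttl_seconds / 1_000_000


def will_expire(size_bytes: float, throughput_mbit_per_s: float,
                ttl_seconds: float = PRESIGN_TTL_SECONDS) -> bool:
    """True if uploading at the given sustained rate takes longer than the TTL."""
    upload_seconds = size_bytes * 8 / (throughput_mbit_per_s * 1_000_000)
    return upload_seconds > ttl_seconds


if __name__ == "__main__":
    five_gb = 5 * 10**9
    # A 5 GB file needs roughly 11.1 Mbit/s sustained to finish within one hour.
    print(round(min_throughput_mbit_per_s(five_gb), 1))
    print(will_expire(five_gb, 10))  # True: 4000 s at 10 Mbit/s exceeds 3600 s
    print(will_expire(five_gb, 20))  # False: 2000 s at 20 Mbit/s
```

If the measured uplink sits near or below the computed minimum, multipart upload is the safer path.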
Heartbeat lapse
The platform marks a runner OFFLINE after 60 s without a heartbeat.
Diagnose
Common causes
| Cause | Symptom | Fix |
|---|---|---|
| Runner token revoked | 401 unauthorized on heartbeat | Re-register with a fresh registration token |
| Platform unreachable | connection timed out | Check firewall, DNS, proxy |
| Process killed (OOM) | dmesg shows Out of memory: Killed process rbtk-runner | Lower max_concurrent_jobs or add memory |
| Clock skew | Heartbeat 401s with clock skew detected | Sync NTP |
| Docker daemon down | Logs show cannot connect to docker daemon | sudo systemctl start docker |
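Clock skew can be estimated without extra tooling by comparing the host clock against the Date header of any HTTPS response. The sketch below shows only the comparison step. The 5-minute threshold is the figure given for S3 signature failures in the MCAP upload section; whether the heartbeat check uses the same limit is an assumption, and the function name is illustrative.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

MAX_SKEW_SECONDS = 5 * 60  # assumed limit, taken from the MCAP upload table


def skew_seconds(http_date: str, local_now: datetime) -> float:
    """Signed skew in seconds; positive means the local clock is ahead of the server.

    `http_date` is an HTTP Date header value such as
    "Wed, 21 Oct 2015 07:28:00 GMT"; `local_now` must be timezone-aware.
    """
    server_time = parsedate_to_datetime(http_date)
    return (local_now - server_time).total_seconds()


if __name__ == "__main__":
    header = "Wed, 21 Oct 2015 07:28:00 GMT"
    local = datetime(2015, 10, 21, 7, 34, 0, tzinfo=timezone.utc)
    skew = skew_seconds(header, local)
    print(skew)                          # 360.0
    print(abs(skew) > MAX_SKEW_SECONDS)  # True: six minutes exceeds the limit
```

In practice the header would come from a live response and local_now from datetime.now(timezone.utc); if the absolute skew is over the limit, syncing NTP is the fix listed in the table.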
Version skew with the platform
rbtk-runner v2.x is wire-compatible with the latest platform. If the platform advances a wire-contract minor version, the runner first emits a deprecation warning and, after 90 days, fails with a hard error.
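The grace period can be reasoned about as a simple date calculation, which is useful when planning an upgrade window. This is a sketch under stated assumptions: it treats the 90 days as counted from the date the platform advanced its minor version and treats day 90 itself as the first day of the hard error. Neither detail is confirmed here, and skew_state is a hypothetical helper, not a runner API.

```python
from datetime import date

GRACE_DAYS = 90  # warning period described above


def skew_state(runner_minor: int, platform_minor: int,
               platform_advanced_on: date, today: date) -> str:
    """Classify the runner as "ok", "warning", or "error".

    Assumption: the grace period starts on `platform_advanced_on`
    and the hard error begins once GRACE_DAYS full days have elapsed.
    """
    if runner_minor >= platform_minor:
        return "ok"
    days_behind = (today - platform_advanced_on).days
    return "warning" if days_behind < GRACE_DAYS else "error"


if __name__ == "__main__":
    advanced = date(2024, 1, 1)
    print(skew_state(2, 3, advanced, date(2024, 1, 31)))  # warning
    print(skew_state(2, 3, advanced, date(2024, 3, 31)))  # error (90 days elapsed)
    print(skew_state(3, 3, advanced, date(2024, 3, 31)))  # ok
```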
Diagnose
Fix
Upgrade via the install path you used originally:
Registration token issues
--ttl 24h and --uses 50 — these flags require Enterprise tier.
Job timeouts
A job killed at the job_timeout boundary surfaces as failed with reason runner_timeout.
Diagnose
Fix
Increase resources.job_timeout in runner.yaml:
timeout: in .roboticks/test.yaml.
Docker permission denied
docker group:
Disk space exhaustion
The runner cleans up containers and the work-dir between jobs, but it does not prune Docker images. Periodically:
Getting more signal
Crank logs to debug and re-run:
debug includes every HTTP request to the platform, the full docker run command for each job, and S3 multipart upload chunk timings.
Still stuck?
- Collect rbtk-runner doctor, rbtk-runner status, and the last 200 lines of the service log.
- File at github.com/roboticks-io/roboticks-runner/issues with the bundle.
- For paid plans, open a support ticket at support.roboticks.io referencing your org slug.
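Gathering the bundle by hand is error-prone, so a small script can help. The sketch below runs the two commands named above and appends the tail of a log file. The helper names are illustrative, and the log path is supplied by the caller because this guide does not state where the service log lives.

```python
import subprocess
from collections import deque


def capture(cmd: list[str]) -> str:
    """Run a command and return its combined output, or a note if it could not run."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return f"$ {' '.join(cmd)}\n[could not run: {exc}]\n"
    return f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}"


def tail(path: str, n: int = 200) -> str:
    """Return the last `n` lines of a text file."""
    with open(path, errors="replace") as fh:
        return "".join(deque(fh, maxlen=n))


def build_bundle(log_path: str) -> str:
    """Assemble doctor output, status output, and the log tail into one text blob.

    `log_path` is an assumption: point it at wherever your service log is written.
    """
    parts = [
        capture(["rbtk-runner", "doctor"]),
        capture(["rbtk-runner", "status"]),
        f"--- last 200 lines of {log_path} ---\n{tail(log_path)}",
    ]
    return "\n".join(parts)
```

Writing the result of build_bundle to a file gives a single attachment for the issue or support ticket; review it for secrets before sharing.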
Related
Configuration
Capabilities, resource limits, log level.
GPU setup
NVIDIA driver and container-toolkit issues.