`rbtk test`

The test group is the busiest in the CLI. It triggers runs, lists historical runs, fetches structured results, streams logs, downloads artifacts, and pulls signed evidence packs.

rbtk test
├── run               # Run tests against a pool (push code, wait, return result)
├── cloud             # Run on a hosted pool (synonym for `run --pool hosted-*`)
├── list              # List recent runs
├── results           # Structured pass/fail/coverage for a run
├── logs              # Stdout + stderr for a run (alias of `rbtk log <run-id>`)
├── watch             # Live-stream a run's status (queued → running → finished)
├── cases             # List test cases that ran inside a run
├── files             # List or download per-test-case artifacts from a run
├── artifacts         # Hidden alias for `test files` (kept for back-compat)
└── evidence-pack     # Download the signed PDF + ReqIF + ZIP bundle

`rbtk test run`

Kick off a test run. The most common invocation pushes the working directory to the platform, dispatches it to whichever pool matches the project’s routing rules, and waits for completion.

rbtk test run --push ./

Flags

Flag	Type	Description
`--push <path>`	path	Upload this directory as the test payload. Default: refuse if no git remote.
`--git-sha <sha>`	string	Override the recorded commit SHA (auto-detected from git).
`--git-ref <ref>`	string	Override the recorded ref (e.g., `refs/heads/main`).
`--pool <name>`	string	Force routing to a specific pool.
`--ros <distro>`	string	Override required ROS distro (`humble`, `iron`, `rolling`).
`--sim <engine>`	string	Override sim engine (`gazebo-harmonic`, `webots`, etc.).
`--gpu`	flag	Require a GPU runner.
`--watch / --no-watch`	flag	Stream live logs to your terminal. Default: `--watch` on TTY.
`--wait / --no-wait`	flag	Block until the run completes. Default: `--wait`.
`--timeout <duration>`	duration	Maximum wall time (e.g., `30m`).
`--label k=v`	repeatable	Attach labels for filtering later.
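The duration-typed values on this page (`30m` here, `24h`, `7d`, and `5m` in later sections) are a number followed by a unit suffix. As a rough sketch, a wrapper script could validate such a value before passing it to `--timeout`. `parse_duration` is a hypothetical helper, not part of the CLI, and the accepted unit set (`s`, `m`, `h`, `d`) is an assumption: only `m`, `h`, and `d` appear in the examples here.

```python
import re
from datetime import timedelta

# Assumed unit suffixes; this page only shows m, h, and d in its examples.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_DURATION_RE = re.compile(r"(\d+)([smhd])")


def parse_duration(text: str) -> timedelta:
    """Parse a value such as '30m' or '7d' into a timedelta (hypothetical helper)."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


if __name__ == "__main__":
    print(parse_duration("30m"))
    print(parse_duration("7d").total_seconds())
```

Rejecting malformed values locally avoids queueing a run only to have the flag refused later; how the CLI itself parses durations may differ.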

Examples

# Most common: push, route, wait
rbtk test run --push ./

# Force a GPU run on a specific pool
rbtk test run --push ./ --pool prod-gpu-farm --gpu

# Don't wait — return the run ID and exit
rbtk test run --push ./ --no-wait
# 8a1f-test-run-id

# Pin commit metadata when running outside git context
rbtk test run --push ./ --git-sha $GIT_SHA --git-ref refs/heads/release-2.4

Exit code

0 if all tests passed. 7 if any test failed. Other non-zero codes indicate infrastructure errors.
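In CI it can help to separate "tests failed" from "the infrastructure broke", since only the latter is usually worth retrying. The sketch below maps the exit codes described above to a coarse outcome. `classify_exit` and `run_and_classify` are hypothetical helper names, and the demo uses a stand-in child process so it runs without `rbtk` installed.

```python
import subprocess
import sys

PASSED, FAILED = 0, 7  # exit codes documented above


def classify_exit(code: int) -> str:
    """Map an `rbtk test run` exit code to a coarse outcome."""
    if code == PASSED:
        return "passed"
    if code == FAILED:
        return "tests_failed"
    return "infra_error"  # any other non-zero code


def run_and_classify(argv: list[str]) -> str:
    """Run a command and classify its exit code; output is not captured."""
    completed = subprocess.run(argv, check=False)
    return classify_exit(completed.returncode)


if __name__ == "__main__":
    # Stand-in child process that exits with 7, in place of a real rbtk call.
    print(run_and_classify([sys.executable, "-c", "raise SystemExit(7)"]))
```

In a real pipeline the argument list would be something like `["rbtk", "test", "run", "--push", "./"]`.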

`rbtk test cloud`

Convenience alias for rbtk test run --pool hosted-*. Picks the right hosted pool based on requirements.

rbtk test cloud --push ./ --sim gazebo-harmonic   # → hosted-gazebo-gpu
rbtk test cloud --push ./                          # → hosted-ros2-cpu

Same flags as rbtk test run. The command always prints a workspace URL alongside the run ID so you (or a chat agent) can hand the link straight to a stakeholder:

✓ Queued test run 8a1f3c2d-9a01-4d2d-9aab-6a14a92e7baf
  Branch: pivot @ 0xabc123
  Pool:   hosted-gazebo-gpu
  Cost:   ≈ $1.20 (12 sim-minutes)
  Watch:  rbtk test watch 8a1f3c2d
  URL:    https://acme.roboticks.io/warehouse/test-runs/8a1f3c2d-...

The workspace URL is derived from the org slug and project slug — no extra round-trip.
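Because the URL depends only on values the CLI already has, it can be rebuilt locally. The sketch below infers the pattern from the sample output above, assuming `acme` is the org slug and `warehouse` is the project slug; both function names are hypothetical, and the real URL scheme may differ.

```python
def workspace_url(org_slug: str, project_slug: str, run_id: str) -> str:
    """Build a run URL from slugs alone (pattern inferred from the sample output)."""
    return f"https://{org_slug}.roboticks.io/{project_slug}/test-runs/{run_id}"


def short_id(run_id: str) -> str:
    """First UUID segment, as used in the `Watch:` hint of the sample output."""
    return run_id.split("-", 1)[0]


if __name__ == "__main__":
    run_id = "8a1f3c2d-9a01-4d2d-9aab-6a14a92e7baf"
    print(workspace_url("acme", "warehouse", run_id))
    print(short_id(run_id))
```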

Dry-run (`--no-confirm-charge`)

rbtk test cloud --push ./ --no-confirm-charge

Returns a price quote without spending sim-minutes:

{
  "dry_run": true,
  "estimated_minutes": 12,
  "estimated_cost_usd": 1.20,
  "sim_minutes_after": 488,
  "runner_pool_id": 4,
  "would_succeed": true
}

Add --confirm-charge to flip the same call into a real run.
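A script can inspect the quote before deciding whether to re-issue the call with `--confirm-charge`. The following is a minimal sketch that uses only the fields shown in the sample quote; `within_budget` and its thresholds are a hypothetical policy, not something the CLI provides.

```python
import json

# The sample quote from above, verbatim.
SAMPLE_QUOTE = """
{
  "dry_run": true,
  "estimated_minutes": 12,
  "estimated_cost_usd": 1.20,
  "sim_minutes_after": 488,
  "runner_pool_id": 4,
  "would_succeed": true
}
"""


def within_budget(quote: dict, max_cost_usd: float, min_minutes_left: int = 0) -> bool:
    """Decide whether a quoted run is acceptable (hypothetical policy)."""
    return bool(
        quote["dry_run"]
        and quote["would_succeed"]
        and quote["estimated_cost_usd"] <= max_cost_usd
        and quote["sim_minutes_after"] >= min_minutes_left
    )


if __name__ == "__main__":
    quote = json.loads(SAMPLE_QUOTE)
    print(within_budget(quote, max_cost_usd=2.00, min_minutes_left=100))
```

Checking `sim_minutes_after` as well as the cost guards against a cheap run that would still drain the remaining allowance.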

What the cloud runner does on the EC2 host

For each cloud job, the platform pre-assigns a one-shot runner to the job and bakes its credentials into the EC2 user-data. The boot sequence is fixed and tight:

1. Install docker.io and curl (Ubuntu 22.04 base AMI).
2. Resolve the latest runner release from https://get.roboticks.io/releases/latest.
3. Download the matching roboticks-runner-linux-<arch> binary and chmod +x it.
4. Run roboticks-runner run-job --job-token <token> --job-id <id> --api-endpoint <api>. The binary polls /internal/runners/poll, receives its assignment, executes the test in Docker, posts status updates, and then runs shutdown -h now.

The <token> is an ephemeral rbtk_lrnr_… runner_token that the platform issues alongside the assignment, not the per-job hex job_token (that field exists on TestJob for future use but isn’t the cloud-runner auth path today). Cloud runners authenticate the same way self-hosted runners do; the only difference is that exactly one pre-assigned job is waiting for them in self_hosted_runner_jobs when they call /poll.

Default docker image

If a test package doesn’t pin docker_image and the test request doesn’t override it, the platform falls back to python:3.12-slim so the contract is never null (the runner refuses to execute when docker_image, ros_distro, and sim_engine are all empty). The slim image does not include pytest — if your test_command is pytest -v, the run will exit 127 (“command not found”). Either:

Pin a real ROS2 / pytest-bearing image when you push the package (rbtk packages push ./ --docker-image ghcr.io/your-org/ros-test:humble), or
Make the test command install dependencies first (pip install pytest && pytest -v).

For the canonical ROS2 demos in roboticks-examples we pin ros:humble-ros-base or one of the osrf/ros:*-desktop-full tags.
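The fallback rule described in this section can be sketched in a few lines. The precedence order (request override, then package pin, then fallback) is an assumption read from the wording above, and both function names are hypothetical; this is not the platform's actual code.

```python
from typing import Optional

FALLBACK_IMAGE = "python:3.12-slim"  # fallback named in this section


def resolve_docker_image(package_image: Optional[str],
                         request_override: Optional[str]) -> str:
    """Request override wins, then the package pin, then the fallback (assumed order)."""
    return request_override or package_image or FALLBACK_IMAGE


def runner_accepts(docker_image: str, ros_distro: str, sim_engine: str) -> bool:
    """The runner refuses a contract in which all three fields are empty."""
    return any((docker_image, ros_distro, sim_engine))


if __name__ == "__main__":
    image = resolve_docker_image(package_image=None, request_override=None)
    print(image, runner_accepts(image, "", ""))
```

With the fallback in place, `runner_accepts` can never see three empty fields, which is the "never null" property the text describes.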

`rbtk test list`

List recent test runs for the current project.

rbtk test list
rbtk test list --status failed --since 7d
rbtk test list --branch main --limit 20 --output json

Flag	Description
`--status passed|failed|running|queued`	Filter by terminal or in-flight state
`--branch <ref>`	Filter by git ref
`--since <duration>`	Time window (e.g., `24h`, `7d`)
`--label k=v`	Filter by attached labels
`--limit N`	Max rows returned (default 20)

Output

RUN ID    BRANCH         STATUS  TESTS         DURATION   STARTED            POOL
8a1f...   main           PASS    412/412       2m 14s     12 min ago         prod-gpu-farm
8a1c...   pr/214         FAIL    411/412       2m 19s     22 min ago         prod-gpu-farm
8a1a...   pr/214         PASS    412/412       2m 12s     31 min ago         prod-gpu-farm

With --output json, each row includes the run ID, pool, requirement coverage delta, and links.

`rbtk test results`

Structured results for one run.

rbtk test results <run-id>
rbtk test results latest --output json

Output (table):

Run     8a1f3c2d
Branch  main @ 0xabc123
Status  PASS · 412/412 · 2m 14s

REQUIREMENT COVERAGE
  Confirmed       142 (+3)
  Uncovered        18 (-1)
  Total           160

PER-FILE FAILURES (0)

POOL  prod-gpu-farm (self-hosted)

With --output json, returns the full result graph: per-test outcomes, per-requirement links, MCAP attachments, coverage delta vs the previous run on the same branch.

`rbtk test logs`

Stream or fetch logs for a run. Equivalent to rbtk log <run-id>.

rbtk test logs <run-id>             # live tail if running, full dump if completed
rbtk test logs <run-id> --tail 200  # last 200 lines
rbtk test logs <run-id> --since 5m  # last 5 minutes
rbtk test logs <run-id> --stderr    # stderr only

With --output json, each log line carries structured fields that you can filter on:

rbtk test logs <run-id> --output json | jq 'select(.level == "error")'
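The same filter can be written in Python when `jq` is not available. This sketch assumes the JSON output is one object per line and relies only on the `level` field implied by the filter above; the `message` field in the sample data is hypothetical.

```python
import json
from typing import Iterable, Iterator


def error_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log records whose level is 'error', skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("level") == "error":
            yield record


if __name__ == "__main__":
    # Hypothetical sample lines; real records may carry different fields.
    sample = [
        '{"level": "info", "message": "starting sim"}',
        '{"level": "error", "message": "node crashed"}',
    ]
    for rec in error_records(sample):
        print(rec["message"])
```

To consume live CLI output, pass `sys.stdin` instead of the sample list and pipe the `--output json` stream into the script.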

`rbtk test watch`

Live-stream a run’s status — queued → running → finished — without leaving the terminal. Polls the platform every second and exits when the run hits a terminal state.

rbtk test watch <run-id>
rbtk test watch latest
rbtk test watch <run-id> --interval 5    # poll every 5 seconds instead
rbtk test watch <run-id> --output json   # one JSON object per status change

Output (table):

[10:14:02] queued    runner_pool=hosted-gazebo-gpu  position=2
[10:14:09] queued    runner_pool=hosted-gazebo-gpu  position=1
[10:14:16] running   8/412 tests · 0/0 failures
[10:14:31] running   210/412 tests · 0/0 failures
[10:14:47] running   412/412 tests · 0/0 failures · post-processing
[10:14:53] completed PASS · 412/412 · 2m 11s

Flag	Description
`--interval <seconds>`	Polling interval (default `1`, max `30`).
`--output json`	Emit one JSON object per status transition (stream-friendly).
`--timeout <duration>`	Stop watching after this long (default: no timeout).

Exit code matches rbtk test results once the run is terminal (0 for pass, 7 for fail, other non-zero for infra error).
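The polling behavior described in this section can be approximated as a small loop. Everything about the status payload here is an assumption: `fetch_status` is a hypothetical callable, the `state` and `result` keys are invented for illustration, and the lower bound of 1 second is assumed (the flag table only states the default and the maximum). Infrastructure-error exit codes are left out for brevity.

```python
import time
from typing import Callable

MIN_INTERVAL, MAX_INTERVAL = 1, 30  # default and maximum from the flag table


def clamp_interval(seconds: int) -> int:
    """Keep a requested polling interval inside the 1..30 second range."""
    return max(MIN_INTERVAL, min(MAX_INTERVAL, seconds))


def watch(fetch_status: Callable[[], dict], interval: int = 1,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until the run is terminal and return an exit code (0 pass, 7 fail).

    `fetch_status` is hypothetical and returns e.g. {"state": "running"}
    or {"state": "completed", "result": "PASS"}.
    """
    interval = clamp_interval(interval)
    last_state = None
    while True:
        status = fetch_status()
        if status["state"] != last_state:  # report transitions only
            print(status["state"])
            last_state = status["state"]
        if status["state"] == "completed":
            return 0 if status.get("result") == "PASS" else 7
        sleep(interval)
```

Injecting `sleep` keeps the loop testable without real waiting, and reporting only on transitions mirrors the "one object per status change" behavior of `--output json`.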

`rbtk test cases`

List the test cases that executed inside a run, including their per-test-case S3 prefix. Useful for picking the --nodeid to feed back into rbtk test files.

rbtk test cases <run-id>
rbtk test cases <run-id> --status failed
rbtk test cases <run-id> --output json | jq '.[] | select(.confirms[] == "REQ-014")'

Output (table):

NODEID                                                STATUS   DURATION   CONFIRMS
tests/test_estop.py::test_estop_halts_motion          PASS     0.082s     REQ-001,REQ-014
tests/test_estop.py::test_estop_recovers              PASS     0.094s     REQ-014
tests/test_obstacle.py::test_obstacle_detection       FAIL     2.103s     REQ-021,REQ-022

Flag	Description
`--status passed|failed|skipped|error`	Filter by outcome.
`--confirms <REQ-ID>`	Only test cases that confirm this requirement (repeatable).
`--output json`	Full structured rows including `artifact_prefix` (`test-runs/{run_id}/test-cases/{slug}/`).

`rbtk test files`

List or download per-test-case artifacts from a run — MCAP bags, screenshots, custom files emitted via attach_artifact(), JUnit XML, runner stdout/stderr.

rbtk test files <run-id>                                            # list everything
rbtk test files <run-id> --nodeid "tests/test_estop.py::test_estop_halts_motion"   # one test
rbtk test files <run-id> --prefix attachments/                       # one kind
rbtk test files <run-id> --download --out ./run-artifacts/           # download all
rbtk test files <run-id> --download --nodeid "tests/...::test_x" --out ./test_x/

Flag	Description
`--nodeid <pytest-nodeid>`	Scope to one test case. The CLI re-uses the deterministic `sha256(nodeid)[:16]` slug, so the local layout mirrors S3.
`--prefix <key-prefix>`	Restrict to a sub-prefix inside the run or test case (e.g. `mcap/`, `attachments/`).
`--download`	Download instead of listing.
`--out <dir>`	Output directory (default `.`). Created if missing.
`--pattern <glob>`	Only files matching glob (e.g., `*.mcap`).

On-disk layout mirrors S3 so artifacts stay grouped under the test that produced them:

./run-artifacts/
├── mcaps/                                  # run-level (no nodeid)
└── test-cases/
    ├── a4f3b9c218e07d54/                   # tests/test_estop.py::test_estop_halts_motion
    │   ├── mcap/test_estop.mcap
    │   └── attachments/
    │       ├── before.png
    │       └── after.png
    └── 9c01e8aa412f7d20/                   # tests/test_obstacle.py::test_obstacle_detection
        └── logs/decision_log.jsonl
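The directory names under `test-cases/` follow the `sha256(nodeid)[:16]` rule from the flag table, so a script can predict where a test's artifacts will land. This is a minimal sketch: UTF-8 encoding of the node ID and lowercase hex output are assumptions, both function names are hypothetical, and the slugs in the tree above are sample values that this code is not guaranteed to reproduce.

```python
import hashlib
from pathlib import Path


def case_slug(nodeid: str) -> str:
    """First 16 hex characters of sha256(nodeid); UTF-8 encoding is assumed."""
    return hashlib.sha256(nodeid.encode("utf-8")).hexdigest()[:16]


def local_case_dir(out_dir: str, nodeid: str) -> Path:
    """Directory under --out where one test case's artifacts would land."""
    return Path(out_dir) / "test-cases" / case_slug(nodeid)


if __name__ == "__main__":
    nodeid = "tests/test_estop.py::test_estop_halts_motion"
    print(local_case_dir("./run-artifacts", nodeid))
```

Because the slug is a pure function of the node ID, the same path can be computed before the download finishes or on a machine that never ran the tests.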

rbtk test artifacts remains as a hidden alias for rbtk test files so existing scripts keep working.

`rbtk test evidence-pack`

Download the signed evidence pack for a run. Evidence packs are immutable, hash-chained PDF + ReqIF + ZIP bundles per release.

rbtk test evidence-pack <run-id> --out ./evidence/
rbtk test evidence-pack latest --release v2.4.0 --format pdf

Flag	Description
`--out <dir>`	Output directory (default `.`)
`--format pdf|reqif|zip|all`	Pick a format. `all` extracts the bundle.
`--release <tag>`	Get the pack for a specific release tag instead of a run ID
`--verify`	Verify the hash chain after download

Verification:

rbtk test evidence-pack <run-id> --out ./ --verify

✓ Downloaded evidence-pack-8a1f3c2d.zip (4.2 MB)
✓ Hash chain: 142 entries verified against the platform's signed root
✓ Cosign signature valid
✓ Release tag v2.4.0 matches the embedded manifest

See Evidence packs for the full chain-of-custody story.
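For readers unfamiliar with hash chains, the idea behind the "entries verified against a signed root" line can be illustrated generically. The sketch below is not the platform's pack format: the genesis value, the way each link is hashed, and all function names are assumptions, and signature checking (the Cosign step) is out of scope.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting value for the chain


def link_hash(prev_hash: str, payload: bytes) -> str:
    """Hash one entry together with the hash of the entry before it."""
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def build_chain(payloads: list[bytes]) -> list[str]:
    """Return the running hash after each entry."""
    hashes, prev = [], GENESIS
    for payload in payloads:
        prev = link_hash(prev, payload)
        hashes.append(prev)
    return hashes


def verify_chain(payloads: list[bytes], hashes: list[str], trusted_root: str) -> bool:
    """Recompute every link and compare the final hash with a trusted root."""
    if len(payloads) != len(hashes):
        return False
    recomputed = build_chain(payloads)
    final = recomputed[-1] if recomputed else GENESIS
    return recomputed == hashes and final == trusted_root


if __name__ == "__main__":
    entries = [b"test results", b"requirement links", b"release manifest"]
    chain = build_chain(entries)
    print(verify_chain(entries, chain, trusted_root=chain[-1]))
```

Since every hash depends on all entries before it, changing any single entry changes the final hash, which is why comparing against one trusted root is enough to detect tampering anywhere in the chain.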

Re-run a failed test

rbtk test run --rerun <run-id>            # rerun the same payload on the same pool
rbtk test run --rerun <run-id> --only-failed  # rerun only the tests that failed

Next

Requirements: upload, coverage, export.
Output formats: sparkline tables, JSON, YAML, IDs-only.
