AI for test debugging
Five AI surfaces sit on the test side of the product. Each one is a button you press; each one writes its output back into the run’s history and conversation thread so the next engineer doesn’t have to re-prompt.
Test failure triage
Surface: Run-detail page → AI Triage panel on any failed TestJob row.
What it ingests:
- The failing TestJob’s metadata (name, error message, exit code).
- Up to 200 recent log lines from the run.
- The runner system log (capped at 50 KB): the agent-side log that captures the launch failure, the docker pull error, etc.
- The test’s @confirms and @tags so the prompt knows what requirements were at stake.
What it returns:
- A short summary of the most likely root cause.
- Three to five concrete recommendations (rerun with a longer timeout / fix the topic name in assert_topic_published / mark @deadline slack-friendly).
- A severity (low | medium | high | critical) used by the dashboard sort.
- A confidence score (0.0 – 1.0).
- Three follow-up questions you can click to start a conversation.
Endpoint: POST /api/v1/organizations/{org}/projects/{project}/analyze/test-failure.
Plan gate: any plan with non-zero ai_tokens_balance.
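As an illustration of how the returned fields fit together, here is a minimal Python sketch that models one triage result and orders results the way a severity-sorted dashboard might. The class, the field names, and the confidence tie-break are assumptions made for this example; the actual response schema and sort rule are not specified on this page.

```python
from dataclasses import dataclass, field

# Severity levels as listed above, in ascending order.
SEVERITY_ORDER = ("low", "medium", "high", "critical")


@dataclass
class TriageResult:
    # Hypothetical field names; the real response schema may differ.
    summary: str
    recommendations: list[str]
    severity: str
    confidence: float
    follow_up_questions: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.severity not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within 0.0 and 1.0")


def sort_for_dashboard(results: list[TriageResult]) -> list[TriageResult]:
    """Most severe first; ties broken by higher confidence (an assumption)."""
    return sorted(
        results,
        key=lambda r: (SEVERITY_ORDER.index(r.severity), r.confidence),
        reverse=True,
    )
```

Validating severity and confidence at construction time keeps malformed results out of the sort.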
Test-run analysis
Surface: Run-detail page → “Analyze run” button.
Same plumbing as test-failure triage, but the prompt is scoped to the whole run (all failures and skipped tests, plus the run’s overall logs). Useful when more than one test broke and you want a single thread instead of N triage analyses.
Endpoint: POST /analyze/test-run with {test_run_id, include_logs, log_limit}.
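A request for this endpoint could be assembled roughly as follows. This is a sketch under several assumptions: the base URL is a placeholder, the org/project prefix is assumed to match the test-failure endpoint, bearer-token authentication is a guess, and the default log_limit of 200 simply mirrors the triage log cap. Only the three body fields come from the text above. The function builds the request without sending it.

```python
import json
from urllib.request import Request

# Placeholder base URL (assumption); substitute the real API host.
BASE = "https://api.example.invalid/api/v1"


def build_test_run_request(org, project, test_run_id, token,
                           include_logs=True, log_limit=200):
    """Build (but do not send) a POST request for a whole-run analysis."""
    # Assumed: same org/project prefix as the test-failure endpoint.
    url = f"{BASE}/organizations/{org}/projects/{project}/analyze/test-run"
    body = {
        "test_run_id": test_run_id,
        "include_logs": include_logs,
        "log_limit": log_limit,
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )
```

Passing the result to urllib.request.urlopen would send it; keeping construction separate makes the payload easy to inspect first.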
Conversations
Every analysis can be turned into a follow-up conversation. The platform keeps the original analysis as the system message, lets you and the AI exchange short messages, and tracks token usage per turn against your ai_tokens balance.
- POST /conversations with {analysis_id, initial_question}
- POST /conversations/{id}/messages to continue
- POST /conversations/{id}/archive to close
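The three calls form a simple lifecycle: create, continue any number of times, archive. The sketch below lays that sequence out as data. The "message" body field is a hypothetical name (the page does not document the message payload), and {id} is left as a placeholder because the real id would come from the create response.

```python
def conversation_calls(analysis_id, initial_question, follow_ups):
    """Return the (method, path, body) sequence for one conversation."""
    conv = "{id}"  # placeholder: the real id comes from the create response
    calls = [(
        "POST",
        "/conversations",
        {"analysis_id": analysis_id, "initial_question": initial_question},
    )]
    for text in follow_ups:
        # "message" is an assumed field name, not a documented one.
        calls.append(("POST", f"/conversations/{conv}/messages", {"message": text}))
    calls.append(("POST", f"/conversations/{conv}/archive", None))
    return calls
```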
Test flakiness
Surface: Test case detail → “Flakiness analysis” dialog. Available once a test has at least three runs in the project’s history.
What it does: takes the last N runs of the same test identity (matched by roboticks.nodeid), bundles their pass/fail outcomes + stderr snippets + duration deltas, and asks Sonnet to classify the flakiness into one of:
- Environment / timing — passes in CI but fails locally, or vice versa; tied to a fault-injection rate or a slow sim seed.
- State leak — depends on test ordering; cleanup not running.
- Genuinely intermittent — race condition, non-deterministic upstream input.
- Stable, you’re imagining it.
The result also names a suggested remedy (@flaky, add a retry, fix the cleanup hook, etc.).
Task type: TEST_FLAKINESS_ANALYSIS — Sonnet, 30k input budget, 12 token-cost.
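The bundling step described above can be pictured with a small Python sketch. It is not the platform's code: the RunRecord type, the default of 10 runs, the 200-character stderr cut, and the choice to compute each duration delta against the previous run are all assumptions for illustration. Only the three-run minimum and the matching by nodeid come from the text. The history list is assumed to be ordered oldest first.

```python
from dataclasses import dataclass

MIN_RUNS = 3  # from the text: available once a test has at least three runs


@dataclass
class RunRecord:
    nodeid: str        # test identity, i.e. the roboticks.nodeid value
    passed: bool
    duration_s: float
    stderr: str = ""


def bundle_for_flakiness(history, nodeid, last_n=10, stderr_chars=200):
    """Collect the last N runs of one test; None if history is too short."""
    runs = [r for r in history if r.nodeid == nodeid][-last_n:]
    if len(runs) < MIN_RUNS:
        return None
    durations = [r.duration_s for r in runs]
    return {
        "nodeid": nodeid,
        "outcomes": ["pass" if r.passed else "fail" for r in runs],
        # Only failing runs contribute stderr snippets in this sketch.
        "stderr_snippets": [r.stderr[:stderr_chars] for r in runs if not r.passed],
        # Change in duration relative to the previous run.
        "duration_deltas": [round(b - a, 3) for a, b in zip(durations, durations[1:])],
    }
```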
Sim vs real comparison
Surface: Sim run detail page → “Compare to real run” action. You pick a real-world run for the same project; the platform pairs the runs by roboticks.nodeid and asks Opus to call out divergences.
The prompt builds a side-by-side table of:
- Pass/fail outcome per test.
- Duration delta.
- Topic publish/receive counts from the MCAPs if present.
- Fault-injection state if either side used roboticks.fault_injection.
The output is a list of divergences (for example, “… /scan jitter”), grouped by likely cause. This is the densest AI surface in the product: Opus, 50 000-token input window, 30 token-cost per call.
Task type: SIM_COMPARISON.
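To make the pairing step concrete, here is a minimal sketch of joining a sim run and a real run by nodeid and computing the first two columns of the side-by-side table (outcome and duration delta). The input shape, a dict from nodeid to a small result dict, is an assumption for this example; topic counts and fault-injection state are left out.

```python
def pair_runs(sim, real):
    """Pair per-test results by nodeid.

    `sim` and `real` map nodeid -> {"passed": bool, "duration_s": float}
    (a hypothetical shape). Returns (rows, unpaired_nodeids).
    """
    rows = []
    for nodeid in sorted(sim.keys() & real.keys()):
        s, r = sim[nodeid], real[nodeid]
        rows.append({
            "nodeid": nodeid,
            "sim": "pass" if s["passed"] else "fail",
            "real": "pass" if r["passed"] else "fail",
            "outcome_diverges": s["passed"] != r["passed"],
            # Positive means the real run was slower than the sim run.
            "duration_delta_s": round(r["duration_s"] - s["duration_s"], 3),
        })
    # Tests present on only one side cannot be compared.
    unpaired = sorted(sim.keys() ^ real.keys())
    return rows, unpaired
```

Returning the unpaired nodeids separately keeps tests that ran on only one side visible instead of silently dropping them.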
Inline log anomalies
Surface: Run-detail logs tab. Lines the platform flagged as anomalous are highlighted with a sidebar marker; click the marker to see a short Haiku-generated explanation.
Detection is staged:
- Heuristic pass (regex + simple statistical thresholds) marks candidate lines server-side.
- Up to ~20 candidate lines per request are batched into a single Haiku call to summarise the anomaly: what’s unusual, what other lines correlate, whether the line is consistent with a known failure mode.
The batched call runs as task type LOG_SUMMARIZATION (2 token-cost), so the cost stays linear in lines × runs rather than per-line. Cached results are stored on the AIAnalysis row so re-opening the logs tab is free.
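The two stages can be sketched in a few lines of Python. This is an illustration only: the regex patterns, the line-length z-score used as the "simple statistical threshold", and the threshold value are all assumptions, since the platform's actual heuristics are not documented here. The batch size of 20 follows the "up to ~20 candidate lines per request" figure above.

```python
import re
from statistics import mean, pstdev

BATCH_SIZE = 20  # "up to ~20 candidate lines per request"
# Hypothetical patterns; the real heuristic rules are not documented here.
SUSPICIOUS = re.compile(r"\b(error|fatal|timeout|traceback|segfault)\b", re.IGNORECASE)


def candidate_lines(lines, z_threshold=3.0):
    """Stage 1: flag lines by regex match or by unusual length."""
    lengths = [len(line) for line in lines]
    mu, sigma = (mean(lengths), pstdev(lengths)) if lengths else (0.0, 0.0)
    flagged = []
    for i, line in enumerate(lines):
        is_outlier = sigma > 0 and abs(len(line) - mu) / sigma > z_threshold
        if SUSPICIOUS.search(line) or is_outlier:
            flagged.append((i, line))
    return flagged


def batches(candidates, size=BATCH_SIZE):
    """Stage 2: group candidates so each group fits one model call."""
    return [candidates[i:i + size] for i in range(0, len(candidates), size)]
```

Each batch returned by batches would correspond to one summarisation call, so the number of calls grows with the number of candidates divided by the batch size.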
Feedback loop
Every AI card has a thumbs-up / thumbs-down control plus an optional comment. The feedback is stored on the AIAnalysis row (feedback_helpful, feedback_comment, feedback_at) and surfaced in the AI usage admin dashboard. Use it: the platform team retrains prompt templates against negative feedback signal.
What this is NOT
- It is not a deterministic test oracle. A failed @confirms is still failed even if the AI is hopeful about it.
- It does not modify any code. The “fix” is always a suggestion you implement yourself.
- It does not see your private repo unless your test logs include code paths. Sandbox the test environment if log scrubbing matters to you.
Next
Requirements & traceability assists
Move from “why did this test fail” to “what requirement was at stake”.
Evidence & standards
Audit-time AI assists for the engineer-to-auditor handoff.