PagerDuty
PagerDuty is for critical events: the ones worth waking someone up. Roboticks routes a deliberately small set of events here.
What gets paged
| Event | Default severity | Why it pages |
|---|---|---|
| evidence_pack.generation_failed (on a release tag) | critical | A release can’t ship audit-ready without it |
| runner_pool.offline for an air-gapped pool > 15 minutes | critical | Air-gapped customers have no fallback pool |
| runner_pool.offline for any pool > 60 minutes | error | Build pipeline is blocked |
| test_run.completed (failed) on the default branch after release | error | Regression on a released line |
| standard.amendment_published for a pinned safety standard | warning (off by default) | Compliance review required |
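The table can be read as a small set of rules. The sketch below restates it in Python purely as an illustration; the `Event` fields and the `amendments_enabled` flag are hypothetical names chosen for this example, not part of any Roboticks API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical event shape; field names are illustrative only."""
    type: str
    on_release_tag: bool = False               # evidence pack built for a release tag
    air_gapped: bool = False                   # runner pool is air-gapped
    offline_minutes: float = 0.0               # how long the pool has been offline
    failed: bool = False                       # test run outcome
    default_branch_after_release: bool = False
    pinned_standard: bool = False              # amendment touches a pinned standard


def page_severity(event: Event, amendments_enabled: bool = False) -> Optional[str]:
    """Return the default severity from the table, or None if nothing pages."""
    if event.type == "evidence_pack.generation_failed" and event.on_release_tag:
        return "critical"
    if event.type == "runner_pool.offline":
        # The air-gapped threshold (15 min) is checked before the general one (60 min).
        if event.air_gapped and event.offline_minutes > 15:
            return "critical"
        if event.offline_minutes > 60:
            return "error"
        return None
    if (event.type == "test_run.completed" and event.failed
            and event.default_branch_after_release):
        return "error"
    if (event.type == "standard.amendment_published" and event.pinned_standard
            and amendments_enabled):  # off by default, so it must be opted into
        return "warning"
    return None
```

Under these assumptions, an air-gapped pool that has been offline for 20 minutes yields `"critical"`, while an ordinary pool offline for the same time does not page at all.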
Set up
Create a PagerDuty service
In PagerDuty: Services → New service. Pick (or create) an escalation policy. Name it Roboticks Production or similar.
Add an Events API v2 integration
On the new service: Integrations → Add integration → Events API v2. Copy the 32-character Integration Key.
Wire it into Roboticks
Settings → Integrations → PagerDuty → Add. Paste the Integration Key. Name the connector (e.g., pd-production).
Pick which events page
On the same screen, toggle the event types. The defaults match the table above.
Severity mapping
Roboticks event severities map to PagerDuty severities and default incident urgencies:
| Roboticks severity | PagerDuty severity | Default urgency |
|---|---|---|
| critical | critical | High |
| error | error | High |
| warning | warning | Low |
| info | info | Low (suppressed by default) |
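As a quick reference, the mapping can be written as a lookup table. This is a minimal sketch that copies the values above; the function name and the `send_info` flag are hypothetical and only model the "suppressed by default" note for `info`.

```python
from typing import Optional, Tuple

# Roboticks severity -> (PagerDuty severity, default urgency), copied from the table.
SEVERITY_MAP = {
    "critical": ("critical", "high"),
    "error": ("error", "high"),
    "warning": ("warning", "low"),
    "info": ("info", "low"),
}


def to_pagerduty(severity: str, send_info: bool = False) -> Optional[Tuple[str, str]]:
    """Return (PagerDuty severity, urgency), or None when the event is suppressed."""
    if severity not in SEVERITY_MAP:
        raise ValueError(f"unknown severity: {severity!r}")
    if severity == "info" and not send_info:
        return None  # info events are suppressed unless explicitly enabled
    return SEVERITY_MAP[severity]
```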
Deduplication and auto-resolve
Roboticks uses a deterministic dedup_key so transient flaps don’t open a second incident:
| Event | dedup_key form |
|---|---|
| Runner pool offline | rbtk:pool-offline:{project_slug}:{pool_name} |
| Evidence pack failure | rbtk:evidence-fail:{project_slug}:{release_tag} |
| Post-release test failure | rbtk:postrelease-fail:{project_slug}:{branch}:{test_id} |
When the condition clears, Roboticks sends event_action: resolve and PagerDuty auto-closes the incident.
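To make the key forms concrete, here is a small sketch that builds each dedup_key from the templates in the table and pairs one with a resolve event. The helper names and sample values are hypothetical; the `routing_key`, `event_action`, and `dedup_key` fields are the ones the PagerDuty Events API v2 expects for a resolve.

```python
def pool_offline_key(project_slug: str, pool_name: str) -> str:
    return f"rbtk:pool-offline:{project_slug}:{pool_name}"


def evidence_fail_key(project_slug: str, release_tag: str) -> str:
    return f"rbtk:evidence-fail:{project_slug}:{release_tag}"


def postrelease_fail_key(project_slug: str, branch: str, test_id: str) -> str:
    return f"rbtk:postrelease-fail:{project_slug}:{branch}:{test_id}"


def resolve_event(routing_key: str, dedup_key: str) -> dict:
    """Body of an Events API v2 resolve; it must reuse the trigger's dedup_key."""
    return {
        "routing_key": routing_key,  # the 32-character Integration Key
        "event_action": "resolve",
        "dedup_key": dedup_key,
    }


# Sample values are made up for illustration.
key = pool_offline_key("warehouse-bot", "edge-a")
print(key)  # rbtk:pool-offline:warehouse-bot:edge-a
```

Because the key depends only on the project, pool, tag, branch, and test, a pool that flaps offline twice produces the same key both times, so PagerDuty attaches the second trigger to the open incident instead of creating a new one.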
Multiple services
You’ll often want separate PagerDuty services per environment.
Event payload
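The exact payload Roboticks sends is not reproduced here. As a rough sketch of what a trigger event for the PagerDuty Events API v2 looks like, the example below builds one for a runner-pool outage. The envelope fields (`routing_key`, `event_action`, `dedup_key`, `payload.summary`, `payload.source`, `payload.severity`, `payload.timestamp`, `payload.custom_details`) come from the Events API v2 format; the summary text, source name, and custom details are assumptions made up for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Events API v2 endpoint that trigger and resolve events are POSTed to.
EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"

ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key: str, dedup_key: str, summary: str,
                        source: str, severity: str,
                        details: Optional[dict] = None,
                        timestamp: Optional[datetime] = None) -> dict:
    """Build the JSON body of an Events API v2 trigger event."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    ts = timestamp or datetime.now(timezone.utc)
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
            "timestamp": ts.isoformat(),
            "custom_details": details or {},
        },
    }


# All values below are placeholders, not real Roboticks output.
event = build_trigger_event(
    routing_key="0" * 32,
    dedup_key="rbtk:pool-offline:warehouse-bot:edge-a",
    summary="Runner pool edge-a offline for more than 15 minutes",
    source="roboticks",
    severity="critical",
    details={"project": "warehouse-bot", "pool": "edge-a"},
)
print(json.dumps(event, indent=2))
```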
Suppression windows
You can suppress paging during planned maintenance.
Troubleshooting
No incidents created
- Verify the Integration Key is the Events API v2 key (32 chars), not a v1 or REST API key.
- Check the delivery log in Roboticks at Settings → Integrations → PagerDuty → Delivery log.
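A quick shape check can catch a key that was pasted from the wrong place before you dig into the delivery log. This is a minimal sketch: it assumes the key is 32 ASCII alphanumeric characters, and it cannot tell an Events API v2 key from any other key of the same length, so treat a pass as "plausible", not "valid".

```python
def looks_like_integration_key(key: str) -> bool:
    """Shape check only: 32 ASCII alphanumeric characters (assumed format)."""
    key = key.strip()  # tolerate whitespace picked up while copying
    return len(key) == 32 and key.isascii() and key.isalnum()
```

A key that fails this check is worth re-copying from the service's Integrations tab before investigating anything else.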
Duplicate incidents for the same condition
Most often a PagerDuty alert-grouping rule on your service is overriding the dedup_key. Set the service’s incident behaviour to “Create alerts and incidents” with grouping “Off — use Roboticks deduplication”.
Incidents don't auto-resolve
Roboticks sends event_action: resolve when the condition clears. If incidents stay open, check the delivery log for resolve rows. If resolve events are being sent but PagerDuty isn’t acting on them, your service’s notification policy may be sticky; adjust the Auto-resolve setting.
Best practice
- One service per environment. Don’t mix prod and staging on one rota.
- Route only the five default event types. Resist the urge to page on PR failures.
- Add a runbook URL in PagerDuty’s service config — it appears in every incident.
- Test the rotation end-to-end at least quarterly.