Commands & Rollouts Feature Specification
Table of Contents
- Overview
- Commands Feature
- Rollouts Feature
- Tunnel Feature
- Device Groups
- Data Flow
- Technical Specifications
- Implementation Checklist
Overview
This document describes the Commands and Rollouts features for the Roboticks platform. These features enable remote device management, command execution, and controlled deployment of packages to fleet devices via MQTT (AWS IoT Core).
Key Capabilities
- Commands: Send commands to devices for diagnostics, control, and management
- Rollouts: Controlled deployment of packages to device fleets with progressive strategies
- Tunnel: Secure remote access to devices via SSH/terminal with web-based interface
- Device Groups: Organize devices for targeted commands and rollouts
- Audit Trail: Complete history of all commands, deployments, and tunnel sessions with user attribution
Commands Feature
Overview
The Commands feature allows users to send commands to individual or multiple devices via MQTT. Commands can be predefined (system operations) or custom (user-defined scripts).
Database Models
DeviceCommand Model
File: backend/app/models/command.py
API Endpoints
File: backend/app/api/v1/commands.py
MQTT Service
File: backend/app/services/mqtt_command_service.py
Frontend Components
Commands Page
File: frontend/src/pages/Commands.tsx
Features:
- Table view of all commands in the project
- Filters: Status, Type, Device, Date Range, User
- Search by command name or device name
- Real-time status updates (polling every 5 seconds for active commands)
- Click row to view detailed command info
- Bulk operations: Cancel multiple pending commands
Table Columns:
- Command (icon + name)
- Type (chip with color)
- Device (name + status indicator)
- Status (with progress for running commands)
- Created By (user name)
- Created At (relative time)
- Duration (calculated from timestamps)
- Actions (View Details, Cancel if pending)
New Command Dialog
Wizard Steps:
- Select Target(s)
  - Single device: Dropdown with search
  - Multiple devices: Multi-select with device groups
  - Show device online/offline status
- Choose Command
  - Card-based selection of predefined commands
  - Each card shows: Icon, Name, Description, Est. Duration
  - “Custom Command” option at the end
- Configure
  - Dynamic form based on command type
  - For deploy_package: Deployment selector
  - For custom: Code editor for script
  - Priority slider: Low (7-10), Normal (4-6), High (1-3)
  - Timeout setting with presets
- Review & Send
  - Summary of all selections
  - Warning if any devices are offline
  - Estimated total execution time
  - Option to add notes
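The priority bands on the slider imply that a lower number means a more urgent command. As a minimal sketch, assuming priorities are integers from 1 to 10 (the function name `priority_label` is hypothetical and not part of the spec), the mapping could look like this:

```python
def priority_label(priority: int) -> str:
    """Map a numeric priority (1 = most urgent) to its slider band."""
    if not 1 <= priority <= 10:
        raise ValueError(f"priority must be between 1 and 10, got {priority}")
    if priority <= 3:
        return "High"
    if priority <= 6:
        return "Normal"
    return "Low"
```

Keeping the bands in one helper lets the wizard and the command table label priorities the same way.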
Command Detail Dialog
Tabs:
- Overview: Status, timestamps, creator, device info
- Payload: JSON viewer for command_payload
- Response: JSON viewer for response_payload (if available)
- Logs: Error messages and execution details
Rollouts Feature
Overview
Rollouts enable controlled deployment of packages to device fleets with progressive strategies, health monitoring, and automatic rollback capabilities.
Database Models
DeviceGroup Model
File: backend/app/models/device_group.py
Rollout Model
File: backend/app/models/rollout.py
DeviceRollout Model
File: backend/app/models/rollout.py (continued)
RolloutTemplate Model
File: backend/app/models/rollout_template.py
Rollout Templates
Templates allow users to save and reuse rollout configurations for consistency across deployments.
Template API Endpoints
File: backend/app/api/v1/rollout_templates.py
Default Templates
The system should create these default templates when a project is created:
- All at Once
  - Deploy to all devices simultaneously
  - Use for: Small fleets, non-critical updates
- Fast Progressive
  - 20% → 50% → 100%
  - Wait times: 30min, 60min
  - Use for: Standard rollouts
- Conservative Progressive (default)
  - 10% → 30% → 100%
  - Wait times: 2hr, 2hr
  - Use for: Critical updates, production
- Canary (5 devices)
  - 5 devices → all remaining
  - Wait time: 1hr
  - Use for: Testing new versions
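The default templates above can be expressed as plain data. The sketch below is illustrative only: the `Stage` class, the `DEFAULT_TEMPLATES` mapping, and the `cumulative_targets` helper are hypothetical names, and rounding percentages up (so that small fleets still receive at least one device per stage) is an assumption the spec does not state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Stage:
    percent: Optional[int] = None       # cumulative share of the fleet, 1-100
    device_count: Optional[int] = None  # absolute device count (canary stage)
    wait_minutes: int = 0               # pause before the next stage starts


DEFAULT_TEMPLATES: Dict[str, List[Stage]] = {
    "All at Once": [Stage(percent=100)],
    "Fast Progressive": [
        Stage(percent=20, wait_minutes=30),
        Stage(percent=50, wait_minutes=60),
        Stage(percent=100),
    ],
    "Conservative Progressive": [
        Stage(percent=10, wait_minutes=120),
        Stage(percent=30, wait_minutes=120),
        Stage(percent=100),
    ],
    "Canary (5 devices)": [
        Stage(device_count=5, wait_minutes=60),
        Stage(percent=100),
    ],
}


def cumulative_targets(stages: List[Stage], fleet_size: int) -> List[int]:
    """Return how many devices should have the package after each stage."""
    targets = []
    for stage in stages:
        if stage.device_count is not None:
            target = min(stage.device_count, fleet_size)
        elif stage.percent is not None:
            # Integer ceiling: assumed rounding rule, not taken from the spec.
            target = (fleet_size * stage.percent + 99) // 100
        else:
            raise ValueError("stage needs either percent or device_count")
        targets.append(target)
    return targets
```

For a fleet of 200 devices, `cumulative_targets(DEFAULT_TEMPLATES["Fast Progressive"], 200)` yields `[40, 100, 200]`, which matches the 20% → 50% → 100% progression.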
Package Download Proxy
Devices download packages through an API proxy endpoint (not direct S3) for better tracking and control.
Download Proxy Endpoint
File: backend/app/api/v1/downloads.py
Progress Reporting
Devices report progress every 15 seconds during download and installation.
Progress Report Format
MQTT Topic: roboticks/{project_id}/devices/{device_id}/rollout/progress
Payload:
Tunnel Feature
Overview
The Tunnel feature provides secure remote access to fleet devices through web-based terminal and SSH tunneling. It uses AWS IoT Core MQTT for bidirectional communication, allowing users to execute commands, debug issues, and interact with devices in real-time without VPN or direct network access.
Key Features
- Web-Based Terminal: xterm.js-powered terminal in browser
- SSH Protocol Support: Standard SSH clients can connect through tunnel
- Session Management: Track all tunnel sessions with full audit trail
- Multi-User Support: Multiple users can tunnel to same device (different sessions)
- Session Recording: Optional session recording for compliance
- Idle Timeout: Automatic disconnection after inactivity
- Rate Limiting: Prevent abuse with per-user connection limits
Database Models
DeviceTunnel Model
File: backend/app/models/tunnel.py
API Endpoints
File: backend/app/api/v1/tunnels.py
MQTT Service
File: backend/app/services/mqtt_tunnel_service.py
WebSocket Manager
File: backend/app/services/websocket_manager.py
Frontend Components
Terminal Page (New)
File: frontend/src/pages/Terminal.tsx
Features:
- Embedded xterm.js terminal
- Device selector dropdown
- Connection status indicator
- Session controls (Connect, Disconnect)
- Terminal settings (font size, theme)
- Session statistics (bytes sent/received, duration)
- Reconnect on disconnect
Technical Details:
- xterm.js for terminal emulation
- WebSocket client for bidirectional communication
- Automatic reconnection logic
- Terminal resize handling (responsive)
Tunnel Sessions Page
File: frontend/src/pages/Tunnels.tsx (or add tab to Fleet page)
Features:
- Table view of active and historical tunnel sessions
- Filters: Device, User, Status, Date Range
- Real-time status updates for active tunnels
- “Kill Session” button for active tunnels (admin only)
- Session details dialog with statistics
Table Columns:
- User (name + avatar)
- Device (name + status)
- Type (Web Terminal / SSH / Port Forward)
- Status (with duration for active)
- Connected At
- Duration / Disconnected At
- Data Transfer (↑↓ bytes)
- Actions (View Details, Kill Session)
Device Agent Implementation
The device agent (roboticks-agent) needs to implement tunnel functionality:
File: device-agent/tunnel_handler.py (pseudocode)
Security Considerations
- Authentication:
  - WebSocket connections require valid JWT token
  - Device authentication via AWS IoT Core certificates
  - Per-user rate limiting (max concurrent tunnels)
- Authorization:
  - Check user has project access before creating tunnel
  - Check device belongs to project
  - Optional: Role-based access (only admins can tunnel to production)
- Session Recording:
  - Configurable per project or per tunnel
  - Stored in S3 with encryption
  - Retention policy (default: 90 days)
  - Audit log of who accessed recordings
- Network Security:
  - WebSocket over TLS (wss://)
  - MQTT over TLS
  - No direct device network access (all via AWS IoT Core)
- Rate Limiting:
  - Max 3 concurrent tunnels per user
  - Max 10 concurrent tunnels per device
  - Max tunnel duration: 2 hours (configurable)
  - Idle timeout: 10 minutes (configurable)
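The rate limits listed above can be checked with a small amount of bookkeeping. The following is a sketch under stated assumptions: the `TunnelSession` fields and both function names are hypothetical, active sessions are held in a plain list, and a real backend would read them from the tunnel table instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

MAX_TUNNELS_PER_USER = 3
MAX_TUNNELS_PER_DEVICE = 10
MAX_TUNNEL_DURATION = timedelta(hours=2)   # configurable per the spec
IDLE_TIMEOUT = timedelta(minutes=10)       # configurable per the spec


@dataclass
class TunnelSession:
    user_id: str
    device_id: str
    connected_at: datetime
    last_activity_at: datetime


def rejection_reason(active: List[TunnelSession], user_id: str,
                     device_id: str) -> Optional[str]:
    """Return why a new tunnel must be refused, or None if it may be opened."""
    if sum(s.user_id == user_id for s in active) >= MAX_TUNNELS_PER_USER:
        return "user tunnel limit reached"
    if sum(s.device_id == device_id for s in active) >= MAX_TUNNELS_PER_DEVICE:
        return "device tunnel limit reached"
    return None


def disconnect_reason(session: TunnelSession, now: datetime) -> Optional[str]:
    """Return why an open tunnel should be closed, or None to keep it open."""
    if now - session.connected_at >= MAX_TUNNEL_DURATION:
        return "max duration exceeded"
    if now - session.last_activity_at >= IDLE_TIMEOUT:
        return "idle timeout"
    return None
```

An idle-timeout monitor could call `disconnect_reason` for every active session on a fixed interval and close those that return a reason.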
Performance Considerations
- Latency:
  - Expected latency: 50-200ms (depends on device location)
  - MQTT QoS 0 for tunnel data (minimal overhead)
  - Direct WebSocket connection (no polling)
- Bandwidth:
  - Terminal data is small (~1-10KB/s typical usage)
  - MQTT message size limit: 128KB
  - Chunk large outputs into multiple messages
- Scalability:
  - WebSocket connections handled by FastAPI/Uvicorn
  - Horizontal scaling via load balancer
  - MQTT scales to millions of devices (AWS IoT Core)
- Resource Usage:
  - One WebSocket connection per user session
  - One MQTT subscription per tunnel per device
  - Memory: ~1MB per active tunnel (server-side)
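Chunking large outputs to stay under the 128KB message limit could be sketched as follows. The chunk envelope (`seq`, `total`, `data`) and the 64 KiB default are assumptions chosen to leave headroom for encoding and metadata; the spec does not define a chunk format.

```python
from typing import List

MQTT_MAX_MESSAGE_BYTES = 128 * 1024   # limit stated in this spec
DEFAULT_CHUNK_BYTES = 64 * 1024       # assumption: headroom for encoding/envelope


def chunk_output(data: bytes, chunk_bytes: int = DEFAULT_CHUNK_BYTES) -> List[dict]:
    """Split terminal output into ordered chunks that each fit in one message."""
    if not 0 < chunk_bytes <= MQTT_MAX_MESSAGE_BYTES:
        raise ValueError("chunk_bytes must be between 1 and the MQTT message limit")
    pieces = [data[i:i + chunk_bytes] for i in range(0, len(data), chunk_bytes)]
    return [
        {"seq": seq, "total": len(pieces), "data": piece}
        for seq, piece in enumerate(pieces)
    ]


def reassemble(chunks: List[dict]) -> bytes:
    """Rebuild the original output, tolerating out-of-order arrival."""
    ordered = sorted(chunks, key=lambda chunk: chunk["seq"])
    return b"".join(chunk["data"] for chunk in ordered)
```

Because tunnel data uses QoS 0, chunks can be lost in transit; a receiver could compare the number of chunks received against `total` to detect gaps.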
Device Groups
Group Types
- Manual Groups
  - Explicitly list device IDs
  - Static membership
  - Example: “Test Fleet” = [device1, device2, device3]
- Tag-Based Groups
  - Dynamic membership based on device tags
  - Automatically includes/excludes devices as tags change
  - Example: All devices with {"environment": "production", "site": "warehouse-1"}
- Query Groups
  - Advanced filtering with conditions
  - Example: All DRONE devices with firmware >= 1.0.0 and status ONLINE
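Membership for tag-based and query groups can be resolved by filtering the device list at evaluation time. The sketch below assumes devices are dictionaries with hypothetical keys (`id`, `tags`, `type`, `firmware`, `status`) and that firmware versions are dotted integers; none of these field names come from the spec.

```python
from typing import Dict, List, Tuple


def matches_tags(device_tags: Dict[str, str], required: Dict[str, str]) -> bool:
    """A device matches when every required tag is present with the same value."""
    return all(device_tags.get(key) == value for key, value in required.items())


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_tag_group(devices: List[dict], required: Dict[str, str]) -> List[str]:
    return [d["id"] for d in devices if matches_tags(d.get("tags", {}), required)]


def resolve_query_group(devices: List[dict], device_type: str,
                        min_firmware: str, status: str) -> List[str]:
    minimum = version_tuple(min_firmware)
    return [
        d["id"] for d in devices
        if d["type"] == device_type
        and d["status"] == status
        and version_tuple(d["firmware"]) >= minimum
    ]
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "1.10.0" would sort before "1.9.0".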
Group API Endpoints
File: backend/app/api/v1/device_groups.py
Data Flow
Command Flow
Rollout Flow
Technical Specifications
MQTT Configuration
- QoS Level: 1 (at-least-once delivery). AWS IoT Core does not support QoS 2, so devices should treat commands as idempotent and de-duplicate by command ID
- Topic Structure:
  - Commands: roboticks/{project_id}/devices/{device_id}/commands
  - Command Responses: roboticks/{project_id}/devices/{device_id}/commands/response
  - Rollout Progress: roboticks/{project_id}/devices/{device_id}/rollout/progress
- Message Format: JSON
- Retention: Commands retained for 1 hour if device offline
Package Downloads
- Method: API Proxy (not direct S3)
- Endpoint: GET /api/v1/downloads/packages/{deployment_id}/file
- Authentication: Device certificate + download token
- Tracking: Backend logs download start, progress, completion
- Benefits:
  - Better access control
  - Download metrics
  - Ability to throttle/cancel
  - Error tracking
Progress Reporting
- Frequency: Every 15 seconds during active operations
- Topics: Device-specific progress topic
- Payload: Status, progress percentage, current step, errors
- Processing: Lambda → Webhook → Backend updates DeviceRollout
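A progress report carrying the fields listed above might be assembled on the device as shown below. The JSON key names (`status`, `progress_percent`, `current_step`, `errors`) and both function names are assumptions for illustration; the spec names the fields but not their exact keys.

```python
import json
from typing import List, Optional

REPORT_INTERVAL_SECONDS = 15  # reporting frequency stated in this spec


def build_progress_report(status: str, progress_percent: int, current_step: str,
                          errors: Optional[List[str]] = None) -> str:
    """Serialize one progress report as a JSON string."""
    if not 0 <= progress_percent <= 100:
        raise ValueError("progress_percent must be between 0 and 100")
    return json.dumps({
        "status": status,
        "progress_percent": progress_percent,
        "current_step": current_step,
        "errors": errors or [],
    })


def is_report_due(last_sent_at: float, now: float) -> bool:
    """True when at least one reporting interval has passed (times in seconds)."""
    return now - last_sent_at >= REPORT_INTERVAL_SECONDS
```

The agent would publish the resulting string to the device's rollout progress topic whenever `is_report_due` returns true during an active download or installation.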
Concurrency Control
- Per Device: Maximum 1 active deployment at a time
- Per Rollout: No limit on devices (scales with fleet size)
- Per Project: No hard limit, but monitored for performance
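The per-device rule (at most one active deployment) can be illustrated with a small in-memory guard. This is only a sketch: `DeviceDeploymentGuard` is a hypothetical name, and a production backend would more likely enforce the rule with a database constraint or row lock so that it holds across multiple API workers.

```python
from typing import Dict


class DeviceDeploymentGuard:
    """Tracks which rollout, if any, currently owns each device."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}  # device_id -> rollout_id

    def try_start(self, device_id: str, rollout_id: str) -> bool:
        """Claim the device for a rollout; False if another deployment is active."""
        if device_id in self._active:
            return False
        self._active[device_id] = rollout_id
        return True

    def finish(self, device_id: str, rollout_id: str) -> None:
        """Release the device, but only if this rollout is the one holding it."""
        if self._active.get(device_id) == rollout_id:
            del self._active[device_id]
```

Checking the rollout ID in `finish` prevents a late completion message from one rollout from releasing a device that a newer rollout has already claimed.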
Data Retention
- Commands: Forever (full audit trail)
- Command Responses: Forever
- Rollouts: Forever
- DeviceRollouts: Forever
- MQTT Messages: 1 hour retention on broker
Database Indexes
Required indexes for performance:
Implementation Checklist
Phase 1: Core Models & Database
- Create tunnel.py model
- Create command.py model
- Create device_group.py model
- Create rollout.py model
- Create rollout_template.py model
- Generate Alembic migration
- Add relationships to Project and FleetDevice models
- Run migration on dev database
Phase 2: Backend Services - Tunnel (PRIORITY)
- Implement MQTTTunnelService
- Implement WebSocketManager
- Create webhook endpoint for tunnel output
- Add Lambda function for tunnel MQTT routing
- Implement tunnel idle timeout monitor
Phase 3: Backend API Endpoints - Tunnel (PRIORITY)
- Tunnel API (create, get, list, close)
- WebSocket endpoint for tunnel connections
- Tunnel session recording storage
- Rate limiting middleware
Phase 4: Frontend - Tunnel (PRIORITY)
- Install xterm.js and xterm-addon-fit
- Create Terminal page with xterm.js
- Implement WebSocket client for tunnel
- Add device selector and connection controls
- Add session statistics display
- Create Tunnel Sessions management page
- Add tunnel session history view
Phase 5: Backend Services - Commands & Rollouts
- Implement MQTTCommandService
- Implement RolloutService
- Implement download proxy endpoint
- Create webhook endpoints for MQTT responses
- Add Lambda functions for IoT Rules
Phase 6: Backend API Endpoints - Commands & Rollouts
- Commands API (list, create, get, cancel)
- Command templates endpoint
- Device Groups API (CRUD)
- Rollouts API (CRUD, start, pause, resume, rollback)
- Rollout Templates API (CRUD)
- Download proxy endpoints
Phase 7: Frontend - Commands
- Create Commands page with table view
- Add filters and search
- Implement New Command wizard dialog
- Create Command Detail dialog
- Add real-time status polling
- Add bulk operations
Phase 8: Frontend - Device Groups
- Create Device Groups page
- New Group wizard (manual, tag-based, query)
- Group editing and device preview
- Integration with Commands and Rollouts
Phase 9: Frontend - Rollouts
- Create Rollouts page with active/completed sections
- Implement New Rollout wizard
- Create Rollout Detail view with progress
- Add rollout controls (pause, resume, rollback)
- Real-time progress updates
- Rollout Templates management page
Phase 10: Testing & Documentation
- Unit tests for models
- Integration tests for services
- API endpoint tests
- Frontend component tests
- End-to-end tunnel test
- End-to-end rollout test
- Update API documentation
- Create user guide
Phase 11: Device Agent Updates
- Implement TunnelHandler for PTY management
- Update agent to subscribe to tunnel control topic
- Subscribe to command topics
- Implement command execution handlers
- Add package download via proxy
- Implement progress reporting (every 15s)
- Add health check reporting
- Error handling and retry logic
Future Enhancements (Not in Initial Implementation)
- Command Templates Library: User-created reusable commands
- Command Scheduling: Schedule commands for future execution
- Command Chaining: Execute sequence of commands
- Approval Workflow: Require approval for critical commands
- Advanced Health Checks: Custom health check scripts
- A/B Testing Metrics: Automatic version comparison
- Notification System: Email/Slack alerts
- Geographic Grouping: Location-based device groups
- Time-zone Aware Rollouts: Deploy during off-peak per region
- Rollout Simulation: Dry-run mode
- Auto-advance Stages: Progress automatically on health check pass
- Manual Stage Approval: Require manual approval per stage
Document Version: 1.0 Last Updated: 2025-11-14 Author: Roboticks Engineering Team