Roboticks SDK Integration Plan
Executive Summary
This document outlines the integration plan for incorporating all new Roboticks platform features into the roboticks-sdk. The goal is to evolve the SDK from a simple “Hello World” demo into a production-ready system that leverages the full capabilities of the platform.
1. Current State Analysis
SDK Capabilities (What We Have)
- ✅ Device registration and credential provisioning
- ✅ Session lifecycle management
- ✅ Real-time log streaming via MQTT
- ✅ Heartbeat monitoring
- ✅ Command polling (HTTP + MQTT)
- ✅ File upload via S3 presigned URLs
- ✅ ZeroMQ local messaging
- ✅ Module framework with auto-registration
Platform Features (What We Need to Integrate)
- 🆕 Compositions - Docker multi-container deployments from ECR
- 🆕 Capsules - Native binary/edge ML model packages
- 🆕 Configurations - Environment variable management
- 🆕 Packages (Deployments) - Versioned releases combining capsules + compositions + configs
- 🆕 Rollouts - Progressive deployment strategies (canary, blue/green, offline)
- 🆕 Environments - Fleet-wide and device-specific variable overrides
- 🆕 Commands - Enhanced command system with timeout/retry
- 🆕 Reverse Tunnels - Secure SSH/HTTP access to edge devices
- 🆕 Device Groups - Logical device organization for targeted deployments
2. Integration Architecture
2.1 High-Level Component Design
3. Feature Integration Breakdown
3.1 Capsules (Native Binary Packages)
What Are Capsules?
- Versioned native binaries compiled for the target architecture (ARM64, x86_64)
- Include: executables, shared libraries, configuration files, ML models
- Current SDK modules (HelloWorldModule) will be packaged as capsules
Backend Changes
- Add capsule metadata to device session tracking
- Store capsule version in fleet_devices.active_capsule_id
- Track installation history in new table: capsule_installations
SDK Changes
New File: /packages/roboticks-device/include/CapsuleManager.hpp
- GET /api/v1/organizations/{org}/projects/{project}/deployments/{deployment_id}/download - Get presigned S3 URL
- POST /api/v1/organizations/{org}/projects/{project}/deployments/{deployment_id}/status - Report installation progress
- Publish: roboticks/devices/{dsn}/capsule-status - Installation progress
- Subscribe: roboticks/devices/{dsn}/capsule-install - Install command
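The per-device topics above all follow the same template, so a small helper can expand them. The following is a minimal sketch, not the SDK's actual code: the function name expandTopic is hypothetical, and it assumes {dsn} stands for the device serial number. It also rejects identifiers containing '/', '+', or '#', since those characters are the MQTT level separator and wildcards and would change the meaning of the topic.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: expand every "{dsn}" placeholder in a topic template.
// Assumes "dsn" is the device serial number; the plan does not define it.
std::string expandTopic(std::string topicTemplate, const std::string& dsn) {
    // '/' separates topic levels; '+' and '#' are MQTT wildcards.
    if (dsn.empty() || dsn.find_first_of("/+#") != std::string::npos) {
        throw std::invalid_argument("device identifier is not a valid topic level");
    }
    const std::string placeholder = "{dsn}";
    std::size_t pos = 0;
    while ((pos = topicTemplate.find(placeholder, pos)) != std::string::npos) {
        topicTemplate.replace(pos, placeholder.size(), dsn);
        pos += dsn.size();  // continue searching after the inserted value
    }
    return topicTemplate;
}
```

For example, expandTopic("roboticks/devices/{dsn}/capsule-status", "RBT-001") yields roboticks/devices/RBT-001/capsule-status.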
3.2 Compositions (Docker Multi-Container)
What Are Compositions?
- Docker Compose-style multi-container applications
- Stored in AWS ECR (Elastic Container Registry)
- Examples: ROS2 nodes, ML inference servers, web UIs, databases
SDK Changes
New File: /packages/roboticks-device/include/CompositionManager.hpp
- Requires Docker Engine or Docker Compose CLI on edge device
- SDK calls Docker API via Unix socket (/var/run/docker.sock)
- Composition manifests downloaded from backend as YAML files
- GET /api/v1/organizations/{org}/projects/{project}/compositions - List available compositions
- GET /api/v1/organizations/{org}/projects/{project}/compositions/{tag} - Get specific composition manifest
3.3 Configurations (Environment Variables)
What Are Configurations?
- Named sets of environment variables (e.g., “production”, “staging”, “debug”)
- Two scopes: Fleet-wide (PROJECT) and Device-specific (DEVICE)
- Changes propagate instantly without rebuilding images
SDK Changes
New File: /packages/roboticks-device/include/EnvironmentManager.hpp
- Device-specific variables (highest priority)
- Fleet-wide (PROJECT) variables
- Default values from capsule/composition config
- System environment variables (lowest priority)
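The priority order listed above amounts to a layered merge in which higher-priority layers overwrite lower ones. The sketch below illustrates that idea only; the EnvLayer alias and mergeEnvironment function are hypothetical names, not part of the planned EnvironmentManager interface.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical alias: one layer of environment variables (name -> value).
using EnvLayer = std::map<std::string, std::string>;

// Merge layers given in ascending priority: later layers override earlier ones.
// For the order listed above, pass {system, defaults, fleet, device}.
EnvLayer mergeEnvironment(const std::vector<EnvLayer>& layersLowToHigh) {
    EnvLayer merged;
    for (const auto& layer : layersLowToHigh) {
        for (const auto& [name, value] : layer) {
            merged[name] = value;  // the higher-priority layer wins
        }
    }
    return merged;
}
```

With this shape, a device-specific LOG_LEVEL=debug overrides a fleet-wide or default LOG_LEVEL, while variables defined only in lower layers pass through unchanged.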
- GET /api/v1/organizations/{org}/projects/{project}/environment-configs - List configs
- GET /api/v1/organizations/{org}/projects/{project}/environment-variables?device_id={dsn} - Get merged variables
- Subscribe: roboticks/devices/{dsn}/environment-update - Real-time updates
- Change camera resolution without redeploying
- Toggle debug logging on specific devices
- Update ML model paths dynamically
- Configure API endpoints per environment
3.4 Packages (Deployments)
What Are Packages?
- Versioned releases that bundle:
  - Capsules (optional)
  - Compositions (optional)
  - Configurations (optional)
- A complete deployable unit (e.g., “Perception Pipeline v2.3.0”)
SDK Changes
New File: /packages/roboticks-device/include/DeploymentManager.hpp
- GET /api/v1/organizations/{org}/projects/{project}/deployments/{deployment_id} - Get full deployment details
- POST /api/v1/organizations/{org}/projects/{project}/deployments/{deployment_id}/status - Report status
3.5 Rollouts (Progressive Deployments)
What Are Rollouts?
- Controlled deployment strategies for fleet updates
- Strategies: All at Once, Progressive (10% → 50% → 100%), Canary (5%), Blue/Green, Offline
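To make the Progressive percentages concrete, the sketch below converts stage percentages into cumulative device counts for a given fleet size. It is an illustration under stated assumptions: the function name stageTargets is hypothetical, and rounding up (so that even a small fleet gets at least one device in the first stage) is a choice made here, not something the platform specifies.

```cpp
#include <cstddef>
#include <vector>

// Cumulative number of devices that should carry the update after each stage.
// Rounds up, so a non-empty fleet always has at least one device in stage one.
std::vector<std::size_t> stageTargets(std::size_t fleetSize,
                                      const std::vector<unsigned>& stagePercents) {
    std::vector<std::size_t> targets;
    targets.reserve(stagePercents.size());
    for (unsigned percent : stagePercents) {
        // Integer ceiling of fleetSize * percent / 100.
        targets.push_back((fleetSize * percent + 99) / 100);
    }
    return targets;
}
```

For a fleet of 25 devices and stages of 10%, 50%, and 100%, this gives cumulative targets of 3, 13, and 25 devices.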
SDK Changes
New File: /packages/roboticks-device/include/RolloutController.hpp
- GET /api/v1/organizations/{org}/projects/{project}/device-rollouts?device_id={dsn} - Get assigned rollout
- POST /api/v1/organizations/{org}/projects/{project}/rollouts/{rollout_id}/device-status - Report progress
- Subscribe: roboticks/devices/{dsn}/rollout - Real-time rollout assignment
- Publish: roboticks/devices/{dsn}/rollout-progress - Progress updates
- Device downloads offline bundle (ZIP file)
- Bundle includes: capsules, Docker images (tar), manifests, checksums
- Device extracts and installs from local storage
- Useful for air-gapped environments or limited bandwidth
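Because the offline bundle ships with checksums, the device can validate every entry before installing anything. The following is a minimal in-memory sketch of that check, assuming a manifest that maps file names to expected hex digests. The names HashFn and findCorruptEntries are hypothetical, and the hash function is injected so the sketch does not depend on a particular crypto library; in practice it would be a SHA-256 implementation, as called for in the security section.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical hook: returns the hex digest of a byte string (e.g. SHA-256).
using HashFn = std::function<std::string(const std::string& bytes)>;

// Returns the names of manifest entries that are missing from the bundle
// or whose computed digest differs from the expected one.
std::vector<std::string> findCorruptEntries(
    const std::map<std::string, std::string>& manifest,  // file name -> expected digest
    const std::map<std::string, std::string>& files,     // file name -> file contents
    const HashFn& hash) {
    std::vector<std::string> bad;
    for (const auto& [name, expected] : manifest) {
        const auto it = files.find(name);
        if (it == files.end() || hash(it->second) != expected) {
            bad.push_back(name);
        }
    }
    return bad;
}
```

An empty result means every manifest entry is present and matches; installation would proceed only in that case.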
3.6 Commands (Enhanced)
Current State:
- Basic command polling via HTTP
- Simple execute/acknowledge flow
SDK Changes
Enhanced File: /packages/roboticks-device/include/CommandExecutor.hpp
- INSTALL_DEPLOYMENT - Trigger deployment installation
- STOP_DEPLOYMENT - Stop running deployment
- COLLECT_DIAGNOSTICS - Upload logs, system metrics, dmesg, etc.
- START_TUNNEL - Create reverse tunnel
- STOP_TUNNEL - Close reverse tunnel
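One way the executor could dispatch on the command types listed above is to parse the wire name into an enum first, so that unknown commands are rejected explicitly rather than silently ignored. This is a sketch only: the CommandType enum and parseCommandType function are hypothetical names, and it assumes commands arrive with exactly the upper-case names shown.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical enum mirroring the command names in this plan.
enum class CommandType {
    InstallDeployment,
    StopDeployment,
    CollectDiagnostics,
    StartTunnel,
    StopTunnel
};

// Returns the matching command type, or std::nullopt for an unknown name.
std::optional<CommandType> parseCommandType(const std::string& name) {
    static const std::unordered_map<std::string, CommandType> kTypes = {
        {"INSTALL_DEPLOYMENT", CommandType::InstallDeployment},
        {"STOP_DEPLOYMENT", CommandType::StopDeployment},
        {"COLLECT_DIAGNOSTICS", CommandType::CollectDiagnostics},
        {"START_TUNNEL", CommandType::StartTunnel},
        {"STOP_TUNNEL", CommandType::StopTunnel},
    };
    const auto it = kTypes.find(name);
    if (it == kTypes.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

A caller can then report an explicit failure status for any command whose name does not parse.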
3.7 Reverse Tunnels (Secure Remote Access)
What Are Reverse Tunnels?
- SSH reverse tunnels from edge device to cloud proxy
- Enables secure access to devices behind NAT/firewall
- Use cases: SSH, HTTP/HTTPS, VNC, Jupyter notebooks
Backend Infrastructure
- Deploy SSH jump server (AWS EC2 or Fargate)
- Generate unique subdomain per tunnel: {tunnel_id}.tunnels.roboticks.io
- Use AWS Application Load Balancer for HTTPS termination
- Store tunnel metadata in database
SDK Changes
New File: /packages/roboticks-device/include/ReverseTunnelAgent.hpp
- Device uses same X.509 certificate for SSH (convert to SSH format)
- Jump server validates device certificate against database
- ACL: Only authorized users can access tunnels
- GET /api/v1/reverse-tunnels/devices/{dsn}/reverse-tunnels - List device tunnels
- POST /api/v1/reverse-tunnels/devices/{dsn}/reverse-tunnels - Create tunnel
- DELETE /api/v1/reverse-tunnels/{tunnel_id} - Close tunnel
- Show tunnel status in device overview
- Copy public URL to clipboard
- One-click SSH/HTTP access
3.8 Device Groups (Fleet Organization)
What Are Device Groups?
- Logical collections of devices (e.g., “Production Fleet”, “Beta Testers”)
- Target deployments to groups instead of individual devices
- Dynamic membership based on tags or manual assignment
SDK Changes
- No major SDK changes required
- Device reports tags in heartbeat
- Backend assigns device to groups based on tags
- Rollouts target device groups
- GET /api/v1/organizations/{org}/projects/{project}/device-groups - List groups
- POST /api/v1/organizations/{org}/projects/{project}/device-groups/{group_id}/devices - Add device to group
4. Implementation Phases
Phase 1: Core Deployment Infrastructure (2-3 weeks)
Priority: HIGH
Backend:
- ✅ Already complete: Deployments, Capsules, Configurations APIs
SDK:
- Implement DeploymentManager (download, install, launch)
- Implement CapsuleManager (native binary handling)
- Implement EnvironmentManager (config fetching and merging)
- Add deployment tracking to heartbeat
- Deploy HelloWorldModule as capsule
- Apply environment config to running module
- Verify version tracking in dashboard
Phase 2: Docker Composition Support (2 weeks)
Priority: HIGH
SDK:
- Implement CompositionManager (Docker integration)
- ECR authentication via IAM role
- Docker Compose parsing and execution
- Container health monitoring
- Create Docker Compose with ROS2 nodes
- Deploy to device via platform
- Verify container logs in dashboard
Phase 3: Progressive Rollouts (2 weeks)
Priority: MEDIUM
Backend:
- ✅ Already complete: Rollouts API
SDK:
- Implement RolloutController
- Progressive deployment strategies
- Health check validation
- Automatic rollback on failure
- Create rollout with canary strategy (5% → 100%)
- Verify staged deployment
- Test auto-rollback on failed health check
Phase 4: Reverse Tunnels (1-2 weeks)
Priority: MEDIUM
Backend:
- Deploy SSH jump server (EC2 or Fargate)
- Configure ALB for HTTPS tunneling
- Implement tunnel authentication
SDK:
- Implement ReverseTunnelAgent
- SSH tunnel establishment
- Auto-reconnect logic
- Tunnel health monitoring
- Create tunnel to device HTTP service
- Access via public URL
- Verify authentication and ACL
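The auto-reconnect logic listed for this phase is commonly built on capped exponential backoff, so that a device behind a flaky link does not hammer the jump server. The sketch below shows only the delay calculation; the function name reconnectDelay and the 1-second base and 60-second cap are illustrative assumptions, not values taken from the plan. A production agent would typically also add random jitter.

```cpp
#include <algorithm>
#include <chrono>

// Delay before reconnect attempt number `attempt` (0-based): the base delay
// doubles on every failed attempt and is clamped to `cap`.
std::chrono::seconds reconnectDelay(
    unsigned attempt,
    std::chrono::seconds base = std::chrono::seconds(1),
    std::chrono::seconds cap = std::chrono::seconds(60)) {
    std::chrono::seconds delay = base;
    // Stop doubling once the cap is reached so the value cannot overflow.
    for (unsigned i = 0; i < attempt && delay < cap; ++i) {
        delay *= 2;
    }
    return std::min(delay, cap);
}
```

With the defaults, successive attempts wait 1, 2, 4, 8, 16, and 32 seconds, then 60 seconds for every attempt after that.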
Phase 5: Offline Deployments (1 week)
Priority: LOW
Backend:
- Generate offline bundle (ZIP with checksums)
- Include Docker images as tar files
SDK:
- Offline bundle extraction
- Docker image loading from tar
- Manifest-based installation
- Download offline bundle
- Deploy on air-gapped device
- Verify installation without internet
Phase 6: Enhanced Commands (1 week)
Priority: LOW
SDK:
- Add new command types (diagnostics, deployment control)
- Timeout and retry logic
- Command scheduling
- Send COLLECT_DIAGNOSTICS command
- Verify diagnostic bundle upload
5. Updated HelloWorld Demo
Before (Current State)
After (Enhanced)
Demo Flow (End-to-End)
- Developer builds HelloWorld module (C++)
- Upload capsule to S3 via CLI: roboticks capsule upload hello-world-module.tar.gz
- Build Docker image and push to ECR: docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/hello-world:v1.0.0
- Create environment config via dashboard
- Create deployment package bundling capsule + composition + config
- Create rollout targeting device group
- Device receives rollout assignment
- Device downloads and installs deployment
- Device reports success
- Dashboard shows deployment status
6. API Changes Summary
New SDK → Backend Communication
Deployment APIs:
- GET /api/v1/organizations/{org}/projects/{project}/deployments/{deployment_id} - Fetch deployment details
- POST /api/v1/organizations/{org}/projects/{project}/deployments/{deployment_id}/status - Report installation status
Rollout APIs:
- GET /api/v1/organizations/{org}/projects/{project}/device-rollouts?device_id={dsn} - Get assigned rollout
- POST /api/v1/organizations/{org}/projects/{project}/rollouts/{rollout_id}/device-status - Report progress
Environment APIs:
- GET /api/v1/organizations/{org}/projects/{project}/environment-variables?device_id={dsn} - Get merged variables
- GET /api/v1/organizations/{org}/projects/{project}/environment-configs/{config_id} - Get specific config
Composition APIs:
- GET /api/v1/organizations/{org}/projects/{project}/compositions - List ECR images
- GET /api/v1/organizations/{org}/projects/{project}/compositions/{tag}/manifest - Get Docker Compose YAML
Reverse Tunnel APIs:
- GET /api/v1/reverse-tunnels/devices/{dsn}/reverse-tunnels - List device tunnels
- POST /api/v1/reverse-tunnels/devices/{dsn}/reverse-tunnels - Create tunnel
- DELETE /api/v1/reverse-tunnels/{tunnel_id} - Close tunnel
New MQTT Topics
Device → Cloud:
- roboticks/devices/{dsn}/deployment-status - Installation progress
- roboticks/devices/{dsn}/rollout-progress - Rollout stage updates
- roboticks/devices/{dsn}/health-check - Post-deployment health
- roboticks/devices/{dsn}/tunnel-status - Tunnel lifecycle events
Cloud → Device:
- roboticks/devices/{dsn}/deployment-install - Install deployment
- roboticks/devices/{dsn}/rollout - Rollout assignment
- roboticks/devices/{dsn}/environment-update - Config changes
- roboticks/devices/{dsn}/tunnel-create - Create tunnel
7. Security Considerations
Deployment Verification
- Checksums: SHA256 hash verification for all downloads
- Signatures: GPG signatures for capsules (optional)
- TLS: All downloads via HTTPS (S3 presigned URLs)
ECR Authentication
- IAM role-based authentication (no hardcoded credentials)
- Temporary credentials via AWS STS
- Scoped permissions: read-only access to specific ECR repositories
Reverse Tunnel Security
- mTLS authentication using device certificate
- ACL enforcement on jump server
- Rate limiting to prevent abuse
- Audit logging of all tunnel access
Environment Variable Security
- Mark sensitive variables as “SECRET” type
- Encrypt secrets at rest in database
- Never log secret values
- Use AWS Secrets Manager for production secrets
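The rule that secret values are never logged can be enforced at the point where variables are formatted for output. The sketch below masks any variable marked as secret; the VarType enum, EnvVar struct, and formatForLog function are hypothetical names chosen for illustration, and the "***" mask is an arbitrary choice.

```cpp
#include <map>
#include <string>

// Hypothetical types: a variable carries its value and a sensitivity marker.
enum class VarType { Plain, Secret };

struct EnvVar {
    std::string value;
    VarType type = VarType::Plain;
};

// Render variables as "NAME=value" pairs for logging, masking secret values.
std::string formatForLog(const std::map<std::string, EnvVar>& vars) {
    std::string out;
    for (const auto& [name, var] : vars) {
        if (!out.empty()) {
            out += ' ';
        }
        out += name + '=' +
               (var.type == VarType::Secret ? std::string("***") : var.value);
    }
    return out;
}
```

Routing all environment logging through a single function like this keeps the masking rule in one place instead of relying on every call site to remember it.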
8. Monitoring and Observability
Metrics to Track
- Deployment success/failure rate
- Average deployment time
- Rollout stage progression
- Health check pass/fail rate
- Tunnel connection uptime
- Network bandwidth usage (downloads)
- Disk space usage (/opt/roboticks/)
Logging Enhancements
- Add deployment context to all logs (deployment_id, version)
- Structured logging for deployment events
- Upload installation logs to S3 for debugging
- Real-time log streaming during deployments
Dashboard Widgets
- Deployment status per device
- Rollout progress visualization
- Active tunnel list
- Environment variable diff viewer
- Deployment history timeline
9. Developer Experience Improvements
CLI Tool: roboticks-cli
VSCode Extension
- Syntax highlighting for YAML manifests
- IntelliSense for environment variables
- One-click deployment to test devices
- Real-time log streaming in IDE
Testing Tools
- Local simulator for testing deployments
- Mock cloud API server
- Deployment verification script
10. Migration Path (Existing Users)
Backward Compatibility
- Keep existing module launcher (run.sh) working
- Support both old and new deployment formats
- Graceful deprecation warnings
Migration Steps
- Convert existing compositions to new format
- Create capsule metadata for native modules
- Extract environment configs from YAML files
- Create initial deployment packages
- Test deployments on dev devices
- Roll out to production fleet
11. Success Metrics
Technical Metrics
- ✅ 99% deployment success rate
- ✅ < 5 minutes average deployment time
- ✅ Zero-downtime updates (blue/green)
- ✅ < 1% rollback rate
- ✅ Tunnel uptime > 99.9%
Business Metrics
- ✅ Reduce deployment time from hours to minutes
- ✅ Enable remote debugging via tunnels
- ✅ Support air-gapped deployments
- ✅ Simplify multi-container applications
12. Next Steps
Immediate Actions (Week 1)
- Review this document with team
- Finalize API contracts between backend and SDK
- Set up development environment for SDK changes
- Create feature branches for each component
Sprint Planning
- Sprint 1-2: Phase 1 (Core Deployment Infrastructure)
- Sprint 3-4: Phase 2 (Docker Composition Support)
- Sprint 5-6: Phase 3 (Progressive Rollouts)
- Sprint 7: Phase 4 (Reverse Tunnels)
- Sprint 8: Phase 5-6 (Offline + Enhanced Commands)
Risk Mitigation
- Risk: Docker not available on all devices
- Mitigation: Capsule-only deployments, detect Docker availability
- Risk: Network bandwidth limitations
- Mitigation: Delta updates, compression, offline bundles
- Risk: Device storage constraints
- Mitigation: Automatic cleanup of old versions, configurable retention policy
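The storage mitigation above implies a retention rule for installed versions. A minimal sketch of one such rule follows: keep the newest N versions and never remove the currently active one. The function name versionsToRemove and the "newest N plus active" policy are assumptions made for illustration; the plan only states that retention should be configurable.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// installedOldestFirst: version identifiers ordered by install time, oldest first.
// Returns the versions that may be deleted under a "keep newest N" policy,
// never including the version that is currently active.
std::vector<std::string> versionsToRemove(
    const std::vector<std::string>& installedOldestFirst,
    const std::string& activeVersion,
    std::size_t keepCount) {
    std::vector<std::string> removable;
    const std::size_t total = installedOldestFirst.size();
    for (std::size_t i = 0; i < total; ++i) {
        const bool withinRetention = (total - i) <= keepCount;  // among the newest keepCount
        const bool isActive = installedOldestFirst[i] == activeVersion;
        if (!withinRetention && !isActive) {
            removable.push_back(installedOldestFirst[i]);
        }
    }
    return removable;
}
```

Protecting the active version matters after a rollback, when the running version may be older than the newest ones on disk.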
Appendix A: File Structure After Integration
Appendix B: Example Deployment Manifests
Full Deployment YAML
Conclusion
This integration plan transforms the roboticks-sdk from a simple demo into a production-grade edge deployment platform. By integrating Compositions, Capsules, Configurations, Packages, Rollouts, Environments, Commands, and Reverse Tunnels, we enable:
- Flexible Deployments: Mix native binaries and Docker containers
- Progressive Rollouts: Minimize risk with canary and blue/green strategies
- Dynamic Configuration: Update behavior without redeployment
- Remote Access: Debug and inspect devices via secure tunnels
- Offline Operation: Support air-gapped environments
- Zero-Downtime Updates: Graceful shutdowns and health checks