
Lambda Bundling Solution for Stats Tracking

Problem

The backend app layer exceeded Lambda’s 250MB unzipped size limit when bundling all dependencies from requirements.txt.

Solution

Bundle backend code directly into each Lambda function with minimal dependencies, avoiding the need for a shared layer.

Key Optimizations

  1. Minimal Dependencies: Each Lambda only installs sqlalchemy==2.0.23
    • psycopg2-binary comes from existing psycopg2Layer
    • boto3 is provided by Lambda runtime (no need to bundle)
    • Removed FastAPI, uvicorn, redis, influxdb, and other backend-specific deps
  2. Selective Code Copying: Only copy necessary backend modules
    • app/models - Database models
    • app/services - Business logic (StatsUpdaterService)
    • app/db - Database connection
    • app/core - Core utilities
    • Excluded: API routes, middleware, schemas, CLI tools
  3. CDK Bundling: Use Docker-based bundling at deploy time
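    // Note: Docker bundling mounts only the asset directory (at /asset-input),
    // so the ../../backend paths below assume the backend directory is also made
    // available to the container, e.g. through the bundling `volumes` option.
    code: lambda.Code.fromAsset('lambda/hourly-stats-updater', {
      bundling: {
        image: lambda.Runtime.PYTHON_3_11.bundlingImage,
        command: ['bash', '-c', [
          // Install the minimal dependency set into the output directory
          'pip install -r requirements.txt -t /asset-output',
          // Copy the Lambda handler
          'cp index.py /asset-output/',
          // Copy only the backend modules the stats code needs
          'mkdir -p /asset-output/app',
          'cp -r ../../backend/app/models /asset-output/app/',
          'cp -r ../../backend/app/services /asset-output/app/',
          'cp -r ../../backend/app/db /asset-output/app/',
          'cp -r ../../backend/app/core /asset-output/app/',
          // Mark the copied directories as Python packages
          'touch /asset-output/app/__init__.py',
          // ... more __init__.py files
        ].join(' && ')],
      },
    })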
    

Lambda Functions Deployed

1. MQTT Counter Lambda

  • Trigger: IoT Rule on roboticks/# topics
  • Function: Increment MQTT message counters in real-time
  • Size: ~5MB (standalone, no backend code needed)
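
As a rough illustration of what such a standalone handler could look like, the sketch below counts messages per project. It assumes the IoT Rule SQL passes the topic to the function as a `topic` field, that topics follow a `roboticks/<project>/...` layout, and it uses an in-memory `Counter` as a stand-in for the real persistent store. The event shape, the topic layout, and the `project_id_from_topic` helper are all hypothetical, not taken from the actual code.

```python
from collections import Counter

# In-memory stand-in for the real counter store (hypothetical).
_counts = Counter()

TOPIC_PREFIX = "roboticks/"


def project_id_from_topic(topic):
    """Return the segment after 'roboticks/', or None if the topic does not match.

    Assumes topics look like 'roboticks/<project>/...'; this layout is a guess.
    """
    if not topic.startswith(TOPIC_PREFIX):
        return None
    rest = topic[len(TOPIC_PREFIX):]
    project = rest.split("/", 1)[0]
    return project or None


def handler(event, context):
    """Increment the message counter for the project named in event['topic']."""
    project = project_id_from_topic(event.get("topic", ""))
    if project is None:
        return {"counted": False}
    _counts[project] += 1
    return {"counted": True, "project": project, "count": _counts[project]}
```

Because it imports nothing outside the standard library, a handler of this shape needs no backend code or extra dependencies in its bundle.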

2. Hourly Stats Updater Lambda

  • Trigger: EventBridge every hour
  • Function: Query AWS APIs (ECR, S3) and refresh project_stats
  • Size: ~15MB (backend code + SQLAlchemy)
  • Timeout: 5 minutes
  • Memory: 512 MB

3. Daily Snapshot Lambda

  • Trigger: EventBridge daily at 00:00 UTC
  • Function: Create daily_stats snapshot for yesterday
  • Size: ~15MB
  • Timeout: 5 minutes
  • Memory: 512 MB
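
Since the function runs at 00:00 UTC and snapshots the previous day, the date it works on has to be computed in UTC rather than in local time. A minimal sketch of that calculation follows; the `snapshot_date` helper is hypothetical and expects a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def snapshot_date(now=None):
    """Return the UTC calendar date of the day before `now`.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now.astimezone(timezone.utc) - timedelta(days=1)).date()
```

Passing `now` explicitly makes the calculation easy to check around month and leap-year boundaries without waiting for the schedule to fire.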

4. Monthly Reset Lambda

  • Trigger: EventBridge on 1st of month at 00:00 UTC
  • Function: Aggregate daily_stats into monthly_stats, reset counters
  • Size: ~15MB
  • Timeout: 10 minutes
  • Memory: 1024 MB
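
The aggregation step can be pictured as a per-project sum over one month of daily rows. The sketch below assumes the run on the 1st aggregates the month that has just ended, and it uses plain dictionaries with illustrative field names (`project_id`, `date`, `mqtt_messages`) instead of the real SQLAlchemy models.

```python
from collections import defaultdict
from datetime import date


def previous_month(today):
    """Return (year, month) of the month before `today`."""
    if today.month == 1:
        return today.year - 1, 12
    return today.year, today.month - 1


def aggregate_month(daily_rows, year, month):
    """Sum the daily counter per project for rows that fall in the given month.

    Each row is assumed to look like
    {"project_id": ..., "date": date(...), "mqtt_messages": int};
    these field names are illustrative only.
    """
    totals = defaultdict(int)
    for row in daily_rows:
        d = row["date"]
        if d.year == year and d.month == month:
            totals[row["project_id"]] += row["mqtt_messages"]
    return dict(totals)
```

In the real function the same grouping would more likely be a single SQL `GROUP BY` query; the Python version only shows the intended result.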

File Structure

infrastructure/
└── lambda/
    ├── mqtt-counter/
    │   ├── index.py
    │   └── requirements.txt
    ├── hourly-stats-updater/
    │   ├── index.py
    │   └── requirements.txt (only sqlalchemy)
    ├── daily-snapshot/
    │   ├── index.py
    │   └── requirements.txt (only sqlalchemy)
    ├── monthly-reset/
    │   ├── index.py
    │   └── requirements.txt (only sqlalchemy)
    ├── device-heartbeat/
    │   └── index.py
    ├── device-logs/
    │   └── index.py
    └── psycopg2-layer/
        └── python/
            └── psycopg2/

Deployment

Build

cd infrastructure
npm run build

Deploy

cdk deploy

The bundling happens automatically during cdk deploy:
  1. CDK spins up a Docker container with Python 3.11
  2. Installs sqlalchemy from requirements.txt
  3. Copies Lambda handler (index.py)
  4. Copies backend app code from ../../backend/app/
  5. Creates __init__.py files for Python modules
  6. Packages everything into a ZIP file
  7. Uploads the ZIP as a CDK asset, from which the Lambda function is deployed
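
To inspect what ends up in the package without running a full deploy, the copy and packaging steps can be approximated locally. The following is a sketch only: it skips the `pip install` step, the `bundle` function and its arguments are hypothetical, and the module list is taken from the Selective Code Copying list earlier in this document.

```python
import shutil
import zipfile
from pathlib import Path

# Backend packages to copy (models, services, db, core).
MODULES = ("models", "services", "db", "core")


def bundle(handler_dir, backend_app, out_dir):
    """Assemble handler + selected backend modules in out_dir and zip the result."""
    handler_dir, backend_app, out_dir = map(Path, (handler_dir, backend_app, out_dir))
    app_out = out_dir / "app"
    app_out.mkdir(parents=True, exist_ok=True)

    # Copy the Lambda handler and the selected backend packages.
    shutil.copy2(handler_dir / "index.py", out_dir / "index.py")
    for name in MODULES:
        shutil.copytree(backend_app / name, app_out / name, dirs_exist_ok=True)

    # Make sure every copied directory is an importable package.
    for pkg in [app_out, *(p for p in app_out.rglob("*") if p.is_dir())]:
        (pkg / "__init__.py").touch()

    # Package everything into a ZIP next to out_dir.
    zip_path = out_dir.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(out_dir.rglob("*")):
            if f.is_file():
                zf.write(f, f.relative_to(out_dir).as_posix())
    return zip_path
```

For example, `bundle("lambda/hourly-stats-updater", "../backend/app", "/tmp/hourly")` would write `/tmp/hourly.zip`, which can then be listed with `unzip -l`.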

Benefits

✅ No Layer Size Issues: Each Lambda is self-contained, ~15MB total
✅ Fast Cold Starts: Minimal dependencies = faster Lambda init
✅ Independent Deployments: Can update Lambdas without affecting others
✅ Type Safety: Uses real backend models and services
✅ DRY Code: Reuses StatsUpdaterService logic

Alternative Approaches (Not Used)

❌ Lambda Layer

  • Problem: Exceeded 250MB limit with full backend deps
  • Would need: Separate layers for each dependency group

❌ ECS Scheduled Tasks

  • Pro: Can use full backend container
  • Con: More complex, requires ECS cluster, slower startup

❌ Lambda Container Images

  • Pro: Up to 10GB size limit
  • Con: Slower cold starts, more complex CI/CD

❌ Standalone Lambda Code

  • Pro: Minimal size
  • Con: Code duplication, no type safety, hard to maintain

Monitoring

Check Lambda execution logs:

aws logs tail /aws/lambda/RoboticksStack-HourlyStatsLambda-XXXXX --follow
aws logs tail /aws/lambda/RoboticksStack-DailySnapshotLambda-XXXXX --follow
aws logs tail /aws/lambda/RoboticksStack-MonthlyResetLambda-XXXXX --follow

Check CloudWatch metrics:
  • AWS/Lambda/Invocations
  • AWS/Lambda/Errors
  • AWS/Lambda/Duration
  • AWS/Lambda/ConcurrentExecutions
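
These metrics can also be fetched programmatically. The sketch below only builds the request parameters for CloudWatch's GetMetricStatistics call, so it runs without AWS credentials; the `lambda_metric_query` helper, its defaults (hourly buckets over the last 24 hours), and the choice of statistic are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def lambda_metric_query(function_name, metric_name, statistic="Sum", hours=24, now=None):
    """Build keyword arguments for a CloudWatch GetMetricStatistics request."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 3600,  # one datapoint per hour
        "Statistics": [statistic],
    }
```

With boto3 available, the result can be passed straight through, e.g. `boto3.client("cloudwatch").get_metric_statistics(**lambda_metric_query(name, "Errors"))`. For Duration, a statistic such as "Average" is usually more informative than "Sum".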

Troubleshooting

“Unable to import module”

  • Check that app/__init__.py files exist
  • Verify backend code copied correctly: run aws lambda get-function --function-name XXX, download the package from the returned Code.Location URL, and inspect its contents
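
Once the deployment package has been downloaded, the check for missing __init__.py files can be automated. This is a minimal sketch, assuming the ZIP is available locally; `missing_init_files` is a hypothetical helper, not part of the project.

```python
import zipfile
from pathlib import PurePosixPath


def missing_init_files(zip_path, root="app"):
    """Return directories under `root` that hold .py files but lack __init__.py."""
    with zipfile.ZipFile(zip_path) as zf:
        names = {n for n in zf.namelist() if not n.endswith("/")}

    # Collect every directory on the path from `root` down to each .py file.
    dirs = set()
    for n in names:
        p = PurePosixPath(n)
        if p.parts and p.parts[0] == root and p.suffix == ".py":
            for parent in p.parents:
                if parent.parts and parent.parts[0] == root:
                    dirs.add(parent)

    return sorted(str(d) for d in dirs if f"{d}/__init__.py" not in names)
```

An empty list means every package directory under app/ is importable; any path returned is a candidate cause of the import error.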

“No module named ‘sqlalchemy’”

  • Check requirements.txt has sqlalchemy==2.0.23
  • Verify Docker bundling completed: look for “Bundling asset” in deploy logs

Size still too large

  • Remove unnecessary imports in StatsUpdaterService
  • Exclude unused backend modules
  • Use Lambda layers for common dependencies (note that layers still count toward the 250MB unzipped limit)
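
Before trimming anything, it helps to see which parts of the bundle take up the space. The sketch below sums file sizes per top-level entry of an unpacked bundle directory; `size_report` is a hypothetical helper, and the limit constant is the 250MB unzipped limit mentioned in this document, expressed in bytes.

```python
from pathlib import Path

# 250MB unzipped limit (function code plus layers), in bytes.
LIMIT_BYTES = 250 * 1024 * 1024


def size_report(bundle_dir):
    """Return (total_bytes, [(top_level_name, bytes), ...]) sorted largest first."""
    bundle_dir = Path(bundle_dir)
    per_entry = {}
    for f in bundle_dir.rglob("*"):
        if f.is_file():
            top = f.relative_to(bundle_dir).parts[0]
            per_entry[top] = per_entry.get(top, 0) + f.stat().st_size
    ranked = sorted(per_entry.items(), key=lambda kv: kv[1], reverse=True)
    return sum(per_entry.values()), ranked
```

Comparing the returned total with `LIMIT_BYTES` gives a quick pass/fail check, and the ranked list points at the directories worth excluding first.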

Future Improvements

  1. Optimize Bundle Size: Only copy files actually used by StatsUpdaterService
  2. Caching: Cache backend app layer between builds
  3. Compression: Use custom Docker image with pre-installed deps
  4. Monitoring: Add X-Ray tracing for performance insights