AWS Compute Services

AWS provides multiple compute service options, each optimized for different workload patterns. Selecting the appropriate compute service impacts application performance, operational complexity, cost, and scalability. This guide covers EC2 (virtual machines), ECS/Fargate (container orchestration), and Lambda (serverless functions), along with decision frameworks and production best practices.

The compute layer forms the foundation of your application infrastructure. Understanding the tradeoffs between EC2, ECS, and Lambda enables you to match compute resources to workload requirements while optimizing for cost, operational overhead, and scalability.

Right-Sizing Strategy

Start with conservative sizing and scale up based on metrics. Over-provisioning wastes money; under-provisioning creates performance issues. Use CloudWatch metrics and AWS Compute Optimizer recommendations to continuously right-size instances.


Compute Service Comparison

AWS offers three primary compute paradigms, each with distinct characteristics:

| Service | Control Level | Scaling Model | Pricing Model | Best For |
| --- | --- | --- | --- | --- |
| EC2 | Full OS control | Manual/Auto Scaling groups | Pay per hour/second | Legacy apps, specific OS requirements, persistent workloads |
| ECS/Fargate | Container-level | Service-based auto-scaling | Pay for tasks (Fargate) or instances (EC2) | Microservices, containerized apps, moderate control needs |
| Lambda | Function-level | Automatic per-request | Pay per invocation + duration | Event-driven, variable traffic, stateless operations |

When to Use Each Service

Use EC2 when:

  • You need full operating system control or specific kernel configurations
  • Running legacy applications that aren't containerized or can't be easily adapted
  • Workloads require high, consistent compute (compute-intensive batch jobs, HPC)
  • You need persistent local storage or specific hardware (GPU instances, Nitro-based instances)
  • Licensing requires dedicated instances or hosts (Windows, Oracle, SAP)

Use ECS/Fargate when:

  • Your application is containerized (or can be containerized)
  • You want orchestration without managing Kubernetes complexity
  • You need service discovery, load balancing, and health checks built-in
  • Workloads have predictable resource requirements
  • You want AWS-native integration (CloudWatch, IAM, VPC) without Kubernetes overhead

Use Lambda when:

  • Workloads are event-driven (API requests, S3 uploads, message queue processing)
  • Traffic is highly variable or sporadic (unpredictable spikes)
  • Functions complete in under 15 minutes (Lambda's maximum execution time)
  • You want zero server management and automatic scaling
  • Cost optimization for low-utilization workloads (pay only when code runs)

This decision tree guides you through compute service selection based on application characteristics. The choice isn't always binary - some architectures use multiple compute types (e.g., Lambda for API Gateway integration, ECS for core services, EC2 for databases).


EC2 (Elastic Compute Cloud)

EC2 provides resizable virtual machines with full operating system control. EC2 instances offer flexibility and control at the cost of higher operational overhead compared to managed services.

Instance Types and Families

AWS offers hundreds of instance types organized into families based on workload optimization:

General Purpose (T, M family):

  • T3/T3a/T4g (burstable): Variable CPU performance with burst credits. Ideal for development, web servers, small databases with variable load. Cost-effective but performance depends on credit balance.
  • M6i/M6a/M7g (balanced): Balanced compute, memory, networking. Default choice for most applications (app servers, microservices, backend APIs).

Compute Optimized (C family):

  • C6i/C6a/C7g: High-performance processors, optimal for compute-intensive workloads (batch processing, scientific modeling, high-traffic web servers, gaming servers).

Memory Optimized (R, X family):

  • R6i/R6a/R7g: High memory-to-CPU ratio. Ideal for in-memory databases, caching layers, big data analytics.
  • X2idn/X2iedn: Extremely high memory for SAP HANA, large in-memory databases.

Storage Optimized (I, D family):

  • I4i/I4g: High-performance local NVMe storage. Ideal for NoSQL databases, data warehousing, Elasticsearch clusters.

Accelerated Computing (P, G, Inf family):

  • P4/P5 (GPU): Machine learning training, HPC simulations.
  • G5/G5g (GPU): Graphics rendering, ML inference, video encoding.
  • Inf2 (Inferentia): Cost-effective ML inference.

Graviton (T4g, M7g, C7g, R7g):

  • AWS-designed ARM processors offering up to 40% better price-performance than comparable x86 instances. Requires ARM-compatible software but supported by most modern runtimes (Java 11+, Node.js, Python, Go).

Choosing Instance Types

Start with M-family instances (M6i or M7g) for general workloads. They provide balanced CPU, memory, and networking suitable for most applications.

Profile before optimizing. Run your workload on a baseline instance (M6i.xlarge) and monitor CloudWatch metrics:

  • High CPU utilization (>70%) → Consider C-family (compute-optimized)
  • High memory pressure → Consider R-family (memory-optimized)
  • I/O bottlenecks → Consider I-family (storage-optimized) or upgrade network performance

Use Graviton for cost savings. If your runtime supports ARM (Java 11+, Python 3.8+, Node.js 12+, Go, Rust), migrate to Graviton instances (M7g, C7g, R7g) for 20-40% cost reduction with equivalent or better performance.

Avoid over-provisioning. Use AWS Compute Optimizer recommendations. An idle 16xlarge instance costs the same as running it at full capacity - but delivers zero value.

// Spring Boot application detecting instance metadata
// Useful for environment-aware configuration
import org.springframework.web.client.RestTemplate;

public class InstanceMetadata {

    private static final String METADATA_URL = "http://169.254.169.254/latest/meta-data";

    private final RestTemplate restTemplate = new RestTemplate();

    public String getInstanceId() {
        return restTemplate.getForObject(METADATA_URL + "/instance-id", String.class);
    }

    public String getInstanceType() {
        return restTemplate.getForObject(METADATA_URL + "/instance-type", String.class);
    }

    public String getAvailabilityZone() {
        return restTemplate.getForObject(METADATA_URL + "/placement/availability-zone", String.class);
    }
}

Note: Always use IMDSv2 (Instance Metadata Service v2) which requires a session token, preventing SSRF attacks. Configure your AMIs and launch templates to require IMDSv2.

AMI Management

Amazon Machine Images (AMIs) provide the OS and initial software configuration for EC2 instances. AMI strategy impacts security, consistency, and deployment speed.

Use golden AMIs for consistency. Create custom AMIs with your base configuration (OS patches, monitoring agents, security tools, runtime dependencies) pre-installed. This ensures identical configuration across instances and reduces boot time (no need to install software on every launch).

Automate AMI builds. Use HashiCorp Packer or EC2 Image Builder to create AMIs programmatically from a version-controlled configuration. This provides reproducible, auditable image builds.

Patch AMIs regularly. Rebuild AMIs monthly with latest OS patches. Old AMIs accumulate vulnerabilities. Tag AMIs with build date and retire AMIs older than 90 days.
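The retirement policy can be enforced by a scheduled cleanup job. A minimal sketch of the age check, assuming the BuildDate tag has already been parsed into a date (listing AMIs and deregistering them via the EC2 API are omitted):

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class AmiRetirement {

    static final long MAX_AGE_DAYS = 90;

    // True if an AMI built on buildDate has exceeded the retirement window.
    static boolean shouldRetire(LocalDate buildDate, LocalDate today) {
        return ChronoUnit.DAYS.between(buildDate, today) > MAX_AGE_DAYS;
    }
}
```

A scheduled job would list AMIs, parse each BuildDate tag, and deregister those for which shouldRetire returns true.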

Separate base AMIs from application deployments. AMIs should contain the base OS and common tooling. Deploy application code separately (via CodeDeploy, user data scripts, or container images) to enable faster deployments and avoid rebuilding AMIs for every code change.

# Example Packer template for golden AMI
# Creates Ubuntu AMI with Java 21, monitoring agents, security hardening
source "amazon-ebs" "ubuntu-java" {
  ami_name      = "golden-ubuntu-java21-{{timestamp}}"
  instance_type = "t3.medium"
  region        = "us-east-1"

  source_ami_filter {
    filters = {
      name                  = "ubuntu/images/*ubuntu-jammy-22.04-amd64-server-*"
      "root-device-type"    = "ebs"
      "virtualization-type" = "hvm"
    }
    most_recent = true
    owners      = ["099720109477"] # Canonical
  }

  ssh_username = "ubuntu"

  tags = {
    Name        = "Golden Ubuntu Java 21"
    Environment = "Production"
    BuildDate   = "{{timestamp}}"
  }
}

build {
  sources = ["source.amazon-ebs.ubuntu-java"]

  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get upgrade -y",
      "sudo apt-get install -y openjdk-21-jdk-headless",
      "sudo apt-get install -y cloudwatch-agent",
      # Install security tools, configure hardening
    ]
  }
}

Auto Scaling Groups

Auto Scaling Groups (ASGs) manage instance lifecycle, automatically replacing unhealthy instances and scaling capacity based on demand.

An ASG defines:

  • Launch template: Instance configuration (AMI, instance type, security groups, IAM role, user data)
  • Desired/Min/Max capacity: Target instance count and scaling boundaries
  • Health checks: Determine instance health (EC2 status checks, ELB health checks, custom health checks)
  • Scaling policies: Rules for adding/removing instances based on metrics

Launch Templates vs Launch Configurations

Always use Launch Templates (not Launch Configurations). Launch templates support:

  • Versioning (manage multiple template versions, roll back changes)
  • Multiple instance types (useful for Spot instances)
  • T2/T3 unlimited mode
  • Latest features (Launch Configurations are deprecated)

// Example launch template (JSON representation)
{
  "LaunchTemplateName": "app-server-template",
  "VersionDescription": "Spring Boot application server",
  "LaunchTemplateData": {
    "ImageId": "ami-0abcdef1234567890",
    "InstanceType": "m6i.large",
    "IamInstanceProfile": {
      "Arn": "arn:aws:iam::123456789012:instance-profile/app-server-role"
    },
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
    "UserData": "base64-encoded-user-data-script",
    "BlockDeviceMappings": [{
      "DeviceName": "/dev/xvda",
      "Ebs": {
        "VolumeSize": 50,
        "VolumeType": "gp3",
        "Encrypted": true
      }
    }],
    "MetadataOptions": {
      "HttpTokens": "required", // Enforce IMDSv2
      "HttpPutResponseHopLimit": 1
    },
    "TagSpecifications": [{
      "ResourceType": "instance",
      "Tags": [
        {"Key": "Name", "Value": "app-server"},
        {"Key": "Environment", "Value": "production"}
      ]
    }]
  }
}

Scaling Policies

Target Tracking Scaling (recommended): Define a target metric value (e.g., "maintain 60% CPU utilization"). ASG automatically adds/removes instances to maintain the target. Simple and effective for most workloads.

{
  "TargetTrackingScalingPolicyConfiguration": {
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 60.0
  }
}

Step Scaling: Add/remove specific numbers of instances based on CloudWatch alarm thresholds. More control but requires careful tuning. Use when target tracking doesn't fit your scaling pattern.

Scheduled Scaling: Scale based on predictable traffic patterns (e.g., scale up Monday-Friday 8am-6pm, scale down nights/weekends). Useful for workloads with known traffic patterns.

Predictive Scaling: Machine learning-based scaling using historical CloudWatch data to forecast demand. Useful for cyclical traffic patterns (daily/weekly peaks).

Health Checks

ASGs replace unhealthy instances automatically. Configure health check types:

  • EC2 status checks (default): Monitors hypervisor and network reachability. Doesn't detect application failures.
  • ELB health checks: Probes application endpoints (HTTP /health). Detects application-level failures. Recommended for ASGs behind load balancers.
  • Custom health checks: Your application calls SetInstanceHealth API to mark instances healthy/unhealthy based on application logic.

Set health check grace period to allow instances time to boot and become healthy before health checks start (typically 300-600 seconds for Spring Boot applications).

This sequence shows the instance lifecycle in an ASG with ELB health checks. The grace period prevents premature termination during application startup. Once healthy, the instance receives traffic until health checks fail, triggering automatic replacement.

User Data Scripts

User data scripts run once when an instance first launches. Use user data for:

  • Installing/updating software packages
  • Configuring application settings
  • Pulling application code from S3/Git
  • Registering with service discovery

Keep user data minimal. User data delays instance readiness. Pre-install software in AMIs; use user data only for instance-specific configuration (environment variables, configuration file downloads).

Make user data idempotent. If the instance reboots, user data shouldn't fail or create duplicate resources. Check if resources exist before creating them.

#!/bin/bash
# User data script for Spring Boot application
# Runs on instance first launch

set -e # Exit on error

# Update system packages
apt-get update && apt-get upgrade -y

# Download application JAR from S3
aws s3 cp s3://my-app-bucket/releases/app-1.2.3.jar /opt/app/app.jar

# Create systemd service for auto-start
cat > /etc/systemd/system/app.service <<EOF
[Unit]
Description=Spring Boot Application
After=network.target

[Service]
User=appuser
ExecStart=/usr/bin/java -jar /opt/app/app.jar
Restart=always
Environment="SPRING_PROFILES_ACTIVE=production"

[Install]
WantedBy=multi-user.target
EOF

# Enable and start service
systemctl daemon-reload
systemctl enable app.service
systemctl start app.service

# Signal CloudFormation or ASG that instance is ready (if using wait conditions)

Instance Metadata Service (IMDSv2)

EC2 instances access metadata via the Instance Metadata Service (IMDS) at http://169.254.169.254/latest/meta-data/. Metadata includes instance ID, instance type, IAM role credentials, and network configuration.

IMDSv2 requires a session token before accessing metadata, preventing Server-Side Request Forgery (SSRF) attacks where an attacker tricks your application into making requests to the metadata endpoint.
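The handshake itself is just two HTTP calls: a PUT that returns a session token, then GETs that present the token in a header. A minimal sketch using java.net.http — it only constructs the requests; actually sending them works only from inside an EC2 instance:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class Imdsv2Requests {

    static final String BASE = "http://169.254.169.254/latest";

    // Step 1: PUT /latest/api/token with a TTL header; the response body is the session token.
    static HttpRequest tokenRequest() {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/api/token"))
                .header("X-aws-ec2-metadata-token-ttl-seconds", "21600")
                .PUT(HttpRequest.BodyPublishers.noBody())
                .timeout(Duration.ofSeconds(2))
                .build();
    }

    // Step 2: GET metadata paths with the token attached.
    static HttpRequest metadataRequest(String token, String path) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/meta-data/" + path))
                .header("X-aws-ec2-metadata-token", token)
                .GET()
                .build();
    }
}
```

In practice the AWS SDKs and CLI perform this handshake automatically.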

Always enforce IMDSv2 in launch templates:

{
  "MetadataOptions": {
    "HttpTokens": "required",     // Require IMDSv2 tokens
    "HttpPutResponseHopLimit": 1, // Limit token usage to the instance (prevents Docker forwarding)
    "HttpEndpoint": "enabled"
  }
}

Accessing IMDSv2 from code:

// Java example using IMDSv2 (AWS SDK for Java v1 utility class)
import com.amazonaws.util.EC2MetadataUtils;

public class MetadataAccess {

    public String getInstanceId() {
        // The SDK transparently performs the IMDSv2 token handshake
        return EC2MetadataUtils.getInstanceId();
    }

    public String getInstanceRegion() {
        return EC2MetadataUtils.getEC2InstanceRegion();
    }
}

See AWS IAM documentation for details on using instance profiles to provide credentials to applications running on EC2.

Spot Instances and Savings Plans

Spot Instances use spare EC2 capacity at up to 90% discount compared to On-Demand pricing. AWS can reclaim Spot instances with 2-minute warning when capacity is needed.

Use Spot instances for:

  • Fault-tolerant batch processing
  • Data analysis jobs
  • CI/CD build agents
  • Stateless web servers (in ASGs with On-Demand base capacity)

Do not use Spot for:

  • Databases or stateful applications
  • Long-running jobs that can't tolerate interruption
  • Real-time processing with strict SLAs

Spot interruption handling:

// Monitor Spot instance termination notices via instance metadata
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SpotInterruptionHandler implements Runnable {

    private static final Logger log = LoggerFactory.getLogger(SpotInterruptionHandler.class);
    private static final String SPOT_TERMINATION_URL =
            "http://169.254.169.254/latest/meta-data/spot/instance-action";

    private final HttpClient client = HttpClient.newHttpClient();

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create(SPOT_TERMINATION_URL))
                        .GET()
                        .build();

                HttpResponse<String> response = client.send(request,
                        HttpResponse.BodyHandlers.ofString());

                if (response.statusCode() == 200) {
                    // 200 means the instance will be terminated in ~2 minutes;
                    // 404 means no termination notice (normal state)
                    handleGracefulShutdown();
                    break;
                }
            } catch (Exception e) {
                log.debug("Metadata check failed; will retry", e);
            }

            try {
                Thread.sleep(5000); // Check every 5 seconds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void handleGracefulShutdown() {
        // Drain connections, finish in-flight requests, deregister from load balancer
        log.warn("Spot instance termination notice received. Initiating graceful shutdown.");
        // ... shutdown logic ...
    }
}

Savings Plans offer discounted pricing (up to 66% with 3-year Compute Savings Plans, up to 72% with EC2 Instance Savings Plans) in exchange for committing to consistent compute usage (measured in $/hour). Unlike Reserved Instances, Compute Savings Plans automatically apply to any compute matching the commitment (EC2, Fargate, Lambda), providing flexibility.

Recommendation: Use Savings Plans for baseline capacity (commit to your minimum sustained usage) and Spot/On-Demand for variable capacity.


ECS/Fargate (Elastic Container Service)

ECS orchestrates Docker containers across EC2 instances or AWS Fargate (serverless container runtime). ECS provides simpler orchestration than Kubernetes while offering AWS-native integration.

For Kubernetes-based orchestration, see EKS documentation and Kubernetes guidelines.

ECS Architecture

ECS Cluster: Logical grouping of tasks and services. A cluster can run tasks on EC2 instances (you manage the instances) or Fargate (AWS manages the infrastructure).

Task Definition: Blueprint defining container configurations (Docker image, CPU/memory, environment variables, IAM role, networking, volumes). Task definitions are immutable and versioned.

Task: Running instance of a task definition. A task contains one or more containers that run together on the same host (similar to a Kubernetes pod).

Service: Maintains a desired number of task instances, integrated with load balancers, auto-scaling, and rolling deployments. Services ensure task count remains stable (replacing failed tasks automatically).

This diagram shows a mixed cluster with Fargate tasks (API service behind ALB) and EC2-based tasks (worker service processing SQS messages). Fargate eliminates instance management for the API; EC2 launch type provides more control for the worker.

Task Definitions

Task definitions specify container configurations. Each task definition version is immutable - updates create new versions.

{
  "family": "spring-boot-api",
  "networkMode": "awsvpc", // Required for Fargate, recommended for EC2
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",    // 1 vCPU (Fargate CPU units)
  "memory": "2048", // 2 GB
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/app-task-role",
  "containerDefinitions": [
    {
      "name": "api",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/api:1.2.3",
      "cpu": 1024,
      "memory": 2048,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "SPRING_PROFILES_ACTIVE", "value": "production"}
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/spring-boot-api",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "api"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/actuator/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}

Key task definition concepts:

  • executionRoleArn: IAM role ECS agent uses to pull images from ECR, write logs to CloudWatch, retrieve secrets. See IAM roles.
  • taskRoleArn: IAM role your application code uses to access AWS services (S3, DynamoDB, SQS). Follows least privilege principle.
  • networkMode: awsvpc: Each task gets its own elastic network interface (ENI) with a private IP. Required for Fargate, recommended for EC2 (provides task-level security groups).
  • secrets: Inject secrets from Secrets Manager or SSM Parameter Store without hardcoding in environment variables. See secrets management.
  • healthCheck: Container-level health check (different from ELB health checks). ECS uses this to detect failed containers and restart tasks.

Fargate vs EC2 Launch Type

Fargate (serverless containers):

  • AWS manages infrastructure (no EC2 instances to configure, patch, or scale)
  • You define CPU/memory per task; AWS provisions capacity
  • Pay per task runtime (per-second billing based on vCPU and memory)
  • Simpler operations, faster time to market
  • Slightly higher cost per vCPU-hour than EC2

EC2 launch type:

  • You manage EC2 instances (instance types, AMIs, patching, auto-scaling)
  • More control over instance configuration (instance types, local storage, placement)
  • Lower cost for sustained workloads (Savings Plans, Reserved Instances)
  • Supports Docker volumes, host port mappings, privileged containers

Decision matrix:

| Factor | Fargate | EC2 |
| --- | --- | --- |
| Operational simplicity | Serverless, no instance management | Manage instances, AMIs, patching |
| Startup time | ~30s task launch | Depends on ASG scaling |
| Cost (variable workload) | Pay only for task runtime | Pay for idle instances |
| Cost (sustained workload) | Higher per-vCPU cost | Lower with Savings Plans |
| Instance control | No instance access | Full OS control |
| GPU/specialized hardware | Not supported | Supported |

Recommendation: Start with Fargate unless you need EC2-specific features (GPU, Docker volumes, specific instance types). Fargate's operational simplicity outweighs the cost premium for most workloads.

Service Configuration

ECS Services maintain a desired task count, integrate with load balancers, and manage deployments.

{
  "serviceName": "api-service",
  "taskDefinition": "spring-boot-api:12",
  "cluster": "production-cluster",
  "desiredCount": 3,
  "launchType": "FARGATE",
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-abc123", "subnet-def456"],
      "securityGroups": ["sg-0123456789abcdef0"],
      "assignPublicIp": "DISABLED" // Use private subnets
    }
  },
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/api-tg/abc123",
      "containerName": "api",
      "containerPort": 8080
    }
  ],
  "healthCheckGracePeriodSeconds": 60,
  "deploymentConfiguration": {
    "maximumPercent": 200,       // Allow double capacity during deployment
    "minimumHealthyPercent": 100 // Maintain full capacity during deployment
  },
  "deploymentController": {
    "type": "ECS" // or "CODE_DEPLOY" for blue-green deployments
  }
}

Deployment strategies:

Rolling update (default): ECS stops old tasks and starts new tasks incrementally. maximumPercent: 200, minimumHealthyPercent: 100 means ECS starts 3 new tasks (200% of 3 = 6 total), waits for them to become healthy, then stops 3 old tasks. Zero downtime, gradual rollout.
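The capacity bounds those percentages imply can be computed directly. A sketch of the arithmetic (not ECS's exact scheduler logic):

```java
public class DeploymentCapacity {

    // Upper bound on running tasks during a rolling update.
    static int maxRunning(int desiredCount, int maximumPercent) {
        return desiredCount * maximumPercent / 100;
    }

    // Lower bound on healthy tasks ECS keeps during the update.
    static int minHealthy(int desiredCount, int minimumHealthyPercent) {
        return desiredCount * minimumHealthyPercent / 100;
    }
}
```

With a desired count of 3, maximumPercent 200 and minimumHealthyPercent 100 give bounds of 6 and 3, matching the rollout described above.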

Blue-green deployment (CodeDeploy integration): Deploy to a new task set (green), validate, then shift traffic from old task set (blue) to green. Enables instant rollback. See deployment strategies.

Service Discovery

ECS Service Discovery integrates with AWS Cloud Map to register tasks with DNS, enabling service-to-service communication without hard-coded IPs.

When you create a service with service discovery enabled:

  1. ECS registers each task's private IP with Cloud Map
  2. Cloud Map creates a Route 53 private hosted zone
  3. Other services query DNS (e.g., api.production.local) to discover task IPs
  4. DNS returns A records for all healthy tasks (automatic load balancing via DNS)
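Because discovery is plain DNS, clients need no special SDK. A resolver sketch (the api.production.local name is an example that resolves only inside the VPC's private hosted zone):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ServiceDiscoveryLookup {

    // Resolve all A records for a Cloud Map service name, e.g. "api.production.local".
    // Returns an empty array if the name does not resolve.
    static String[] resolve(String serviceName) {
        try {
            InetAddress[] addresses = InetAddress.getAllByName(serviceName);
            String[] ips = new String[addresses.length];
            for (int i = 0; i < addresses.length; i++) {
                ips[i] = addresses[i].getHostAddress();
            }
            return ips;
        } catch (UnknownHostException e) {
            return new String[0];
        }
    }
}
```

A caller would pick one of the returned IPs per request. Beware JVM DNS caching, which can serve stale task IPs unless the networkaddress.cache.ttl property is tuned down.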

Service discovery patterns are covered in microservices architecture.

Capacity Providers

Capacity Providers manage the infrastructure for ECS clusters. Fargate has built-in capacity providers (FARGATE and FARGATE_SPOT). For EC2 launch type, capacity providers manage auto-scaling groups.

Fargate Spot: Run Fargate tasks on spare capacity at up to 70% discount. AWS can interrupt Spot tasks with 2-minute notice. Use for fault-tolerant batch jobs, development environments, and stateless workers.

{
  "capacityProviders": ["FARGATE", "FARGATE_SPOT"],
  "defaultCapacityProviderStrategy": [
    {
      "capacityProvider": "FARGATE",
      "weight": 1,
      "base": 2 // Run at least 2 tasks on regular Fargate
    },
    {
      "capacityProvider": "FARGATE_SPOT",
      "weight": 4 // Run 80% of additional tasks on Spot (4/(1+4) = 80%)
    }
  ]
}

This strategy runs 2 base tasks on Fargate (guaranteed availability) and 80% of additional tasks on Fargate Spot (cost optimization).
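The base/weight arithmetic is worth making concrete. A sketch using exact proportions (ECS's placement may round differently in practice):

```java
public class CapacitySplit {

    // Returns {fargateTasks, spotTasks} for a desired count: the base lands on
    // the first provider, and the remaining tasks split by weight.
    static int[] split(int desiredCount, int base, int weightFargate, int weightSpot) {
        int extra = Math.max(0, desiredCount - base);
        int spot = extra * weightSpot / (weightFargate + weightSpot);
        return new int[] { base + (extra - spot), spot };
    }
}
```

For a desired count of 12 with base 2 and weights 1:4, the 10 additional tasks split 2 on Fargate and 8 on Fargate Spot.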


Lambda (Serverless Functions)

AWS Lambda runs code in response to events without provisioning servers. Lambda automatically scales from zero to thousands of concurrent executions and charges only for compute time consumed.

Lambda fundamentally changes operational models - no server management, automatic scaling, sub-second billing. However, Lambda introduces constraints (execution time limits, cold starts, statelessness) that make it unsuitable for certain workloads.

Lambda Function Anatomy

A Lambda function consists of:

  • Handler function: Entry point that AWS invokes when the function executes
  • Runtime: Execution environment (Java 21, Node.js 20, Python 3.12, Go 1.x, .NET 8, custom runtimes)
  • Memory allocation: 128 MB to 10 GB (CPU scales proportionally with memory)
  • Timeout: Maximum execution time (1 second to 15 minutes)
  • Execution role: IAM role granting permissions to AWS services
  • Environment variables: Configuration passed to function at runtime
  • Layers: Shared code/libraries used by multiple functions

// Java Lambda handler for processing API Gateway requests
package com.example.lambda;

import java.util.Map;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ApiHandler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {

    private final ObjectMapper objectMapper = new ObjectMapper();
    // Initialized outside the handler so it is reused across invocations;
    // UserService and User are application classes (not shown here)
    private final UserService userService = new UserService();

    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent request, Context context) {
        context.getLogger().log("Processing request: " + request.getPath());

        try {
            String userId = request.getPathParameters().get("id");
            User user = userService.getUser(userId);

            return new APIGatewayProxyResponseEvent()
                    .withStatusCode(200)
                    .withBody(objectMapper.writeValueAsString(user))
                    .withHeaders(Map.of("Content-Type", "application/json"));
        } catch (Exception e) {
            context.getLogger().log("Error: " + e.getMessage());
            return new APIGatewayProxyResponseEvent()
                    .withStatusCode(500)
                    .withBody("{\"error\": \"Internal server error\"}");
        }
    }
}

Key concepts:

  • Initialize once, reuse: Create expensive resources (DB connections, HTTP clients) outside the handler. Lambda reuses execution environments for subsequent invocations, avoiding repeated initialization.
  • Context object: Provides request ID, remaining execution time, CloudWatch log stream. Use for logging and timeout awareness.
  • Stateless handlers: Lambda functions are stateless. Store state in external services (DynamoDB, S3, ElastiCache).

Cold Starts and Optimization

Cold start occurs when Lambda creates a new execution environment (first invocation, scaling up, or after inactivity). Cold starts add latency:

  • Java: 1-3 seconds (JVM initialization is slow)
  • Node.js: 100-300ms
  • Python: 50-200ms

Minimizing cold starts:

1. Choose faster runtimes for latency-sensitive functions. Node.js, Python, and Go have faster cold starts than Java or .NET. If using Java, consider GraalVM native images (via custom runtime) which reduce cold start to ~100ms.

2. Reduce deployment package size. Larger packages increase cold start time. Remove unused dependencies, use Lambda layers for shared code, minimize JAR/ZIP size.

// build.gradle — minimize JAR size by excluding unused Spring Boot starters
implementation 'org.springframework.cloud:spring-cloud-function-adapter-aws'

// Exclude embedded Tomcat (Lambda has its own runtime)
implementation('org.springframework.boot:spring-boot-starter-web') {
    exclude group: 'org.springframework.boot', module: 'spring-boot-starter-tomcat'
}

3. Use provisioned concurrency for latency-sensitive functions (APIs, user-facing workflows). Provisioned concurrency keeps execution environments warm, eliminating cold starts. Trade-off: you pay for provisioned capacity even when idle.

{
  "FunctionName": "api-handler",
  "ProvisionedConcurrentExecutions": 10 // Keep 10 environments warm
}

4. Increase memory allocation. Higher memory allocations provide proportionally more CPU, speeding up initialization. Test different memory settings - sometimes 1024 MB is faster and cheaper than 512 MB (due to reduced execution time).

5. Lazy-load dependencies. Initialize resources only when needed, not in global scope. For infrequently-called code paths, this reduces cold start time.

Memory and Timeout Configuration

Lambda charges based on GB-seconds (memory × duration). Higher memory = more CPU = faster execution but higher per-second cost.

Right-size memory allocation:

  1. Start with 1024 MB (balanced CPU/memory)
  2. Monitor CloudWatch metrics: Duration, Max Memory Used
  3. If Max Memory Used approaches configured memory, increase allocation
  4. If execution time is high but memory usage is low, increase memory for more CPU
  5. Use AWS Lambda Power Tuning tool to find optimal memory/cost balance
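Because the bill is memory × duration, a higher-memory configuration that finishes faster can cost less per invocation. A sketch of that arithmetic (the per-GB-second rate is illustrative; check current AWS pricing):

```java
public class LambdaCostModel {

    // Illustrative $/GB-second rate; actual pricing varies by region and architecture.
    static final double PRICE_PER_GB_SECOND = 0.0000166667;

    static double invocationCost(int memoryMb, double durationSeconds) {
        return (memoryMb / 1024.0) * durationSeconds * PRICE_PER_GB_SECOND;
    }
}
```

A 512 MB function running 800 ms bills 0.4 GB-seconds, while the same work at 1024 MB finishing in 350 ms bills 0.35 GB-seconds, which is cheaper despite double the memory.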

// TypeScript Lambda handler example
import { APIGatewayProxyEvent, APIGatewayProxyResult, Context } from 'aws-lambda';
import { DynamoDBClient, GetItemCommand } from '@aws-sdk/client-dynamodb';

// Initialize client outside handler (reused across invocations)
const dynamodb = new DynamoDBClient({ region: process.env.AWS_REGION });

export const handler = async (
  event: APIGatewayProxyEvent,
  context: Context
): Promise<APIGatewayProxyResult> => {

  // Check remaining time to avoid timeout
  if (context.getRemainingTimeInMillis() < 1000) {
    return { statusCode: 503, body: 'Insufficient time to process request' };
  }

  const userId = event.pathParameters?.id;
  if (!userId) {
    return { statusCode: 400, body: 'Missing path parameter: id' };
  }

  const result = await dynamodb.send(new GetItemCommand({
    TableName: process.env.TABLE_NAME,
    Key: { userId: { S: userId } }
  }));

  return {
    statusCode: 200,
    body: JSON.stringify(result.Item),
    headers: { 'Content-Type': 'application/json' }
  };
};

Timeout configuration: Set timeout slightly higher than expected maximum execution time. Too short = functions timeout unnecessarily; too long = runaway functions consume resources and incur costs. Monitor Duration metric and set timeout at p99 duration + 20%.
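That rule of thumb in code form (a sketch; 900 seconds is Lambda's hard maximum):

```java
public class TimeoutSizing {

    // Recommended timeout: p99 observed duration plus 20% headroom,
    // rounded up and clamped to Lambda's 900-second maximum.
    static int recommendedTimeoutSeconds(double p99DurationSeconds) {
        return (int) Math.min(900, Math.ceil(p99DurationSeconds * 1.2));
    }
}
```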

Concurrency and Throttling

Concurrency is the number of function instances executing simultaneously. Lambda scales automatically but has limits:

  • Account-level concurrent execution limit: 1000 (default, can be increased)
  • Function-level reserved concurrency: Optional limit on specific functions
  • Burst concurrency: an initial burst of 500-3000 instances (region-dependent), then additional instances per minute up to the account limit

When concurrent executions exceed limits, Lambda throttles new invocations (returns 429 Too Many Requests for synchronous invocations, retries asynchronous invocations).
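Synchronous callers should treat 429s as retryable with exponential backoff. A sketch of the delay schedule (base and cap values are illustrative; production code would also add jitter):

```java
public class ThrottleBackoff {

    // Delay before retry attempt n (0-based): base * 2^n, capped.
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        return Math.min(capMillis, baseMillis * (1L << attempt));
    }
}
```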

Reserved concurrency guarantees capacity for critical functions while preventing a single function from consuming all account concurrency.

{
  "FunctionName": "critical-payment-processor",
  "ReservedConcurrentExecutions": 100 // Reserve 100 of the account's 1000 concurrency
}

Provisioned concurrency (mentioned earlier) keeps execution environments initialized, reducing cold starts. Unlike reserved concurrency (which only reserves capacity), provisioned concurrency keeps environments warm.

Event Sources

Lambda functions respond to events from various sources:

Synchronous invocation (caller waits for response):

  • API Gateway (REST/HTTP APIs)
  • Application Load Balancer
  • Direct invocation via SDK

Asynchronous invocation (Lambda queues the event, returns immediately):

  • S3 object creation events
  • SNS messages
  • EventBridge events
  • Asynchronous SDK invocations

Polling event sources (Lambda polls for new records):

  • SQS queues (Lambda polls queue, invokes function with batches of messages)
  • Kinesis streams
  • DynamoDB streams

// Processing SQS messages in Lambda (Java)
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;

public class SqsMessageProcessor implements RequestHandler<SQSEvent, Void> {

    @Override
    public Void handleRequest(SQSEvent event, Context context) {
        for (SQSEvent.SQSMessage message : event.getRecords()) {
            processMessage(message.getBody());
        }
        return null;
    }

    private void processMessage(String messageBody) {
        // Process the message body here.
        // Throw an exception to fail the message; Lambda returns it to the queue for retry.
    }
}

Lambda automatically handles SQS message deletion after successful processing. If the function throws an exception, Lambda returns messages to the queue for retry (respecting queue's redrive policy and dead-letter queue configuration). See messaging documentation for patterns.
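A refinement on whole-batch retry: with ReportBatchItemFailures enabled on the event source mapping, the handler can report only the failed messages, so successfully processed ones are not re-driven. A Node.js sketch of the response shape (processRecord is a hypothetical business function):

```typescript
// Partial batch failure reporting for an SQS-triggered Lambda.
// Requires ReportBatchItemFailures in the event source mapping's response types.
interface SQSRecord { messageId: string; body: string; }
interface SQSBatchResponse { batchItemFailures: { itemIdentifier: string }[]; }

// Hypothetical business logic; throws on an unprocessable message.
function processRecord(body: string): void {
  if (body === 'poison') throw new Error('unprocessable message');
}

// Return only the failed message IDs; Lambda deletes the rest from the queue.
function handler(event: { Records: SQSRecord[] }): SQSBatchResponse {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      processRecord(record.body);
    } catch (err) {
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

Only the reported IDs return to the queue for retry; an empty batchItemFailures array means the whole batch succeeded.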

Error Handling and Retries

Synchronous invocations: Errors return to caller immediately. Caller decides whether to retry.

Asynchronous invocations: Lambda retries twice (with exponential backoff) if the function fails. After retries exhausted, Lambda sends the event to a dead-letter queue (DLQ) or event destination (SNS topic, SQS queue, EventBridge, Lambda function).

{
  "FunctionName": "async-processor",
  "DeadLetterConfig": {
    "TargetArn": "arn:aws:sqs:us-east-1:123456789012:failed-events-dlq"
  },
  "MaximumRetryAttempts": 1
}

Setting MaximumRetryAttempts to 1 reduces the default of 2 retries. Note that the DLQ target is part of the function configuration, while retry attempts are set separately through the function's event invoke config (put-function-event-invoke-config).

Event source mapping (SQS/Kinesis): Lambda retries the entire batch if any message fails. Configure:

  • MaximumRetryAttempts: Number of retries before giving up
  • BisectBatchOnFunctionError: Split batch in half on failure to isolate poison pill messages
  • OnFailure destination: Send failed batches to SQS/SNS

Recommendation: Use dead-letter queues for async invocations and implement idempotent handlers (safe to retry). See resilience patterns for idempotency strategies.
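Idempotency can be sketched with a processed-key store keyed by message ID. The in-memory object below is a stand-in; in production this would be a durable store such as a DynamoDB conditional put.

```typescript
// Idempotent message handling: run the side effect at most once per message ID.
// `processed` stands in for a durable store (e.g. a DynamoDB conditional write).
const processed: Record<string, boolean> = {};
let sideEffects = 0;

function handleOnce(messageId: string, work: () => void): boolean {
  if (processed[messageId]) return false; // duplicate delivery: safe no-op
  work();
  processed[messageId] = true; // record only after the work succeeds
  return true;
}

// A redelivered message runs the side effect exactly once.
handleOnce('msg-1', () => { sideEffects++; });
handleOnce('msg-1', () => { sideEffects++; });
console.log(sideEffects); // 1
```

Recording the key only after the work succeeds means a crash mid-work causes a retry rather than a lost message; combined with Lambda's at-least-once delivery, this yields effectively-once processing.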

Lambda Layers

Lambda layers package libraries and dependencies separately from function code, enabling:

  • Sharing common code across multiple functions
  • Reducing deployment package size
  • Separating business logic from dependencies

A layer is a ZIP archive containing libraries, custom runtimes, or configuration files. Functions can reference up to 5 layers. Layers are versioned and immutable.

# Create a layer with shared dependencies
# (nodejs20.x bundles AWS SDK v3, so only package what the runtime lacks)
mkdir -p layer/nodejs
cd layer/nodejs
npm install axios lodash
cd ..

# Zip so the archive root contains nodejs/ (the layout Node.js layers require)
zip -r ../layer.zip nodejs
cd ..

# Publish layer
aws lambda publish-layer-version \
  --layer-name shared-dependencies \
  --description "Common Node.js dependencies" \
  --zip-file fileb://layer.zip \
  --compatible-runtimes nodejs20.x

Reference layer in function:

{
  "FunctionName": "my-function",
  "Layers": [
    "arn:aws:lambda:us-east-1:123456789012:layer:shared-dependencies:1"
  ]
}

Use cases:

  • Shared utilities (logging, error handling, data validation)
  • Large dependencies (ML models, binaries)
  • Custom runtimes (GraalVM native images, COBOL runtime)

Right-Sizing and Cost Optimization

Compute is often the largest cloud cost. Right-sizing reduces waste while maintaining performance.

Compute Optimizer

AWS Compute Optimizer analyzes CloudWatch metrics and recommends optimal instance types (EC2), ECS task sizes, and Lambda memory configurations. Optimizer considers:

  • CPU utilization
  • Memory utilization
  • Network throughput
  • Disk I/O

Access Compute Optimizer recommendations via AWS Console or CLI:

aws compute-optimizer get-ec2-instance-recommendations \
  --instance-arns arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0

Implement recommendations gradually: resize one instance or service at a time, test performance after each change, and monitor for 1-2 weeks before moving on to the next.
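For Lambda, right-sizing means picking a memory setting: per-invocation compute cost is roughly GB-seconds times the per-GB-second rate, and more memory usually shortens duration. A toy comparison; the price below is the approximate published x86 rate (check current pricing), and the duration measurements are made-up illustrative numbers.

```typescript
// Compare Lambda per-invocation compute cost across memory settings.
// Price assumed at ~$0.0000166667 per GB-second (x86); verify current pricing.
const PRICE_PER_GB_SECOND = 0.0000166667;

function invocationCost(memoryMb: number, durationMs: number): number {
  const gbSeconds = (memoryMb / 1024) * (durationMs / 1000);
  return gbSeconds * PRICE_PER_GB_SECOND;
}

// Hypothetical measurements: doubling memory roughly halves duration here.
const trials = [
  { memoryMb: 512, durationMs: 800 },
  { memoryMb: 1024, durationMs: 420 },
  { memoryMb: 2048, durationMs: 230 },
];

const cheapest = trials.reduce((best, t) =>
  invocationCost(t.memoryMb, t.durationMs) < invocationCost(best.memoryMb, best.durationMs) ? t : best
);
console.log(cheapest.memoryMb); // 512
```

In this made-up data 512 MB is cheapest, but 1024 MB costs only about 5% more while nearly halving latency, which is often the better trade for user-facing paths.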

Auto-Scaling Best Practices

For EC2/ECS:

  • Use target tracking scaling policies (maintain target metric) instead of manual step scaling
  • Set reasonable scaling boundaries (min/max) to prevent runaway costs
  • Configure scale-in/scale-out cooldowns to avoid flapping
  • Monitor scaling activities - frequent scaling suggests under-provisioned baseline capacity
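Target tracking's core adjustment can be approximated as scaling current capacity by the ratio of the observed metric to the target. The helper below is a simplified sketch of that computation (names are illustrative), with min/max clamping applied.

```typescript
// Approximate a target-tracking scaling decision (simplified).
// desired ~ ceil(current * actualMetric / targetMetric), clamped to [min, max].
function desiredCapacity(
  current: number, actual: number, target: number, min: number, max: number
): number {
  const raw = Math.ceil(current * (actual / target));
  return Math.min(max, Math.max(min, raw));
}

// 4 instances at 70% average CPU with a 50% target: scale out to 6.
console.log(desiredCapacity(4, 70, 50, 2, 10)); // 6
```

The real policy adds cooldowns and metric aggregation on top, but this ratio is why a metric far above target triggers a large scale-out step rather than one instance at a time.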

For Lambda:

  • Lambda scales automatically to demand (no configuration required)
  • Set reserved concurrency to prevent runaway costs from DDoS or bugs
  • Use provisioned concurrency only for latency-sensitive, high-traffic functions

Cost Allocation Tags

Tag compute resources for cost visibility:

{
  "Tags": [
    {"Key": "Environment", "Value": "production"},
    {"Key": "Application", "Value": "payment-api"},
    {"Key": "Team", "Value": "platform-engineering"},
    {"Key": "CostCenter", "Value": "engineering-ops"}
  ]
}

Enable cost allocation tags in AWS Billing to track costs by team, application, or environment. See cost optimization.


Common Compute Anti-Patterns

Avoid these mistakes that increase cost, reduce availability, or create operational complexity:

Oversized instances: Provisioning m5.4xlarge when m5.large suffices wastes money. Start small, scale based on metrics.

Not using auto-scaling: Manually managing capacity eliminates elasticity. Use ASGs for EC2, service auto-scaling for ECS, and let Lambda scale automatically.

Ignoring cold starts: Deploying Java Lambda functions without addressing cold starts creates a poor user experience. Use Node.js or Python for latency-sensitive APIs, or use provisioned concurrency (or Lambda SnapStart) for Java.

Running databases on Lambda: Lambda's 15-minute maximum timeout and stateless execution model make it unsuitable for long-running operations or stateful workloads. Use RDS, Aurora, or DynamoDB for databases.

Public subnets for compute: Placing EC2/ECS tasks in public subnets increases attack surface. Use private subnets with load balancers in public subnets. See networking patterns.

Not enforcing IMDSv2: Using IMDSv1 exposes instances to SSRF attacks. Always enforce IMDSv2 in launch templates.

Synchronous Lambda invocations for long tasks: API Gateway → Lambda has a 30-second timeout. For longer operations, invoke Lambda asynchronously and return 202 Accepted immediately, then notify via SNS/EventBridge when complete.
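The accept-then-process pattern looks like this in outline. Here `enqueue` is a stand-in for the real hand-off (an async Lambda invoke with InvocationType 'Event', or an SQS send), injected as a callback so the shape is testable without AWS calls.

```typescript
// Accept a long-running job immediately; do the work out of band.
// `enqueue` stands in for lambda invoke (InvocationType 'Event') or sqs send.
interface JobRequest { jobId: string; payload: unknown; }
interface HttpResponse { statusCode: number; body: string; }

function acceptJob(
  req: JobRequest,
  enqueue: (job: JobRequest) => void
): HttpResponse {
  enqueue(req); // hand off before responding; completion is signaled via SNS/EventBridge
  return {
    statusCode: 202, // Accepted: processing continues asynchronously
    body: JSON.stringify({ jobId: req.jobId, status: 'accepted' }),
  };
}

// Usage: the caller polls a status endpoint or subscribes to a completion event.
const queued: JobRequest[] = [];
const res = acceptJob({ jobId: 'job-42', payload: {} }, (j) => queued.push(j));
console.log(res.statusCode); // 202
```

Returning the jobId lets the client correlate the eventual completion notification with its original request.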

Mixing compute types unnecessarily: Using EC2, ECS, and Lambda in the same application without clear rationale creates operational complexity. Standardize on 1-2 compute types unless specific workload characteristics require diversity.

No health checks: Running instances without health checks prevents automatic recovery from failures. Configure ELB health checks for ASGs and container health checks for ECS tasks.

Hardcoding configuration: Embedding database URLs, API keys, or environment-specific configuration in AMIs or container images makes them environment-specific. Use environment variables, Secrets Manager, or SSM Parameter Store for configuration. See secrets management.


Further Reading

AWS compute documentation is extensive. Deepen your knowledge with these resources:

For container images and Dockerfile optimization, see Docker guidelines. For Kubernetes-based orchestration, see Kubernetes documentation and EKS-specific patterns. For Infrastructure as Code, see Terraform best practices and IaC on AWS. For load balancer integration, see networking documentation. For event-driven Lambda patterns, see messaging architecture and event-driven architecture.