Infrastructure as Code on AWS

Infrastructure as Code (IaC) treats infrastructure configuration as versioned, peer-reviewed code rather than manual console operations. On AWS, this means defining VPCs, EC2 instances, RDS databases, IAM roles, and all other resources declaratively in configuration files. Changes flow through version control, code review, automated testing, and CI/CD pipelines - the same rigorous process used for application code.

The fundamental advantage of IaC is reproducibility: create identical environments on demand, recover from disasters by re-running infrastructure code, and ensure consistency across development, staging, and production. Manual infrastructure changes (ClickOps) create drift - the live environment diverges from documentation, making debugging difficult and disaster recovery unreliable. IaC eliminates this drift by making infrastructure configuration the source of truth.

AWS provides multiple IaC tools: Terraform (multi-cloud, popular, mature ecosystem), CloudFormation (AWS-native, deep integration), and AWS CDK (code-first approach using TypeScript/Python/Java). This guide covers when to use each, how to structure projects, manage state, integrate with CI/CD, and avoid common pitfalls.

IaC Tool Comparison

Terraform vs CloudFormation vs AWS CDK

Each tool has strengths for different use cases:

Decision matrix:

| Factor           | Terraform                     | CloudFormation       | AWS CDK                              |
|------------------|-------------------------------|----------------------|--------------------------------------|
| Multi-cloud      | Excellent                     | AWS only             | AWS only                             |
| State management | External (S3 + DynamoDB)      | AWS-managed          | AWS-managed (via CFN)                |
| Learning curve   | Moderate (HCL syntax)         | Moderate (YAML/JSON) | Steep (programming + AWS constructs) |
| Ecosystem        | Large (Terraform Registry)    | AWS resources only   | AWS resources only                   |
| Community        | Very large                    | Large (AWS docs)     | Growing                              |
| Modularity       | Modules (HCL)                 | Nested stacks        | L3 constructs (code)                 |
| Testing          | Terratest, terraform validate | cfn-lint, cfn-nag    | Unit tests (Jest, JUnit)             |
| GitLab CI/CD     | Native integration            | AWS CLI + scripts    | cdk deploy in pipeline               |
| Cost             | Free (open-source)            | Free                 | Free                                 |
| Drift detection  | terraform plan                | CloudFormation drift | CloudFormation drift (via CFN)       |

When to Use Terraform

Choose Terraform when:

  • Multi-cloud strategy: Deploying to AWS + Azure/GCP, or planning future cloud migration
  • Existing Terraform expertise: Team already knows Terraform from other projects
  • Third-party integrations: Need to manage non-AWS resources (Datadog monitors, GitHub repos, PagerDuty schedules)
  • Mature module ecosystem: Leveraging community modules from Terraform Registry

Terraform strengths:

  • Consistent workflow across cloud providers (learn once, use everywhere)
  • Powerful module system for reusable components (see Terraform Best Practices for comprehensive module design patterns)
  • Excellent state management with locking (prevents concurrent modification)
  • Plan/apply workflow provides clear change preview before execution

See Terraform Best Practices for comprehensive Terraform coverage including project structure, state management, module design, workspace strategies, and testing.

When to Use CloudFormation

Choose CloudFormation when:

  • AWS-only deployment: No multi-cloud requirements, fully committed to AWS
  • Deep AWS integration: Need features only available in CloudFormation (e.g., stack sets for multi-account, certain resource properties)
  • AWS-native tooling preference: Want infrastructure management fully within AWS ecosystem
  • No external dependencies: Avoid managing external state files or learning third-party tools

CloudFormation strengths:

  • Zero setup (no state management configuration needed)
  • AWS Support team can help troubleshoot CloudFormation issues
  • New AWS services often gain CloudFormation support before the Terraform provider catches up
  • Stack policies prevent accidental deletion of critical resources
  • StackSets for multi-account/multi-region deployments

When to Use AWS CDK

Choose AWS CDK when:

  • Developers prefer code over config: Team more comfortable with TypeScript/Python/Java than YAML/HCL
  • Complex logic in infrastructure: Need loops, conditionals, data transformations beyond what declarative tools offer
  • Type safety and IDE support: Want autocomplete, refactoring, compile-time checks
  • Construct libraries: Leverage high-level constructs that encode AWS best practices (e.g., ApplicationLoadBalancedFargateService wires up an ALB and an ECS Fargate service in a single construct)

AWS CDK strengths:

  • Full programming language expressiveness (functions, classes, loops)
  • L3 constructs abstract common patterns (single construct creates multi-resource architectures)
  • Unit testing infrastructure code with familiar testing frameworks
  • TypeScript type checking catches errors before deployment

CDK limitations:

  • Compiles to CloudFormation (inherits CloudFormation limitations like stack size, update constraints)
  • Smaller community than Terraform
  • More complex debugging (CDK → CloudFormation → AWS API calls)

Terraform on AWS

This section covers AWS-specific Terraform patterns. For general Terraform best practices, see Terraform Best Practices.

AWS Provider Configuration

The AWS provider enables Terraform to interact with AWS APIs. Configuration includes authentication, region, and default tags.

Provider configuration with remote state:

# backend.tf - Remote state configuration
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # Pin to major version
    }
  }

  # Store state in S3 with DynamoDB locking
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true                   # Encrypt state file
    dynamodb_table = "terraform-state-lock" # Prevent concurrent modifications
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/abcd-1234" # KMS encryption
  }
}

# Provider configuration
provider "aws" {
  region = var.aws_region

  # Default tags applied to all resources
  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = var.environment
      Project     = var.project_name
      CostCenter  = var.cost_center
    }
  }
}

# Optional: Additional provider for different region (e.g., replication)
provider "aws" {
  alias  = "replica"
  region = "us-west-2"

  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = var.environment
      Project     = var.project_name
      CostCenter  = var.cost_center
    }
  }
}

Why remote state in S3 is critical: Terraform state contains sensitive data (database passwords, IP addresses, resource IDs). Storing state locally on developer machines creates several problems:

  1. Concurrent modification: Two developers run terraform apply simultaneously → state corruption
  2. Loss of state: Developer's laptop dies → cannot manage infrastructure anymore
  3. Security exposure: State file contains secrets → encrypted S3 bucket with access controls is safer
  4. Team collaboration: State on one machine → other team members cannot see current infrastructure

S3 + DynamoDB locking prevents concurrent modifications: when one developer runs terraform apply, Terraform writes a lock item to the DynamoDB table, and any concurrent apply fails fast until that lock is released. The lock ensures the second developer sees the first developer's changes before making their own, preventing lost updates.
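The locking mechanism relies on DynamoDB's conditional writes: a lock item can only be created if no item with that key already exists. Here is a minimal Python sketch of those semantics (an illustration only, not Terraform's actual client code):

```python
# Conceptual sketch of DynamoDB-style state locking: a write succeeds only
# if the lock key is absent (attribute_not_exists(LockID) semantics).
class StateLockTable:
    """Simulates a lock table backed by conditional writes."""

    def __init__(self):
        self._items = {}

    def acquire(self, lock_id: str, owner: str) -> bool:
        # Conditional put: fail if someone already holds the lock
        if lock_id in self._items:
            return False
        self._items[lock_id] = owner
        return True

    def release(self, lock_id: str, owner: str) -> bool:
        # Only the current holder may release the lock
        if self._items.get(lock_id) == owner:
            del self._items[lock_id]
            return True
        return False

table = StateLockTable()
assert table.acquire("prod/vpc/terraform.tfstate", "dev1")      # dev1 locks state
assert not table.acquire("prod/vpc/terraform.tfstate", "dev2")  # dev2 must wait
assert table.release("prod/vpc/terraform.tfstate", "dev1")
assert table.acquire("prod/vpc/terraform.tfstate", "dev2")      # now dev2 proceeds
```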

State bucket setup:

# bootstrap.tf - Create state bucket and lock table (run once manually)
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled" # Keep state history for rollback
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST" # No capacity planning needed
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

Important: Run this bootstrap configuration once manually (terraform apply) before configuring remote backend. This is a chicken-and-egg problem: you need S3/DynamoDB to store state, but Terraform needs to create them. After creation, migrate local state to S3 with terraform init -migrate-state.

For comprehensive state management strategies, workspace usage, and multi-environment patterns, see Terraform State Management.

AWS Resource Examples

VPC with Multi-AZ Subnets

# modules/networking/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.environment}-vpc"
  }
}

# Public subnets (one per AZ)
resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone = var.availability_zones[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name = "${var.environment}-public-${var.availability_zones[count.index]}"
    Type = "public"
  }
}

# Private subnets (one per AZ)
resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.environment}-private-${var.availability_zones[count.index]}"
    Type = "private"
  }
}

# Internet Gateway for public subnets
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.environment}-igw"
  }
}

# NAT Gateway (one per AZ for high availability)
resource "aws_eip" "nat" {
  count  = length(var.availability_zones)
  domain = "vpc"

  tags = {
    Name = "${var.environment}-nat-${var.availability_zones[count.index]}"
  }
}

resource "aws_nat_gateway" "main" {
  count = length(var.availability_zones)

  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.environment}-nat-${var.availability_zones[count.index]}"
  }
}

Usage in root module:

# environments/prod/main.tf
module "networking" {
  source = "../../modules/networking"

  environment        = "prod"
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

This creates a production-ready VPC with public and private subnets across three availability zones. See AWS Networking for VPC design patterns.
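The cidrsubnet() calls above carve the /16 VPC into /24 subnets: public subnets use the AZ index directly, while private subnets are offset by 10 so the ranges never collide. A quick Python sketch of that arithmetic (a rough model of Terraform's function, for illustration):

```python
import ipaddress

def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Rough Python model of Terraform's cidrsubnet() function."""
    network = ipaddress.ip_network(prefix)
    # Adding `newbits` to a /16 prefix yields /24 subnets; pick the netnum-th
    return str(list(network.subnets(prefixlen_diff=newbits))[netnum])

# Public subnets: count.index 0..2
print([cidrsubnet("10.0.0.0/16", 8, i) for i in range(3)])
# → ['10.0.0.0/24', '10.0.1.0/24', '10.0.2.0/24']

# Private subnets: count.index + 10
print([cidrsubnet("10.0.0.0/16", 8, i + 10) for i in range(3)])
# → ['10.0.10.0/24', '10.0.11.0/24', '10.0.12.0/24']
```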

EKS Cluster with IRSA

# modules/eks/main.tf
resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = aws_iam_role.cluster.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids              = var.subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = var.public_access_enabled
  }

  # Ensure IAM policies are attached before cluster creation
  depends_on = [
    aws_iam_role_policy_attachment.cluster_policy,
    aws_iam_role_policy_attachment.vpc_resource_controller,
  ]
}

# OIDC provider enabling IRSA (IAM Roles for Service Accounts)
resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

# Example: IAM role for pods (via IRSA)
resource "aws_iam_role" "pod_s3_access" {
  name = "${var.cluster_name}-pod-s3-access"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.eks.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")}:sub" = "system:serviceaccount:default:my-app"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "pod_s3_access" {
  role       = aws_iam_role.pod_s3_access.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}

IRSA allows Kubernetes pods to assume IAM roles without sharing static credentials. The pod's service account is annotated with the IAM role ARN, and AWS automatically provides temporary credentials. See AWS EKS for IRSA details and AWS IAM for IAM best practices.

RDS with Automated Backups

# modules/database/main.tf
resource "aws_db_instance" "main" {
  identifier = var.db_identifier

  engine         = "postgres"
  engine_version = "15.4"
  instance_class = var.instance_class

  allocated_storage     = var.allocated_storage
  max_allocated_storage = var.max_allocated_storage # Enable storage autoscaling
  storage_encrypted     = true
  kms_key_id            = var.kms_key_arn

  db_name  = var.database_name
  username = var.master_username
  password = var.master_password # Should come from AWS Secrets Manager

  # Multi-AZ for high availability
  multi_az = var.multi_az_enabled

  # Subnet group (place in private subnets)
  db_subnet_group_name = aws_db_subnet_group.main.name

  # Security group (allow access only from app subnets)
  vpc_security_group_ids = [aws_security_group.rds.id]

  # Automated backups
  backup_retention_period = var.backup_retention_days
  backup_window           = "03:00-04:00" # 3-4 AM UTC
  maintenance_window      = "Mon:04:00-Mon:05:00"

  # Enable automated minor version upgrades
  auto_minor_version_upgrade = true

  # Performance Insights
  performance_insights_enabled    = true
  performance_insights_kms_key_id = var.kms_key_arn

  # Enhanced monitoring
  monitoring_interval = 60 # seconds
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn

  # Deletion protection (prod only)
  deletion_protection = var.environment == "prod"
  skip_final_snapshot = var.environment != "prod"

  tags = {
    Name = var.db_identifier
  }
}

Never hardcode database passwords: Use AWS Secrets Manager to generate and rotate credentials:

resource "aws_secretsmanager_secret" "db_password" {
  name                    = "${var.db_identifier}-password"
  recovery_window_in_days = 7 # Prevent immediate deletion
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db_password.result
}

resource "random_password" "db_password" {
  length  = 32
  special = true
}

resource "aws_db_instance" "main" {
  # ... other configuration ...
  password = aws_secretsmanager_secret_version.db_password.secret_string
}

Application retrieves password from Secrets Manager at runtime (see Secrets Management for patterns). See AWS Databases for RDS configuration details.

Terraform Modules for AWS

Modules encapsulate reusable infrastructure patterns. For comprehensive module design, testing, and versioning strategies, see Terraform Modules.

Module composition example:

# environments/prod/main.tf
module "networking" {
  source = "../../modules/networking"

  environment        = "prod"
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

module "eks" {
  source = "../../modules/eks"

  cluster_name          = "prod-cluster"
  kubernetes_version    = "1.28"
  subnet_ids            = module.networking.private_subnet_ids
  public_access_enabled = false # Private cluster

  depends_on = [module.networking]
}

module "rds" {
  source = "../../modules/database"

  db_identifier           = "prod-db"
  instance_class          = "db.r6g.xlarge"
  allocated_storage       = 100
  multi_az_enabled        = true
  backup_retention_days   = 30
  subnet_ids              = module.networking.private_subnet_ids
  allowed_security_groups = [module.eks.node_security_group_id]

  depends_on = [module.networking]
}

Outputs from one module become inputs to another, creating dependencies. Terraform automatically determines execution order based on these dependencies.
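Under the hood this is a topological sort of the dependency graph. Terraform builds the graph automatically from references and depends_on; the sketch below (illustration only, using hypothetical module names from the example above) shows the resulting apply order:

```python
from graphlib import TopologicalSorter

# Module dependency graph from the composition above: eks and rds both
# consume networking's outputs, so networking must be applied first.
deps = {
    "networking": set(),
    "eks": {"networking"},
    "rds": {"networking"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # networking first; eks and rds may run in parallel after it
assert order.index("networking") < order.index("eks")
assert order.index("networking") < order.index("rds")
```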

CloudFormation

CloudFormation uses JSON or YAML templates to define AWS resources. Templates are submitted to CloudFormation, which creates a "stack" - a collection of resources managed as a unit.

CloudFormation Template Structure

# vpc-stack.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Multi-AZ VPC with public and private subnets

Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, staging, prod]
    Description: Environment name
  VpcCidr:
    Type: String
    Default: 10.0.0.0/16
    Description: VPC CIDR block

Mappings:
  # Different configurations per environment
  EnvironmentConfig:
    dev:
      InstanceType: t3.small
      MultiAZ: false
    staging:
      InstanceType: t3.medium
      MultiAZ: false
    prod:
      InstanceType: r6g.large
      MultiAZ: true

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcCidr
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-vpc'
        - Key: Environment
          Value: !Ref Environment

  PublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Select [0, !Cidr [!Ref VpcCidr, 6, 8]]
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-public-1'

  PublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Select [1, !Cidr [!Ref VpcCidr, 6, 8]]
      AvailabilityZone: !Select [1, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-public-2'

  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Select [2, !Cidr [!Ref VpcCidr, 6, 8]]
      AvailabilityZone: !Select [0, !GetAZs '']
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-private-1'

  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Select [3, !Cidr [!Ref VpcCidr, 6, 8]]
      AvailabilityZone: !Select [1, !GetAZs '']
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-private-2'

  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-igw'

  AttachGateway:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

Outputs:
  VpcId:
    Description: VPC ID
    Value: !Ref VPC
    Export:
      Name: !Sub '${Environment}-VpcId' # Export for cross-stack reference

  PublicSubnetIds:
    Description: Public subnet IDs
    Value: !Join [',', [!Ref PublicSubnet1, !Ref PublicSubnet2]]
    Export:
      Name: !Sub '${Environment}-PublicSubnets'

  PrivateSubnetIds:
    Description: Private subnet IDs
    Value: !Join [',', [!Ref PrivateSubnet1, !Ref PrivateSubnet2]]
    Export:
      Name: !Sub '${Environment}-PrivateSubnets'

CloudFormation intrinsic functions:

  • !Ref: Reference another resource or parameter
  • !GetAtt: Get attribute from resource (e.g., !GetAtt VPC.CidrBlock)
  • !Sub: Substitute variables in string (e.g., !Sub '${Environment}-vpc')
  • !Join: Join list with delimiter (e.g., !Join [',', [subnet1, subnet2]])
  • !Select: Select item from list
  • !GetAZs: Get availability zones for region
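The template's subnet CIDRs combine two of these functions: !Cidr splits the VPC block into candidate subnets, and !Select picks one by index. Note that Fn::Cidr's third argument counts *host* bits (so 8 host bits from a /16 yields /24 subnets). A rough Python model for illustration:

```python
import ipaddress

def fn_cidr(ip_block: str, count: int, cidr_bits: int) -> list[str]:
    """Rough Python model of CloudFormation's Fn::Cidr (illustration only).
    cidr_bits is the number of host bits per subnet, so a /16 block with
    cidr_bits=8 yields /24 subnets."""
    network = ipaddress.ip_network(ip_block)
    subnets = network.subnets(new_prefix=network.max_prefixlen - cidr_bits)
    return [str(s) for s in list(subnets)[:count]]

subnets = fn_cidr("10.0.0.0/16", 6, 8)
# !Select [2, !Cidr [!Ref VpcCidr, 6, 8]] -> PrivateSubnet1's CIDR
print(subnets[2])  # → 10.0.2.0/24
```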

Nested Stacks

For complex infrastructure, split into multiple stacks:

# master-stack.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Master stack that composes VPC, EKS, and RDS stacks

Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, staging, prod]

Resources:
  NetworkingStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-templates/vpc-stack.yaml
      Parameters:
        Environment: !Ref Environment
        VpcCidr: 10.0.0.0/16

  EKSStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: NetworkingStack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-templates/eks-stack.yaml
      Parameters:
        Environment: !Ref Environment
        VpcId: !GetAtt NetworkingStack.Outputs.VpcId
        SubnetIds: !GetAtt NetworkingStack.Outputs.PrivateSubnetIds

  RDSStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: NetworkingStack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-templates/rds-stack.yaml
      Parameters:
        Environment: !Ref Environment
        VpcId: !GetAtt NetworkingStack.Outputs.VpcId
        SubnetIds: !GetAtt NetworkingStack.Outputs.PrivateSubnetIds

Nested stacks enable modular infrastructure: update the RDS stack without touching VPC or EKS. Each stack has independent lifecycle but can reference outputs from other stacks.

StackSets for Multi-Account Deployment

StackSets deploy the same stack across multiple AWS accounts and regions:

# Deploy security baseline to all accounts in organization
aws cloudformation create-stack-set \
--stack-set-name security-baseline \
--template-body file://security-baseline.yaml \
--permission-model SERVICE_MANAGED \
--auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
--capabilities CAPABILITY_NAMED_IAM

# Deploy to all accounts in organization
aws cloudformation create-stack-instances \
--stack-set-name security-baseline \
--deployment-targets OrganizationalUnitIds=ou-xxxx-yyyyyyyy \
--regions us-east-1 us-west-2

StackSets are powerful for organization-wide policies: GuardDuty enablement, CloudTrail logging, Config rules, IAM roles. Changes to the StackSet automatically propagate to all accounts.

Drift Detection

Detect manual changes made outside CloudFormation:

# Detect drift
aws cloudformation detect-stack-drift --stack-name prod-vpc

# Get drift results
aws cloudformation describe-stack-resource-drifts \
  --stack-name prod-vpc \
  --stack-resource-drift-status-filters MODIFIED DELETED

CloudFormation compares actual resource configuration with template definition. Drift indicates someone made manual changes (e.g., modified security group rules via console). Best practice: remediate drift by updating template and reapplying, not by reverting manual changes.
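Conceptually, drift detection is a property-by-property diff of the declared template against the live resource. A minimal Python sketch of that comparison (illustration only; the real service reads live configuration through AWS APIs):

```python
# Diff declared template properties against live resource configuration.
def detect_drift(declared: dict, live: dict) -> dict:
    """Return properties whose live value differs from the declared value."""
    drift = {}
    for key, expected in declared.items():
        actual = live.get(key)
        if actual != expected:
            drift[key] = {"expected": expected, "actual": actual}
    return drift

declared = {"CidrBlock": "10.0.0.0/16", "EnableDnsHostnames": True}
live = {"CidrBlock": "10.0.0.0/16", "EnableDnsHostnames": False}  # console change
print(detect_drift(declared, live))
# → {'EnableDnsHostnames': {'expected': True, 'actual': False}}
```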

AWS CDK

AWS CDK lets you define infrastructure using TypeScript, Python, Java, or C#. CDK code synthesizes to CloudFormation templates, then deploys via CloudFormation.

CDK Project Structure

cdk-project/
├── bin/
│   └── app.ts                  # CDK app entry point
├── lib/
│   ├── networking-stack.ts     # VPC stack
│   ├── eks-stack.ts            # EKS stack
│   └── rds-stack.ts            # Database stack
├── test/
│   └── stacks.test.ts          # Unit tests
├── cdk.json                    # CDK configuration
├── package.json                # Dependencies
└── tsconfig.json               # TypeScript config

ECS Fargate Service with CDK

// lib/ecs-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecs_patterns from 'aws-cdk-lib/aws-ecs-patterns';
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';
import { Construct } from 'constructs';

export class EcsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Reference existing VPC (created in separate stack). Note: fromLookup
    // resolves at synth time and cannot accept deploy-time tokens such as
    // Fn.importValue, so look the VPC up by tag instead.
    const vpc = ec2.Vpc.fromLookup(this, 'VPC', {
      tags: { Name: 'prod-vpc' },
    });

    // ECS cluster
    const cluster = new ecs.Cluster(this, 'Cluster', {
      vpc,
      clusterName: 'prod-cluster',
      containerInsights: true, // Enable CloudWatch Container Insights
    });

    // L3 construct: ApplicationLoadBalancedFargateService
    // This single construct creates: ALB, target group, ECS service, task definition,
    // security groups, IAM roles - all following AWS best practices
    const service = new ecs_patterns.ApplicationLoadBalancedFargateService(
      this,
      'Service',
      {
        cluster,
        serviceName: 'api-service',
        taskImageOptions: {
          image: ecs.ContainerImage.fromRegistry('mycompany/api:latest'),
          containerPort: 8080,
          environment: {
            SPRING_PROFILES_ACTIVE: 'prod',
          },
          secrets: {
            // Retrieve secrets from Secrets Manager
            DB_PASSWORD: ecs.Secret.fromSecretsManager(
              secretsmanager.Secret.fromSecretNameV2(this, 'DBPassword', 'prod-db-password')
            ),
          },
        },
        cpu: 1024, // 1 vCPU
        memoryLimitMiB: 2048, // 2 GB RAM
        desiredCount: 3, // 3 tasks across AZs
        publicLoadBalancer: true,
        healthCheckGracePeriod: cdk.Duration.seconds(60),
      }
    );

    // Auto-scaling based on CPU
    const scaling = service.service.autoScaleTaskCount({
      minCapacity: 3,
      maxCapacity: 10,
    });

    scaling.scaleOnCpuUtilization('CpuScaling', {
      targetUtilizationPercent: 70,
      scaleInCooldown: cdk.Duration.seconds(60),
      scaleOutCooldown: cdk.Duration.seconds(60),
    });

    // Output ALB DNS
    new cdk.CfnOutput(this, 'LoadBalancerDNS', {
      value: service.loadBalancer.loadBalancerDnsName,
      description: 'Application Load Balancer DNS',
    });
  }
}

L3 constructs like ApplicationLoadBalancedFargateService are CDK's key advantage: a single high-level construct creates a complete, production-ready architecture with security groups, IAM roles, auto-scaling, and health checks configured according to AWS best practices. This drastically reduces boilerplate compared to Terraform or CloudFormation.

CDK Unit Testing

// test/ecs-stack.test.ts
import { Template } from 'aws-cdk-lib/assertions';
import * as cdk from 'aws-cdk-lib';
import { EcsStack } from '../lib/ecs-stack';

// Vpc.fromLookup requires a concrete account/region; tests use dummy values
const env = { account: '123456789012', region: 'us-east-1' };

describe('EcsStack', () => {
  test('Creates ECS Cluster with Container Insights', () => {
    const app = new cdk.App();
    const stack = new EcsStack(app, 'TestStack', { env });

    const template = Template.fromStack(stack);

    // Assert ECS cluster has Container Insights enabled
    template.hasResourceProperties('AWS::ECS::Cluster', {
      ClusterSettings: [
        {
          Name: 'containerInsights',
          Value: 'enabled',
        },
      ],
    });
  });

  test('Creates Fargate service with correct CPU and memory', () => {
    const app = new cdk.App();
    const stack = new EcsStack(app, 'TestStack', { env });

    const template = Template.fromStack(stack);

    template.hasResourceProperties('AWS::ECS::TaskDefinition', {
      Cpu: '1024',
      Memory: '2048',
      RequiresCompatibilities: ['FARGATE'],
    });
  });

  test('Configures auto-scaling with min 3 and max 10 tasks', () => {
    const app = new cdk.App();
    const stack = new EcsStack(app, 'TestStack', { env });

    const template = Template.fromStack(stack);

    template.hasResourceProperties('AWS::ApplicationAutoScaling::ScalableTarget', {
      MinCapacity: 3,
      MaxCapacity: 10,
    });
  });
});

Unit tests validate infrastructure configuration before deployment. This catches errors early (e.g., forgetting to enable encryption, incorrect instance types) that would otherwise only surface during deployment.

Run tests:

npm test

See AWS Compute for ECS configuration details.

GitLab CI/CD Integration

Infrastructure deployments follow the same CI/CD patterns as application code: automated testing, peer review, approval gates, and rollback capabilities.

Terraform Pipeline

# .gitlab-ci.yml for Terraform
stages:
  - validate
  - plan
  - apply

variables:
  TF_VERSION: "1.5.7"
  TF_ROOT: ${CI_PROJECT_DIR}/environments/prod

# Validate Terraform configuration
terraform:validate:
  stage: validate
  image:
    name: hashicorp/terraform:${TF_VERSION}
    entrypoint: [""]
  before_script:
    - cd ${TF_ROOT}
    - terraform init -backend=false # No state for validation
  script:
    - terraform fmt -check # Check formatting
    - terraform validate   # Validate syntax
    - tflint               # Lint for errors/warnings
  only:
    - merge_requests
    - main

# Generate and save plan
terraform:plan:
  stage: plan
  image:
    name: hashicorp/terraform:${TF_VERSION}
    entrypoint: [""]
  before_script:
    - cd ${TF_ROOT}
    - terraform init
  script:
    - terraform plan -out=tfplan
    - terraform show -json tfplan > plan.json
  artifacts:
    paths:
      - ${TF_ROOT}/tfplan
      - ${TF_ROOT}/plan.json
    expire_in: 1 week
  only:
    - main

# Apply changes (manual gate)
terraform:apply:
  stage: apply
  image:
    name: hashicorp/terraform:${TF_VERSION}
    entrypoint: [""]
  before_script:
    - cd ${TF_ROOT}
    - terraform init
  script:
    - terraform apply -auto-approve tfplan
  dependencies:
    - terraform:plan
  when: manual # Require manual approval
  only:
    - main
  environment:
    name: production
    action: deploy

Pipeline flow: merge request → validate → merge to main → plan (saved as artifact) → manual approval → apply.

Key features:

  1. Validation on MRs: Every merge request runs terraform validate and tflint, catching errors before merge
  2. Plan on main branch: After merge, pipeline generates plan showing exactly what will change
  3. Manual approval: Engineer reviews plan, clicks "Apply" button in GitLab to proceed
  4. Artifact storage: Plan is saved as artifact, ensuring apply executes the reviewed plan (not a new plan)

AWS authentication in GitLab CI:

# Use OIDC federation (no long-lived credentials)
terraform:plan:
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://gitlab.com
  before_script:
    - |
      export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \
        $(aws sts assume-role-with-web-identity \
          --role-arn ${AWS_ROLE_ARN} \
          --role-session-name "gitlab-${CI_PROJECT_ID}-${CI_PIPELINE_ID}" \
          --web-identity-token ${GITLAB_OIDC_TOKEN} \
          --duration-seconds 3600 \
          --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
          --output text))

OIDC federation allows GitLab to assume an AWS IAM role without storing AWS credentials in GitLab. GitLab generates a short-lived token, AWS verifies it, and issues temporary credentials. See AWS IAM for OIDC setup.

For comprehensive CI/CD patterns, caching strategies, and deployment workflows, see GitLab CI/CD Pipelines.

CloudFormation Pipeline

# .gitlab-ci.yml for CloudFormation
stages:
  - validate
  - deploy

variables:
  STACK_NAME: prod-vpc
  TEMPLATE_FILE: templates/vpc-stack.yaml
  PARAMETERS_FILE: parameters/prod-params.json

cloudformation:validate:
  stage: validate
  image: python:3.11
  before_script:
    - pip install cfn-lint
  script:
    - cfn-lint ${TEMPLATE_FILE}
  only:
    - merge_requests
    - main

cloudformation:deploy:
  stage: deploy
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""]
  script:
    # Create change set
    - |
      aws cloudformation create-change-set \
        --stack-name ${STACK_NAME} \
        --change-set-name changeset-${CI_PIPELINE_ID} \
        --template-body file://${TEMPLATE_FILE} \
        --parameters file://${PARAMETERS_FILE} \
        --capabilities CAPABILITY_NAMED_IAM
    # Wait for change set creation
    - |
      aws cloudformation wait change-set-create-complete \
        --stack-name ${STACK_NAME} \
        --change-set-name changeset-${CI_PIPELINE_ID}
    # Describe changes
    - |
      aws cloudformation describe-change-set \
        --stack-name ${STACK_NAME} \
        --change-set-name changeset-${CI_PIPELINE_ID}
    # Execute change set
    - |
      aws cloudformation execute-change-set \
        --stack-name ${STACK_NAME} \
        --change-set-name changeset-${CI_PIPELINE_ID}
    # Wait for completion
    - |
      aws cloudformation wait stack-update-complete \
        --stack-name ${STACK_NAME}
  when: manual
  only:
    - main
  environment:
    name: production

CloudFormation change sets show exactly what will change before applying, similar to terraform plan.

CDK Pipeline

# .gitlab-ci.yml for CDK
stages:
  - test
  - synth
  - deploy

variables:
  CDK_VERSION: "2.100.0"

cdk:test:
  stage: test
  image: node:18
  before_script:
    - npm ci
  script:
    - npm test
  coverage: '/All files[^|]*\|[^|]*\s+([\d\.]+)/'
  only:
    - merge_requests
    - main

cdk:synth:
  stage: synth
  image: node:18
  before_script:
    - npm ci
    - npm install -g aws-cdk@${CDK_VERSION}
  script:
    - cdk synth
  artifacts:
    paths:
      - cdk.out/
    expire_in: 1 week
  only:
    - main

cdk:deploy:
  stage: deploy
  image: node:18
  before_script:
    - npm ci
    - npm install -g aws-cdk@${CDK_VERSION}
  script:
    - cdk deploy --require-approval never --all
  dependencies:
    - cdk:synth
  when: manual
  only:
    - main
  environment:
    name: production

The CDK pipeline runs unit tests (validating infrastructure logic) before synthesis and deployment.

State Management Best Practices

Infrastructure state contains the mapping between your configuration and actual cloud resources. Protecting and managing state is critical.

Encryption at Rest

Terraform:

# backend.tf
terraform {
  backend "s3" {
    bucket     = "mycompany-terraform-state"
    key        = "prod/terraform.tfstate"
    region     = "us-east-1"
    encrypt    = true # Server-side encryption
    kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abcd-1234"
  }
}

CloudFormation: State managed by AWS, encrypted by default.

CDK: Uses CloudFormation, encrypted by default.

Access Control

State files contain sensitive data (resource IDs, IP addresses, sometimes passwords). Restrict access:

// S3 bucket policy for Terraform state
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowTerraformRole",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/TerraformExecutionRole"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::mycompany-terraform-state/*"
    },
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::mycompany-terraform-state/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    }
  ]
}

Only the Terraform execution role (used by CI/CD pipelines) can read/write state. Individual developers should not have direct S3 access to state.

State Locking

Terraform: DynamoDB table prevents concurrent modifications (configured in backend).

CloudFormation: AWS handles locking automatically (stack updates are serialized).

CDK: Uses CloudFormation, automatic locking.

Versioning and Backup

Enable S3 versioning on Terraform state bucket:

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}
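The state bucket itself is worth hardening. A sketch, assuming an illustrative bucket name: block all public access and guard against accidental deletion with `prevent_destroy`:

```hcl
# State bucket hardening (bucket name is illustrative)
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true  # Refuse to destroy the bucket via terraform destroy
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```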

If terraform apply corrupts state, revert to previous version:

# List versions
aws s3api list-object-versions \
  --bucket mycompany-terraform-state \
  --prefix prod/terraform.tfstate

# Restore previous version (quote the source: it contains "?")
aws s3api copy-object \
  --bucket mycompany-terraform-state \
  --copy-source "mycompany-terraform-state/prod/terraform.tfstate?versionId=VERSION_ID" \
  --key prod/terraform.tfstate

For comprehensive state management strategies including workspace usage and multi-environment patterns, see Terraform State Management.

Testing Infrastructure Code

Terraform Testing

1. Static validation:

terraform fmt -check   # Check formatting
terraform validate     # Syntax validation
tflint                 # AWS-specific linting

2. Policy validation with OPA:

# policy/encryption.rego
# Evaluates the plan JSON produced by `terraform show -json`
package terraform.encryption

# Deny S3 buckets without encryption
deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_s3_bucket"
  not rc.change.after.server_side_encryption_configuration
  msg := sprintf("S3 bucket '%s' must have encryption enabled", [rc.name])
}

# Deny RDS instances without encryption
deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_db_instance"
  rc.change.after.storage_encrypted != true
  msg := sprintf("RDS instance '%s' must have storage encryption enabled", [rc.name])
}

Run policy checks:

# Convert plan to JSON
terraform plan -out=tfplan
terraform show -json tfplan > plan.json

# Validate with OPA
opa exec --decision terraform/encryption/deny --bundle policy/ plan.json

OPA policies enforce organizational standards (encryption required, public access blocked, tagging requirements) automatically in CI/CD.

3. Integration testing with Terratest:

See Terraform Testing for Terratest examples.

CloudFormation Testing

Static validation:

# Validate template syntax
aws cloudformation validate-template --template-body file://template.yaml

# Lint with cfn-lint
pip install cfn-lint
cfn-lint template.yaml

# Security scanning with cfn-nag
gem install cfn-nag
cfn_nag_scan --input-path template.yaml

cfn-nag detects security issues:

Failures count: 2
Warnings count: 1

Failures:
- Resources: ["PublicSubnet"]
  Message: Subnet should not map public IPs on launch

- Resources: ["SecurityGroup"]
  Message: Security group should not allow ingress from 0.0.0.0/0

CDK Testing

Unit tests (validate infrastructure logic):

See CDK unit testing example above.

Integration tests (deploy to test account):

// test/integration.test.ts
import { execSync } from 'child_process';

describe('CDK Integration Test', () => {
  test('Deploy to test account', () => {
    // Deploy to isolated test account
    execSync('cdk deploy --all --require-approval never', {
      env: {
        ...process.env,
        AWS_PROFILE: 'test-account',
      },
    });

    // Run smoke tests against deployed infrastructure
    // (e.g., check ALB health, query RDS, etc.)

    // Destroy after testing
    execSync('cdk destroy --all --force', {
      env: {
        ...process.env,
        AWS_PROFILE: 'test-account',
      },
    });
  }, 600000); // 10 minute timeout
});

For comprehensive testing strategies, see Integration Testing.

Secrets in Infrastructure Code

Never hardcode secrets in Terraform, CloudFormation, or CDK. Use AWS Secrets Manager or SSM Parameter Store.
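Parameter Store covers the simpler case of static configuration values that never need rotation. A minimal Terraform sketch, assuming an illustrative parameter name and an input variable supplied by the pipeline:

```hcl
# Illustrative SecureString parameter; the value comes from a CI/CD
# variable (var.api_key), never from a committed file.
resource "aws_ssm_parameter" "api_key" {
  name  = "/prod/app/api-key"  # Illustrative hierarchical name
  type  = "SecureString"       # Encrypted with the account's default KMS key
  value = var.api_key
}
```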

Terraform with Secrets Manager

# Generate a random password (referenced below)
resource "random_password" "db_password" {
  length  = 32
  special = true
}

# Create secret
resource "aws_secretsmanager_secret" "db_password" {
  name = "prod-db-password"
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db_password.result
}

# Use secret in RDS
resource "aws_db_instance" "main" {
  # ... other config ...
  password = aws_secretsmanager_secret_version.db_password.secret_string
}

# Application retrieves secret at runtime (not from Terraform)
output "db_secret_arn" {
  value       = aws_secretsmanager_secret.db_password.arn
  description = "ARN of database password secret (application retrieves value)"
}

Important: The secret value is in Terraform state (encrypted), but applications should retrieve secrets from Secrets Manager at runtime, not from Terraform outputs. This allows secret rotation without Terraform changes.
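One way to enforce runtime retrieval is to grant only the application's role read access to the secret. A hedged sketch, assuming an illustrative role name:

```hcl
# Grant the application role read access to exactly this secret
data "aws_iam_policy_document" "read_db_secret" {
  statement {
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [aws_secretsmanager_secret.db_password.arn]
  }
}

resource "aws_iam_role_policy" "app_read_db_secret" {
  name   = "read-db-secret"
  role   = "app-task-role"  # Illustrative: the application's runtime role
  policy = data.aws_iam_policy_document.read_db_secret.json
}
```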

CloudFormation with Secrets Manager

Resources:
  DBPassword:
    Type: AWS::SecretsManager::Secret
    Properties:
      GenerateSecretString:
        PasswordLength: 32
        ExcludeCharacters: '"@/\'

  DBInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      MasterUsername: admin
      MasterUserPassword: !Sub '{{resolve:secretsmanager:${DBPassword}:SecretString}}'
      # ... other config ...

The {{resolve:secretsmanager:...}} dynamic reference (combined here with !Sub to interpolate the secret's logical ID) retrieves the value during stack creation/update without exposing it in the template or its parameters.

CDK with Secrets Manager

import * as rds from 'aws-cdk-lib/aws-rds';
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';

// Generate secret
const dbPassword = new secretsmanager.Secret(this, 'DBPassword', {
  generateSecretString: {
    passwordLength: 32,
    excludeCharacters: '"@/\\',
  },
});

// Use in RDS
const db = new rds.DatabaseInstance(this, 'Database', {
  // ... other config ...
  credentials: rds.Credentials.fromSecret(dbPassword),
});

See Secrets Management for comprehensive secret handling patterns including rotation, access control, and application integration.

Multi-Environment Management

Terraform: Separate State Files per Environment

terraform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── backend.tf         # State: s3://tf-state/dev/terraform.tfstate
│   │   └── terraform.tfvars   # dev-specific values
│   ├── staging/
│   │   ├── main.tf
│   │   ├── backend.tf         # State: s3://tf-state/staging/terraform.tfstate
│   │   └── terraform.tfvars   # staging-specific values
│   └── prod/
│       ├── main.tf
│       ├── backend.tf         # State: s3://tf-state/prod/terraform.tfstate
│       └── terraform.tfvars   # prod-specific values

Each environment has isolated state, preventing accidental cross-environment changes. Variable files (.tfvars) contain environment-specific values:

# environments/dev/terraform.tfvars
environment      = "dev"
instance_type    = "t3.small"
database_size    = "db.t3.small"
multi_az_enabled = false
backup_retention = 7

# environments/prod/terraform.tfvars
environment      = "prod"
instance_type    = "r6g.xlarge"
database_size    = "db.r6g.2xlarge"
multi_az_enabled = true
backup_retention = 30
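The values above populate variable declarations shared by every environment. A sketch of the matching variables.tf (types inferred from the tfvars; descriptions are illustrative):

```hcl
# variables.tf — shared declarations filled in by each environment's tfvars
variable "environment" {
  type        = string
  description = "Deployment environment name (dev, staging, prod)"
}

variable "instance_type" {
  type        = string
  description = "EC2 instance type for application hosts"
}

variable "database_size" {
  type        = string
  description = "RDS instance class"
}

variable "multi_az_enabled" {
  type        = bool
  description = "Whether RDS runs in Multi-AZ mode"
}

variable "backup_retention" {
  type        = number
  description = "Backup retention period in days"
}
```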

For workspace strategies and environment isolation patterns, see Terraform Multi-Environment.

CloudFormation: Parameter Files per Environment

# Deploy to dev
aws cloudformation deploy \
--stack-name dev-vpc \
--template-file vpc-stack.yaml \
--parameter-overrides file://parameters/dev-params.json

# Deploy to prod
aws cloudformation deploy \
--stack-name prod-vpc \
--template-file vpc-stack.yaml \
--parameter-overrides file://parameters/prod-params.json

CDK: Context Values per Environment

// bin/app.ts
import * as cdk from 'aws-cdk-lib';
import { EcsStack } from '../lib/ecs-stack'; // Illustrative path to the stack class

const app = new cdk.App();

const env = app.node.tryGetContext('environment') || 'dev';
const config = app.node.tryGetContext(env);

new EcsStack(app, `${env}-ecs-stack`, {
  env: {
    account: config.account,
    region: config.region,
  },
  instanceType: config.instanceType,
  desiredCount: config.desiredCount,
});

cdk.json:

{
  "context": {
    "dev": {
      "account": "111111111111",
      "region": "us-east-1",
      "instanceType": "t3.small",
      "desiredCount": 1
    },
    "prod": {
      "account": "222222222222",
      "region": "us-east-1",
      "instanceType": "r6g.large",
      "desiredCount": 3
    }
  }
}

Deploy:

cdk deploy --context environment=dev
cdk deploy --context environment=prod

Common Anti-Patterns

Hardcoded Values

Problem: Hardcoding account IDs, regions, or IP addresses makes infrastructure non-portable.

# BAD
resource "aws_s3_bucket" "logs" {
  bucket = "mycompany-logs-us-east-1-123456789012"  # Hardcoded account ID
}

Fix: Use data sources and variables:

# GOOD
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

resource "aws_s3_bucket" "logs" {
  bucket = "mycompany-logs-${data.aws_region.current.name}-${data.aws_caller_identity.current.account_id}"
}

No State Locking

Problem: Running Terraform without DynamoDB locking allows concurrent modifications.

Fix: Always configure state locking in backend (see State Management above).

Monolithic Templates

Problem: Single massive Terraform/CloudFormation file containing all infrastructure.

Fix: Modularize:

  • Terraform: Use modules (see Terraform Modules)
  • CloudFormation: Use nested stacks
  • CDK: Use multiple stack classes

Ignoring Drift

Problem: Manual changes made via AWS console, never reconciled with IaC.

Fix:

  • Run terraform plan / CloudFormation drift detection regularly
  • Remediate drift by updating IaC, not by reverting manual changes
  • Enforce "no manual changes" policy via IAM restrictions (only allow CI/CD role to modify resources)
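One way to enforce the no-manual-changes policy at the organization level is a service control policy that denies mutating actions to every principal except the pipeline role. A hedged sketch in Terraform, with an illustrative action set and role ARN (real policies need careful scoping and break-glass exceptions):

```hcl
# Illustrative SCP: deny infrastructure mutations unless made by the pipeline role
resource "aws_organizations_policy" "deny_manual_changes" {
  name = "deny-manual-infra-changes"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DenyAllButPipelineRole"
      Effect   = "Deny"
      Action   = ["ec2:*", "rds:*"]  # Illustrative action set; scope to your needs
      Resource = "*"
      Condition = {
        StringNotLike = {
          "aws:PrincipalArn" = "arn:aws:iam::*:role/TerraformExecutionRole"
        }
      }
    }]
  })
}
```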

No Testing

Problem: Deploying infrastructure changes without validation.

Fix:

  • Static validation (terraform validate, cfn-lint)
  • Policy validation (OPA)
  • Integration tests (deploy to test account, validate, destroy)
  • See Testing Infrastructure Code

Secrets in Code

Problem: Storing secrets in .tf files or CloudFormation templates.

Fix: Use AWS Secrets Manager (see Secrets in Infrastructure Code).

Summary

Infrastructure as Code on AWS transforms manual, error-prone infrastructure management into versioned, tested, automated workflows. Key principles:

  1. Choose the right tool: Terraform for multi-cloud, CloudFormation for AWS-native, CDK for code-first approach
  2. Remote state: Store Terraform state in S3 with DynamoDB locking; CloudFormation/CDK state is AWS-managed
  3. Modularization: Break infrastructure into reusable modules/stacks/constructs
  4. CI/CD integration: Automate validation, planning, and deployment through GitLab pipelines
  5. Testing: Validate syntax, enforce policies (OPA), run integration tests
  6. Secrets: Never hardcode secrets; use Secrets Manager or Parameter Store
  7. Multi-environment: Isolated state per environment, environment-specific variables
  8. Drift detection: Regularly check for manual changes, remediate via IaC

When to use each tool:

  • Terraform: Multi-cloud, mature ecosystem, team already knows Terraform
  • CloudFormation: AWS-only, zero setup, deep AWS integration
  • CDK: Developers prefer code over config, need type safety and high-level constructs

For comprehensive Terraform coverage including advanced module patterns, workspace strategies, testing with Terratest, and state management, see Terraform Best Practices.

Further Reading