Infrastructure as Code on AWS

Infrastructure as Code (IaC) treats infrastructure configuration as versioned, peer-reviewed code rather than manual console operations. On AWS, this means defining VPCs, EC2 instances, RDS databases, IAM roles, and all other resources declaratively in configuration files. Changes flow through version control, code review, automated testing, and CI/CD pipelines - the same rigorous process used for application code.

The fundamental advantage of IaC is reproducibility: create identical environments on demand, recover from disasters by re-running infrastructure code, and ensure consistency across development, staging, and production. Manual infrastructure changes (ClickOps) create drift - the live environment diverges from documentation, making debugging difficult and disaster recovery unreliable. IaC eliminates this drift by making infrastructure configuration the source of truth.

AWS provides multiple IaC tools: Terraform (multi-cloud, popular, mature ecosystem), CloudFormation (AWS-native, deep integration), and AWS CDK (code-first approach using TypeScript/Python/Java). This guide covers when to use each, how to structure projects, manage state, integrate with CI/CD, and avoid common pitfalls.

IaC Tool Comparison

Terraform vs CloudFormation vs AWS CDK

Each tool has strengths for different use cases:

Decision matrix:

| Factor           | Terraform                     | CloudFormation       | AWS CDK                              |
|------------------|-------------------------------|----------------------|--------------------------------------|
| Multi-cloud      | Excellent                     | AWS only             | AWS only                             |
| State management | External (S3 + DynamoDB)      | AWS-managed          | AWS-managed (via CFN)                |
| Learning curve   | Moderate (HCL syntax)         | Moderate (YAML/JSON) | Steep (programming + AWS constructs) |
| Ecosystem        | Large (Terraform Registry)    | AWS resources only   | AWS resources only                   |
| Community        | Very large                    | Large (AWS docs)     | Growing                              |
| Modularity       | Modules (HCL)                 | Nested stacks        | L3 constructs (code)                 |
| Testing          | Terratest, terraform validate | cfn-lint, cfn-nag    | Unit tests (Jest, JUnit)             |
| GitLab CI/CD     | Native integration            | AWS CLI + scripts    | cdk deploy in pipeline               |
| Cost             | Free (open-source)            | Free                 | Free                                 |
| Drift detection  | terraform plan                | CloudFormation drift | CloudFormation drift (via CFN)       |

When to Use Terraform

Choose Terraform when:

  • Multi-cloud strategy: Deploying to AWS + Azure/GCP, or planning future cloud migration
  • Existing Terraform expertise: Team already knows Terraform from other projects
  • Third-party integrations: Need to manage non-AWS resources (Datadog monitors, GitHub repos, PagerDuty schedules)
  • Mature module ecosystem: Leveraging community modules from Terraform Registry

Terraform strengths:

  • Consistent workflow across cloud providers (learn once, use everywhere)
  • Powerful module system for reusable components (see Terraform Best Practices for comprehensive module design patterns)
  • Excellent state management with locking (prevents concurrent modification)
  • Plan/apply workflow provides clear change preview before execution

See Terraform Best Practices for comprehensive Terraform coverage including project structure, state management, module design, workspace strategies, and testing.

When to Use CloudFormation

Choose CloudFormation when:

  • AWS-only deployment: No multi-cloud requirements, fully committed to AWS
  • Deep AWS integration: Need features only available in CloudFormation (e.g., stack sets for multi-account, certain resource properties)
  • AWS-native tooling preference: Want infrastructure management fully within AWS ecosystem
  • No external dependencies: Avoid managing external state files or learning third-party tools

CloudFormation strengths:

  • Zero setup (no state management configuration needed)
  • AWS Support team can help troubleshoot CloudFormation issues
  • New AWS services often gain CloudFormation support before the Terraform provider catches up
  • Stack policies prevent accidental deletion of critical resources
  • StackSets for multi-account/multi-region deployments

When to Use AWS CDK

Choose AWS CDK when:

  • Developers prefer code over config: Team more comfortable with TypeScript/Python/Java than YAML/HCL
  • Complex logic in infrastructure: Need loops, conditionals, data transformations beyond what declarative tools offer
  • Type safety and IDE support: Want autocomplete, refactoring, compile-time checks
  • Construct libraries: Leverage high-level constructs that encode AWS best practices (e.g., ApplicationLoadBalancedFargateService wires up an ALB and an ECS Fargate service in a single construct)

AWS CDK strengths:

  • Full programming language expressiveness (functions, classes, loops)
  • L3 constructs abstract common patterns (single construct creates multi-resource architectures)
  • Unit testing infrastructure code with familiar testing frameworks
  • TypeScript type checking catches errors before deployment

CDK limitations:

  • Compiles to CloudFormation (inherits CloudFormation limitations like stack size, update constraints)
  • Smaller community than Terraform
  • More complex debugging (CDK → CloudFormation → AWS API calls)

Terraform on AWS

This section covers AWS-specific Terraform patterns. For general Terraform best practices, see Terraform Best Practices.

AWS Provider Configuration

The AWS provider enables Terraform to interact with AWS APIs. Configuration includes authentication, region, and default tags.

Provider configuration with remote state:

# backend.tf - Remote state configuration
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # Pin to major version
    }
  }

  # Store state in S3 with DynamoDB locking
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true                   # Encrypt state file
    dynamodb_table = "terraform-state-lock" # Prevent concurrent modifications
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/abcd-1234" # KMS encryption
  }
}

# Provider configuration
provider "aws" {
  region = var.aws_region

  # Default tags applied to all resources
  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = var.environment
      Project     = var.project_name
      CostCenter  = var.cost_center
    }
  }
}

# Optional: Additional provider for different region (e.g., replication)
provider "aws" {
  alias  = "replica"
  region = "us-west-2"

  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = var.environment
      Project     = var.project_name
      CostCenter  = var.cost_center
    }
  }
}

Why remote state in S3 is critical: Terraform state contains sensitive data (database passwords, IP addresses, resource IDs). Storing state locally on developer machines creates several problems:

  1. Concurrent modification: Two developers run terraform apply simultaneously → state corruption
  2. Loss of state: Developer's laptop dies → cannot manage infrastructure anymore
  3. Security exposure: State file contains secrets → encrypted S3 bucket with access controls is safer
  4. Team collaboration: State on one machine → other team members cannot see current infrastructure

S3 + DynamoDB locking prevents concurrent modifications: when one developer runs terraform apply, Terraform writes a lock item to the DynamoDB table, and any concurrent apply fails fast until that lock is released. The lock ensures the second developer sees the first developer's changes before making their own, preventing lost updates.
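The locking mechanism relies on DynamoDB's conditional writes: a lock item can only be created if no item with that key already exists. Here is a minimal Python sketch of those semantics (an illustration only, not Terraform's actual client code):

```python
# Conceptual sketch of DynamoDB-style state locking: a write succeeds only
# if the lock key is absent (attribute_not_exists(LockID) semantics).
class StateLockTable:
    """Simulates a lock table backed by conditional writes."""

    def __init__(self):
        self._items = {}

    def acquire(self, lock_id: str, owner: str) -> bool:
        # Conditional put: fail if someone already holds the lock
        if lock_id in self._items:
            return False
        self._items[lock_id] = owner
        return True

    def release(self, lock_id: str, owner: str) -> bool:
        # Only the current holder may release the lock
        if self._items.get(lock_id) == owner:
            del self._items[lock_id]
            return True
        return False

table = StateLockTable()
assert table.acquire("prod/vpc/terraform.tfstate", "dev1")      # dev1 locks state
assert not table.acquire("prod/vpc/terraform.tfstate", "dev2")  # dev2 must wait
assert table.release("prod/vpc/terraform.tfstate", "dev1")
assert table.acquire("prod/vpc/terraform.tfstate", "dev2")      # now dev2 proceeds
```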

State bucket setup:

# bootstrap.tf - Create state bucket and lock table (run once manually)
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled" # Keep state history for rollback
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST" # No capacity planning needed
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

Important: Run this bootstrap configuration once manually (terraform apply) before configuring remote backend. This is a chicken-and-egg problem: you need S3/DynamoDB to store state, but Terraform needs to create them. After creation, migrate local state to S3 with terraform init -migrate-state.

For comprehensive state management strategies, workspace usage, and multi-environment patterns, see Terraform State Management.

AWS Resource Examples

VPC with Multi-AZ Subnets

# modules/networking/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.environment}-vpc"
  }
}

# Public subnets (one per AZ)
resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone = var.availability_zones[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name = "${var.environment}-public-${var.availability_zones[count.index]}"
    Type = "public"
  }
}

# Private subnets (one per AZ)
resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.environment}-private-${var.availability_zones[count.index]}"
    Type = "private"
  }
}

# Internet Gateway for public subnets
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.environment}-igw"
  }
}

# NAT Gateway (one per AZ for high availability)
resource "aws_eip" "nat" {
  count  = length(var.availability_zones)
  domain = "vpc"

  tags = {
    Name = "${var.environment}-nat-${var.availability_zones[count.index]}"
  }
}

resource "aws_nat_gateway" "main" {
  count = length(var.availability_zones)

  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.environment}-nat-${var.availability_zones[count.index]}"
  }
}

Usage in root module:

# environments/prod/main.tf
module "networking" {
  source = "../../modules/networking"

  environment        = "prod"
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

This creates a production-ready VPC with public and private subnets across three availability zones. See AWS Networking for VPC design patterns.
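The cidrsubnet() calls above carve the /16 VPC into /24 subnets: public subnets use the AZ index directly, while private subnets are offset by 10 so the ranges never collide. A quick Python sketch of that arithmetic (a rough model of Terraform's function, for illustration):

```python
import ipaddress

def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Rough Python model of Terraform's cidrsubnet() function."""
    network = ipaddress.ip_network(prefix)
    # Adding `newbits` to a /16 prefix yields /24 subnets; pick the netnum-th
    return str(list(network.subnets(prefixlen_diff=newbits))[netnum])

# Public subnets: count.index 0..2
print([cidrsubnet("10.0.0.0/16", 8, i) for i in range(3)])
# → ['10.0.0.0/24', '10.0.1.0/24', '10.0.2.0/24']

# Private subnets: count.index + 10
print([cidrsubnet("10.0.0.0/16", 8, i + 10) for i in range(3)])
# → ['10.0.10.0/24', '10.0.11.0/24', '10.0.12.0/24']
```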

EKS Cluster with IRSA

# modules/eks/main.tf
resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = aws_iam_role.cluster.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids              = var.subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = var.public_access_enabled
  }

  # Ensure IAM policies are attached before cluster creation
  depends_on = [
    aws_iam_role_policy_attachment.cluster_policy,
    aws_iam_role_policy_attachment.vpc_resource_controller,
  ]
}

# OIDC provider enabling IRSA (IAM Roles for Service Accounts)
resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

# Example: IAM role for pods (via IRSA)
resource "aws_iam_role" "pod_s3_access" {
  name = "${var.cluster_name}-pod-s3-access"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.eks.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")}:sub" = "system:serviceaccount:default:my-app"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "pod_s3_access" {
  role       = aws_iam_role.pod_s3_access.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}

IRSA allows Kubernetes pods to assume IAM roles without sharing static credentials. The pod's service account is annotated with the IAM role ARN, and AWS automatically provides temporary credentials. See AWS EKS for IRSA details and AWS IAM for IAM best practices.

RDS with Automated Backups

# modules/database/main.tf
resource "aws_db_instance" "main" {
  identifier = var.db_identifier

  engine         = "postgres"
  engine_version = "15.4"
  instance_class = var.instance_class

  allocated_storage     = var.allocated_storage
  max_allocated_storage = var.max_allocated_storage # Enable storage autoscaling
  storage_encrypted     = true
  kms_key_id            = var.kms_key_arn

  db_name  = var.database_name
  username = var.master_username
  password = var.master_password # Should come from AWS Secrets Manager

  # Multi-AZ for high availability
  multi_az = var.multi_az_enabled

  # Subnet group (place in private subnets)
  db_subnet_group_name = aws_db_subnet_group.main.name

  # Security group (allow access only from app subnets)
  vpc_security_group_ids = [aws_security_group.rds.id]

  # Automated backups
  backup_retention_period = var.backup_retention_days
  backup_window           = "03:00-04:00" # 3-4 AM UTC
  maintenance_window      = "Mon:04:00-Mon:05:00"

  # Enable automated minor version upgrades
  auto_minor_version_upgrade = true

  # Performance Insights
  performance_insights_enabled    = true
  performance_insights_kms_key_id = var.kms_key_arn

  # Enhanced monitoring
  monitoring_interval = 60 # seconds
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn

  # Deletion protection (prod only)
  deletion_protection = var.environment == "prod"
  skip_final_snapshot = var.environment != "prod"

  tags = {
    Name = var.db_identifier
  }
}

Never hardcode database passwords: Use AWS Secrets Manager to generate and rotate credentials:

resource "aws_secretsmanager_secret" "db_password" {
  name                    = "${var.db_identifier}-password"
  recovery_window_in_days = 7 # Prevent immediate deletion
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db_password.result
}

resource "random_password" "db_password" {
  length  = 32
  special = true
}

resource "aws_db_instance" "main" {
  # ... other configuration ...
  password = aws_secretsmanager_secret_version.db_password.secret_string
}

Application retrieves password from Secrets Manager at runtime (see Secrets Management for patterns). See AWS Databases for RDS configuration details.

Terraform Modules for AWS

Modules encapsulate reusable infrastructure patterns. For comprehensive module design, testing, and versioning strategies, see Terraform Modules.

Module composition example:

# environments/prod/main.tf
module "networking" {
  source = "../../modules/networking"

  environment        = "prod"
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

module "eks" {
  source = "../../modules/eks"

  cluster_name          = "prod-cluster"
  kubernetes_version    = "1.28"
  subnet_ids            = module.networking.private_subnet_ids
  public_access_enabled = false # Private cluster

  depends_on = [module.networking]
}

module "rds" {
  source = "../../modules/database"

  db_identifier           = "prod-db"
  instance_class          = "db.r6g.xlarge"
  allocated_storage       = 100
  multi_az_enabled        = true
  backup_retention_days   = 30
  subnet_ids              = module.networking.private_subnet_ids
  allowed_security_groups = [module.eks.node_security_group_id]

  depends_on = [module.networking]
}

Outputs from one module become inputs to another, creating dependencies. Terraform automatically determines execution order based on these dependencies.
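Under the hood this is a topological sort of the dependency graph. Terraform builds the graph automatically from references and depends_on; the sketch below (illustration only, using hypothetical module names from the example above) shows the resulting apply order:

```python
from graphlib import TopologicalSorter

# Module dependency graph from the composition above: eks and rds both
# consume networking's outputs, so networking must be applied first.
deps = {
    "networking": set(),
    "eks": {"networking"},
    "rds": {"networking"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # networking first; eks and rds may run in parallel after it
assert order.index("networking") < order.index("eks")
assert order.index("networking") < order.index("rds")
```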

CloudFormation

CloudFormation uses JSON or YAML templates to define AWS resources. Templates are submitted to CloudFormation, which creates a "stack" - a collection of resources managed as a unit.

CloudFormation Template Structure

# vpc-stack.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Multi-AZ VPC with public and private subnets

Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, staging, prod]
    Description: Environment name
  VpcCidr:
    Type: String
    Default: 10.0.0.0/16
    Description: VPC CIDR block

Mappings:
  # Different configurations per environment
  EnvironmentConfig:
    dev:
      InstanceType: t3.small
      MultiAZ: false
    staging:
      InstanceType: t3.medium
      MultiAZ: false
    prod:
      InstanceType: r6g.large
      MultiAZ: true

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcCidr
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-vpc'
        - Key: Environment
          Value: !Ref Environment

  PublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Select [0, !Cidr [!Ref VpcCidr, 6, 8]]
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-public-1'

  PublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Select [1, !Cidr [!Ref VpcCidr, 6, 8]]
      AvailabilityZone: !Select [1, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-public-2'

  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Select [2, !Cidr [!Ref VpcCidr, 6, 8]]
      AvailabilityZone: !Select [0, !GetAZs '']
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-private-1'

  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Select [3, !Cidr [!Ref VpcCidr, 6, 8]]
      AvailabilityZone: !Select [1, !GetAZs '']
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-private-2'

  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-igw'

  AttachGateway:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

Outputs:
  VpcId:
    Description: VPC ID
    Value: !Ref VPC
    Export:
      Name: !Sub '${Environment}-VpcId' # Export for cross-stack reference

  PublicSubnetIds:
    Description: Public subnet IDs
    Value: !Join [',', [!Ref PublicSubnet1, !Ref PublicSubnet2]]
    Export:
      Name: !Sub '${Environment}-PublicSubnets'

  PrivateSubnetIds:
    Description: Private subnet IDs
    Value: !Join [',', [!Ref PrivateSubnet1, !Ref PrivateSubnet2]]
    Export:
      Name: !Sub '${Environment}-PrivateSubnets'

CloudFormation intrinsic functions:

  • !Ref: Reference another resource or parameter
  • !GetAtt: Get attribute from resource (e.g., !GetAtt VPC.CidrBlock)
  • !Sub: Substitute variables in string (e.g., !Sub '${Environment}-vpc')
  • !Join: Join list with delimiter (e.g., !Join [',', [subnet1, subnet2]])
  • !Select: Select item from list
  • !GetAZs: Get availability zones for region
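The template's subnet CIDRs combine two of these functions: !Cidr splits the VPC block into candidate subnets, and !Select picks one by index. Note that Fn::Cidr's third argument counts *host* bits (so 8 host bits from a /16 yields /24 subnets). A rough Python model for illustration:

```python
import ipaddress

def fn_cidr(ip_block: str, count: int, cidr_bits: int) -> list[str]:
    """Rough Python model of CloudFormation's Fn::Cidr (illustration only).
    cidr_bits is the number of host bits per subnet, so a /16 block with
    cidr_bits=8 yields /24 subnets."""
    network = ipaddress.ip_network(ip_block)
    subnets = network.subnets(new_prefix=network.max_prefixlen - cidr_bits)
    return [str(s) for s in list(subnets)[:count]]

subnets = fn_cidr("10.0.0.0/16", 6, 8)
# !Select [2, !Cidr [!Ref VpcCidr, 6, 8]] -> PrivateSubnet1's CIDR
print(subnets[2])  # → 10.0.2.0/24
```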

Nested Stacks

For complex infrastructure, split into multiple stacks:

# master-stack.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Master stack that composes VPC, EKS, and RDS stacks

Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, staging, prod]

Resources:
  NetworkingStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-templates/vpc-stack.yaml
      Parameters:
        Environment: !Ref Environment
        VpcCidr: 10.0.0.0/16

  EKSStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: NetworkingStack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-templates/eks-stack.yaml
      Parameters:
        Environment: !Ref Environment
        VpcId: !GetAtt NetworkingStack.Outputs.VpcId
        SubnetIds: !GetAtt NetworkingStack.Outputs.PrivateSubnetIds

  RDSStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: NetworkingStack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-templates/rds-stack.yaml
      Parameters:
        Environment: !Ref Environment
        VpcId: !GetAtt NetworkingStack.Outputs.VpcId
        SubnetIds: !GetAtt NetworkingStack.Outputs.PrivateSubnetIds

Nested stacks enable modular infrastructure: update the RDS stack without touching VPC or EKS. Each stack has independent lifecycle but can reference outputs from other stacks.

StackSets for Multi-Account Deployment

StackSets deploy the same stack across multiple AWS accounts and regions:

# Deploy security baseline to all accounts in organization
aws cloudformation create-stack-set \
--stack-set-name security-baseline \
--template-body file://security-baseline.yaml \
--permission-model SERVICE_MANAGED \
--auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
--capabilities CAPABILITY_NAMED_IAM

# Deploy to all accounts in organization
aws cloudformation create-stack-instances \
--stack-set-name security-baseline \
--deployment-targets OrganizationalUnitIds=ou-xxxx-yyyyyyyy \
--regions us-east-1 us-west-2

StackSets are powerful for organization-wide policies: GuardDuty enablement, CloudTrail logging, Config rules, IAM roles. Changes to the StackSet automatically propagate to all accounts.

Drift Detection

Detect manual changes made outside CloudFormation:

# Detect drift
aws cloudformation detect-stack-drift --stack-name prod-vpc

# Get drift results
aws cloudformation describe-stack-resource-drifts \
  --stack-name prod-vpc \
  --stack-resource-drift-status-filters MODIFIED DELETED

CloudFormation compares actual resource configuration with template definition. Drift indicates someone made manual changes (e.g., modified security group rules via console). Best practice: remediate drift by updating template and reapplying, not by reverting manual changes.
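Conceptually, drift detection is a property-by-property diff of the declared template against the live resource. A minimal Python sketch of that comparison (illustration only; the real service reads live configuration through AWS APIs):

```python
# Diff declared template properties against live resource configuration.
def detect_drift(declared: dict, live: dict) -> dict:
    """Return properties whose live value differs from the declared value."""
    drift = {}
    for key, expected in declared.items():
        actual = live.get(key)
        if actual != expected:
            drift[key] = {"expected": expected, "actual": actual}
    return drift

declared = {"CidrBlock": "10.0.0.0/16", "EnableDnsHostnames": True}
live = {"CidrBlock": "10.0.0.0/16", "EnableDnsHostnames": False}  # console change
print(detect_drift(declared, live))
# → {'EnableDnsHostnames': {'expected': True, 'actual': False}}
```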

AWS CDK

AWS CDK lets you define infrastructure using TypeScript, Python, Java, or C#. CDK code synthesizes to CloudFormation templates, then deploys via CloudFormation.

CDK Project Structure

cdk-project/
├── bin/
│   └── app.ts                  # CDK app entry point
├── lib/
│   ├── networking-stack.ts     # VPC stack
│   ├── eks-stack.ts            # EKS stack
│   └── rds-stack.ts            # Database stack
├── test/
│   └── stacks.test.ts          # Unit tests
├── cdk.json                    # CDK configuration
├── package.json                # Dependencies
└── tsconfig.json               # TypeScript config

ECS Fargate Service with CDK

// lib/ecs-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecs_patterns from 'aws-cdk-lib/aws-ecs-patterns';
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';
import { Construct } from 'constructs';

export class EcsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Reference existing VPC (created in separate stack). Note: fromLookup
    // resolves at synth time and cannot accept deploy-time tokens such as
    // Fn.importValue, so look the VPC up by tag instead.
    const vpc = ec2.Vpc.fromLookup(this, 'VPC', {
      tags: { Name: 'prod-vpc' },
    });

    // ECS cluster
    const cluster = new ecs.Cluster(this, 'Cluster', {
      vpc,
      clusterName: 'prod-cluster',
      containerInsights: true, // Enable CloudWatch Container Insights
    });

    // L3 construct: ApplicationLoadBalancedFargateService
    // This single construct creates: ALB, target group, ECS service, task definition,
    // security groups, IAM roles - all following AWS best practices
    const service = new ecs_patterns.ApplicationLoadBalancedFargateService(
      this,
      'Service',
      {
        cluster,
        serviceName: 'api-service',
        taskImageOptions: {
          image: ecs.ContainerImage.fromRegistry('mycompany/api:latest'),
          containerPort: 8080,
          environment: {
            SPRING_PROFILES_ACTIVE: 'prod',
          },
          secrets: {
            // Retrieve secrets from Secrets Manager
            DB_PASSWORD: ecs.Secret.fromSecretsManager(
              secretsmanager.Secret.fromSecretNameV2(this, 'DBPassword', 'prod-db-password')
            ),
          },
        },
        cpu: 1024, // 1 vCPU
        memoryLimitMiB: 2048, // 2 GB RAM
        desiredCount: 3, // 3 tasks across AZs
        publicLoadBalancer: true,
        healthCheckGracePeriod: cdk.Duration.seconds(60),
      }
    );

    // Auto-scaling based on CPU
    const scaling = service.service.autoScaleTaskCount({
      minCapacity: 3,
      maxCapacity: 10,
    });

    scaling.scaleOnCpuUtilization('CpuScaling', {
      targetUtilizationPercent: 70,
      scaleInCooldown: cdk.Duration.seconds(60),
      scaleOutCooldown: cdk.Duration.seconds(60),
    });

    // Output ALB DNS
    new cdk.CfnOutput(this, 'LoadBalancerDNS', {
      value: service.loadBalancer.loadBalancerDnsName,
      description: 'Application Load Balancer DNS',
    });
  }
}

L3 constructs like ApplicationLoadBalancedFargateService are CDK's key advantage: a single high-level construct creates a complete, production-ready architecture with security groups, IAM roles, auto-scaling, and health checks configured according to AWS best practices. This drastically reduces boilerplate compared to Terraform or CloudFormation.

CDK Unit Testing

// test/ecs-stack.test.ts
import { Template } from 'aws-cdk-lib/assertions';
import * as cdk from 'aws-cdk-lib';
import { EcsStack } from '../lib/ecs-stack';

// Vpc.fromLookup requires a concrete account/region; tests use dummy values
const env = { account: '123456789012', region: 'us-east-1' };

describe('EcsStack', () => {
  test('Creates ECS Cluster with Container Insights', () => {
    const app = new cdk.App();
    const stack = new EcsStack(app, 'TestStack', { env });

    const template = Template.fromStack(stack);

    // Assert ECS cluster has Container Insights enabled
    template.hasResourceProperties('AWS::ECS::Cluster', {
      ClusterSettings: [
        {
          Name: 'containerInsights',
          Value: 'enabled',
        },
      ],
    });
  });

  test('Creates Fargate service with correct CPU and memory', () => {
    const app = new cdk.App();
    const stack = new EcsStack(app, 'TestStack', { env });

    const template = Template.fromStack(stack);

    template.hasResourceProperties('AWS::ECS::TaskDefinition', {
      Cpu: '1024',
      Memory: '2048',
      RequiresCompatibilities: ['FARGATE'],
    });
  });

  test('Configures auto-scaling with min 3 and max 10 tasks', () => {
    const app = new cdk.App();
    const stack = new EcsStack(app, 'TestStack', { env });

    const template = Template.fromStack(stack);

    template.hasResourceProperties('AWS::ApplicationAutoScaling::ScalableTarget', {
      MinCapacity: 3,
      MaxCapacity: 10,
    });
  });
});

Unit tests validate infrastructure configuration before deployment. This catches errors early (e.g., forgetting to enable encryption, incorrect instance types) that would otherwise only surface during deployment.

Run tests:

npm test

See AWS Compute for ECS configuration details.

GitLab CI/CD Integration

Infrastructure deployments follow the same CI/CD patterns as application code: automated testing, peer review, approval gates, and rollback capabilities.

Terraform Pipeline

# .gitlab-ci.yml for Terraform
stages:
  - validate
  - plan
  - apply

variables:
  TF_VERSION: "1.5.7"
  TF_ROOT: ${CI_PROJECT_DIR}/environments/prod

# Validate Terraform configuration
terraform:validate:
  stage: validate
  image:
    name: hashicorp/terraform:${TF_VERSION}
    entrypoint: [""]
  before_script:
    - cd ${TF_ROOT}
    - terraform init -backend=false # No state for validation
  script:
    - terraform fmt -check # Check formatting
    - terraform validate   # Validate syntax
    - tflint               # Lint for errors/warnings
  only:
    - merge_requests
    - main

# Generate and save plan
terraform:plan:
  stage: plan
  image:
    name: hashicorp/terraform:${TF_VERSION}
    entrypoint: [""]
  before_script:
    - cd ${TF_ROOT}
    - terraform init
  script:
    - terraform plan -out=tfplan
    - terraform show -json tfplan > plan.json
  artifacts:
    paths:
      - ${TF_ROOT}/tfplan
      - ${TF_ROOT}/plan.json
    expire_in: 1 week
  only:
    - main

# Apply changes (manual gate)
terraform:apply:
  stage: apply
  image:
    name: hashicorp/terraform:${TF_VERSION}
    entrypoint: [""]
  before_script:
    - cd ${TF_ROOT}
    - terraform init
  script:
    - terraform apply -auto-approve tfplan
  dependencies:
    - terraform:plan
  when: manual # Require manual approval
  only:
    - main
  environment:
    name: production
    action: deploy

Pipeline flow: merge request → validate → merge to main → plan (saved as artifact) → manual approval → apply.

Key features:

  1. Validation on MRs: Every merge request runs terraform validate and tflint, catching errors before merge
  2. Plan on main branch: After merge, pipeline generates plan showing exactly what will change
  3. Manual approval: Engineer reviews plan, clicks "Apply" button in GitLab to proceed
  4. Artifact storage: Plan is saved as artifact, ensuring apply executes the reviewed plan (not a new plan)

AWS authentication in GitLab CI:

# Use OIDC federation (no long-lived credentials)
terraform:plan:
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://gitlab.com
  before_script:
    - |
      export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \
        $(aws sts assume-role-with-web-identity \
          --role-arn ${AWS_ROLE_ARN} \
          --role-session-name "gitlab-${CI_PROJECT_ID}-${CI_PIPELINE_ID}" \
          --web-identity-token ${GITLAB_OIDC_TOKEN} \
          --duration-seconds 3600 \
          --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
          --output text))

OIDC federation allows GitLab to assume an AWS IAM role without storing AWS credentials in GitLab. GitLab generates a short-lived token, AWS verifies it, and issues temporary credentials. See AWS IAM for OIDC setup.

For comprehensive CI/CD patterns, caching strategies, and deployment workflows, see GitLab CI/CD Pipelines.

CloudFormation Pipeline

# .gitlab-ci.yml for CloudFormation
stages:
  - validate
  - deploy

variables:
  STACK_NAME: prod-vpc
  TEMPLATE_FILE: templates/vpc-stack.yaml
  PARAMETERS_FILE: parameters/prod-params.json

cloudformation:validate:
  stage: validate
  image: python:3.11
  before_script:
    - pip install cfn-lint
  script:
    - cfn-lint ${TEMPLATE_FILE}
  only:
    - merge_requests
    - main

cloudformation:deploy:
  stage: deploy
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""]
  script:
    # Create change set
    - |
      aws cloudformation create-change-set \
        --stack-name ${STACK_NAME} \
        --change-set-name changeset-${CI_PIPELINE_ID} \
        --template-body file://${TEMPLATE_FILE} \
        --parameters file://${PARAMETERS_FILE} \
        --capabilities CAPABILITY_NAMED_IAM
    # Wait for change set creation
    - |
      aws cloudformation wait change-set-create-complete \
        --stack-name ${STACK_NAME} \
        --change-set-name changeset-${CI_PIPELINE_ID}
    # Describe changes
    - |
      aws cloudformation describe-change-set \
        --stack-name ${STACK_NAME} \
        --change-set-name changeset-${CI_PIPELINE_ID}
    # Execute change set
    - |
      aws cloudformation execute-change-set \
        --stack-name ${STACK_NAME} \
        --change-set-name changeset-${CI_PIPELINE_ID}
    # Wait for completion
    - |
      aws cloudformation wait stack-update-complete \
        --stack-name ${STACK_NAME}
  when: manual
  only:
    - main
  environment:
    name: production

CloudFormation change sets show exactly what will change before applying, similar to terraform plan.

CDK Pipeline

# .gitlab-ci.yml for CDK
stages:
  - test
  - synth
  - deploy

variables:
  CDK_VERSION: "2.100.0"

cdk:test:
  stage: test
  image: node:18
  before_script:
    - npm ci
  script:
    - npm test
  coverage: '/All files[^|]*\|[^|]*\s+([\d\.]+)/'
  only:
    - merge_requests
    - main

cdk:synth:
  stage: synth
  image: node:18
  before_script:
    - npm ci
    - npm install -g aws-cdk@${CDK_VERSION}
  script:
    - cdk synth
  artifacts:
    paths:
      - cdk.out/
    expire_in: 1 week
  only:
    - main

cdk:deploy:
  stage: deploy
  image: node:18
  before_script:
    - npm ci
    - npm install -g aws-cdk@${CDK_VERSION}
  script:
    - cdk deploy --require-approval never --all
  dependencies:
    - cdk:synth
  when: manual
  only:
    - main
  environment:
    name: production

The CDK pipeline runs unit tests (validating infrastructure logic) before synthesis and deployment.

State Management Best Practices

Infrastructure state contains the mapping between your configuration and actual cloud resources. Protecting and managing state is critical.

Encryption at Rest

Terraform:

# backend.tf
terraform {
  backend "s3" {
    bucket     = "mycompany-terraform-state"
    key        = "prod/terraform.tfstate"
    region     = "us-east-1"
    encrypt    = true # Server-side encryption
    kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abcd-1234"
  }
}

CloudFormation: State managed by AWS, encrypted by default.

CDK: Uses CloudFormation, encrypted by default.

Access Control

State files contain sensitive data (resource IDs, IP addresses, sometimes passwords). Restrict access:

// S3 bucket policy for Terraform state
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowTerraformRole",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/TerraformExecutionRole"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::mycompany-terraform-state/*"
    },
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::mycompany-terraform-state/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    }
  ]
}

Only the Terraform execution role (used by CI/CD pipelines) can read/write state. Individual developers should not have direct S3 access to state.

State Locking

Terraform: DynamoDB table prevents concurrent modifications (configured in backend).

CloudFormation: AWS handles locking automatically (stack updates are serialized).

CDK: Uses CloudFormation, automatic locking.

Versioning and Backup

Enable S3 versioning on Terraform state bucket:

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}
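The state bucket itself is worth hardening. A sketch, assuming an illustrative bucket name: block all public access and guard against accidental deletion with `prevent_destroy`:

```hcl
# State bucket hardening (bucket name is illustrative)
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true  # Refuse to destroy the bucket via terraform destroy
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```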

If terraform apply corrupts state, revert to previous version:

# List versions
aws s3api list-object-versions \
  --bucket mycompany-terraform-state \
  --prefix prod/terraform.tfstate

# Restore previous version (quote the source: it contains "?")
aws s3api copy-object \
  --bucket mycompany-terraform-state \
  --copy-source "mycompany-terraform-state/prod/terraform.tfstate?versionId=VERSION_ID" \
  --key prod/terraform.tfstate

For comprehensive state management strategies including workspace usage and multi-environment patterns, see Terraform State Management.

Testing Infrastructure Code

Terraform Testing

1. Static validation:

terraform fmt -check   # Check formatting
terraform validate     # Syntax validation
tflint                 # AWS-specific linting

2. Policy validation with OPA:

# policy/encryption.rego
# Evaluates the plan JSON produced by `terraform show -json`
package terraform.encryption

# Deny S3 buckets without encryption
deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_s3_bucket"
  not rc.change.after.server_side_encryption_configuration
  msg := sprintf("S3 bucket '%s' must have encryption enabled", [rc.name])
}

# Deny RDS instances without encryption
deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_db_instance"
  rc.change.after.storage_encrypted != true
  msg := sprintf("RDS instance '%s' must have storage encryption enabled", [rc.name])
}

Run policy checks:

# Convert plan to JSON
terraform plan -out=tfplan
terraform show -json tfplan > plan.json

# Validate with OPA
opa exec --decision terraform/encryption/deny --bundle policy/ plan.json

OPA policies enforce organizational standards (encryption required, public access blocked, tagging requirements) automatically in CI/CD.

3. Integration testing with Terratest:

See Terraform Testing for Terratest examples.

CloudFormation Testing

Static validation:

# Validate template syntax
aws cloudformation validate-template --template-body file://template.yaml

# Lint with cfn-lint
pip install cfn-lint
cfn-lint template.yaml

# Security scanning with cfn-nag
gem install cfn-nag
cfn_nag_scan --input-path template.yaml

cfn-nag detects security issues:

Failures count: 2
Warnings count: 1

Failures:
- Resources: ["PublicSubnet"]
  Message: Subnet should not map public IPs on launch

- Resources: ["SecurityGroup"]
  Message: Security group should not allow ingress from 0.0.0.0/0

CDK Testing

Unit tests (validate infrastructure logic):

See CDK unit testing example above.

Integration tests (deploy to test account):

// test/integration.test.ts
import { execSync } from 'child_process';

describe('CDK Integration Test', () => {
  test('Deploy to test account', () => {
    // Deploy to isolated test account
    execSync('cdk deploy --all --require-approval never', {
      env: {
        ...process.env,
        AWS_PROFILE: 'test-account',
      },
    });

    // Run smoke tests against deployed infrastructure
    // (e.g., check ALB health, query RDS, etc.)

    // Destroy after testing
    execSync('cdk destroy --all --force', {
      env: {
        ...process.env,
        AWS_PROFILE: 'test-account',
      },
    });
  }, 600000); // 10 minute timeout
});

For comprehensive testing strategies, see Integration Testing.

Secrets in Infrastructure Code

Never hardcode secrets in Terraform, CloudFormation, or CDK. Use AWS Secrets Manager or SSM Parameter Store.
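Parameter Store covers the simpler case of static configuration values that never need rotation. A minimal Terraform sketch, assuming an illustrative parameter name and an input variable supplied by the pipeline:

```hcl
# Illustrative SecureString parameter; the value comes from a CI/CD
# variable (var.api_key), never from a committed file.
resource "aws_ssm_parameter" "api_key" {
  name  = "/prod/app/api-key"  # Illustrative hierarchical name
  type  = "SecureString"       # Encrypted with the account's default KMS key
  value = var.api_key
}
```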

Terraform with Secrets Manager

# Generate a random password (referenced below)
resource "random_password" "db_password" {
  length  = 32
  special = true
}

# Create secret
resource "aws_secretsmanager_secret" "db_password" {
  name = "prod-db-password"
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db_password.result
}

# Use secret in RDS
resource "aws_db_instance" "main" {
  # ... other config ...
  password = aws_secretsmanager_secret_version.db_password.secret_string
}

# Application retrieves secret at runtime (not from Terraform)
output "db_secret_arn" {
  value       = aws_secretsmanager_secret.db_password.arn
  description = "ARN of database password secret (application retrieves value)"
}

Important: The secret value is in Terraform state (encrypted), but applications should retrieve secrets from Secrets Manager at runtime, not from Terraform outputs. This allows secret rotation without Terraform changes.
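One way to enforce runtime retrieval is to grant only the application's role read access to the secret. A hedged sketch, assuming an illustrative role name:

```hcl
# Grant the application role read access to exactly this secret
data "aws_iam_policy_document" "read_db_secret" {
  statement {
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [aws_secretsmanager_secret.db_password.arn]
  }
}

resource "aws_iam_role_policy" "app_read_db_secret" {
  name   = "read-db-secret"
  role   = "app-task-role"  # Illustrative: the application's runtime role
  policy = data.aws_iam_policy_document.read_db_secret.json
}
```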

CloudFormation with Secrets Manager

Resources:
  DBPassword:
    Type: AWS::SecretsManager::Secret
    Properties:
      GenerateSecretString:
        PasswordLength: 32
        ExcludeCharacters: '"@/\'

  DBInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      MasterUsername: admin
      MasterUserPassword: !Sub '{{resolve:secretsmanager:${DBPassword}:SecretString}}'
      # ... other config ...

The {{resolve:secretsmanager:...}} dynamic reference (combined here with !Sub to interpolate the secret's logical ID) retrieves the value during stack creation/update without exposing it in the template or its parameters.

CDK with Secrets Manager

import * as rds from 'aws-cdk-lib/aws-rds';
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';

// Generate secret
const dbPassword = new secretsmanager.Secret(this, 'DBPassword', {
  generateSecretString: {
    passwordLength: 32,
    excludeCharacters: '"@/\\',
  },
});

// Use in RDS
const db = new rds.DatabaseInstance(this, 'Database', {
  // ... other config ...
  credentials: rds.Credentials.fromSecret(dbPassword),
});

See Secrets Management for comprehensive secret handling patterns including rotation, access control, and application integration.

Multi-Environment Management

Terraform: Separate State Files per Environment

terraform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── backend.tf         # State: s3://tf-state/dev/terraform.tfstate
│   │   └── terraform.tfvars   # dev-specific values
│   ├── staging/
│   │   ├── main.tf
│   │   ├── backend.tf         # State: s3://tf-state/staging/terraform.tfstate
│   │   └── terraform.tfvars   # staging-specific values
│   └── prod/
│       ├── main.tf
│       ├── backend.tf         # State: s3://tf-state/prod/terraform.tfstate
│       └── terraform.tfvars   # prod-specific values

Each environment has isolated state, preventing accidental cross-environment changes. Variable files (.tfvars) contain environment-specific values:

# environments/dev/terraform.tfvars
environment      = "dev"
instance_type    = "t3.small"
database_size    = "db.t3.small"
multi_az_enabled = false
backup_retention = 7

# environments/prod/terraform.tfvars
environment      = "prod"
instance_type    = "r6g.xlarge"
database_size    = "db.r6g.2xlarge"
multi_az_enabled = true
backup_retention = 30
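The values above populate variable declarations shared by every environment. A sketch of the matching variables.tf (types inferred from the tfvars; descriptions are illustrative):

```hcl
# variables.tf — shared declarations filled in by each environment's tfvars
variable "environment" {
  type        = string
  description = "Deployment environment name (dev, staging, prod)"
}

variable "instance_type" {
  type        = string
  description = "EC2 instance type for application hosts"
}

variable "database_size" {
  type        = string
  description = "RDS instance class"
}

variable "multi_az_enabled" {
  type        = bool
  description = "Whether RDS runs in Multi-AZ mode"
}

variable "backup_retention" {
  type        = number
  description = "Backup retention period in days"
}
```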

For workspace strategies and environment isolation patterns, see Terraform Multi-Environment.

CloudFormation: Parameter Files per Environment

# Deploy to dev
aws cloudformation deploy \
--stack-name dev-vpc \
--template-file vpc-stack.yaml \
--parameter-overrides file://parameters/dev-params.json

# Deploy to prod
aws cloudformation deploy \
--stack-name prod-vpc \
--template-file vpc-stack.yaml \
--parameter-overrides file://parameters/prod-params.json

CDK: Context Values per Environment

// bin/app.ts
import * as cdk from 'aws-cdk-lib';
import { EcsStack } from '../lib/ecs-stack'; // Illustrative path to the stack class

const app = new cdk.App();

const env = app.node.tryGetContext('environment') || 'dev';
const config = app.node.tryGetContext(env);

new EcsStack(app, `${env}-ecs-stack`, {
  env: {
    account: config.account,
    region: config.region,
  },
  instanceType: config.instanceType,
  desiredCount: config.desiredCount,
});

cdk.json:

{
  "context": {
    "dev": {
      "account": "111111111111",
      "region": "us-east-1",
      "instanceType": "t3.small",
      "desiredCount": 1
    },
    "prod": {
      "account": "222222222222",
      "region": "us-east-1",
      "instanceType": "r6g.large",
      "desiredCount": 3
    }
  }
}

Deploy:

cdk deploy --context environment=dev
cdk deploy --context environment=prod

Common Anti-Patterns

Hardcoded Values

Problem: Hardcoding account IDs, regions, or IP addresses makes infrastructure non-portable.

# BAD
resource "aws_s3_bucket" "logs" {
  bucket = "mycompany-logs-us-east-1-123456789012"  # Hardcoded account ID
}

Fix: Use data sources and variables:

# GOOD
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

resource "aws_s3_bucket" "logs" {
  bucket = "mycompany-logs-${data.aws_region.current.name}-${data.aws_caller_identity.current.account_id}"
}

No State Locking

Problem: Running Terraform without DynamoDB locking allows concurrent modifications.

Fix: Always configure state locking in backend (see State Management above).

Monolithic Templates

Problem: Single massive Terraform/CloudFormation file containing all infrastructure.

Fix: Modularize:

  • Terraform: Use modules (see Terraform Modules)
  • CloudFormation: Use nested stacks
  • CDK: Use multiple stack classes

Ignoring Drift

Problem: Manual changes made via AWS console, never reconciled with IaC.

Fix:

  • Run terraform plan / CloudFormation drift detection regularly
  • Remediate drift by updating IaC, not by reverting manual changes
  • Enforce "no manual changes" policy via IAM restrictions (only allow CI/CD role to modify resources)
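One way to enforce the no-manual-changes policy at the organization level is a service control policy that denies mutating actions to every principal except the pipeline role. A hedged sketch in Terraform, with an illustrative action set and role ARN (real policies need careful scoping and break-glass exceptions):

```hcl
# Illustrative SCP: deny infrastructure mutations unless made by the pipeline role
resource "aws_organizations_policy" "deny_manual_changes" {
  name = "deny-manual-infra-changes"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DenyAllButPipelineRole"
      Effect   = "Deny"
      Action   = ["ec2:*", "rds:*"]  # Illustrative action set; scope to your needs
      Resource = "*"
      Condition = {
        StringNotLike = {
          "aws:PrincipalArn" = "arn:aws:iam::*:role/TerraformExecutionRole"
        }
      }
    }]
  })
}
```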

No Testing

Problem: Deploying infrastructure changes without validation.

Fix:

  • Static validation (terraform validate, cfn-lint)
  • Policy validation (OPA)
  • Integration tests (deploy to test account, validate, destroy)
  • See Testing Infrastructure Code

Secrets in Code

Problem: Storing secrets in .tf files or CloudFormation templates.

Fix: Use AWS Secrets Manager (see Secrets in Infrastructure Code).

Summary

Infrastructure as Code on AWS transforms manual, error-prone infrastructure management into versioned, tested, automated workflows. Key principles:

  1. Choose the right tool: Terraform for multi-cloud, CloudFormation for AWS-native, CDK for code-first approach
  2. Remote state: Store Terraform state in S3 with DynamoDB locking; CloudFormation/CDK state is AWS-managed
  3. Modularization: Break infrastructure into reusable modules/stacks/constructs
  4. CI/CD integration: Automate validation, planning, and deployment through GitLab pipelines
  5. Testing: Validate syntax, enforce policies (OPA), run integration tests
  6. Secrets: Never hardcode secrets; use Secrets Manager or Parameter Store
  7. Multi-environment: Isolated state per environment, environment-specific variables
  8. Drift detection: Regularly check for manual changes, remediate via IaC

When to use each tool:

  • Terraform: Multi-cloud, mature ecosystem, team already knows Terraform
  • CloudFormation: AWS-only, zero setup, deep AWS integration
  • CDK: Developers prefer code over config, need type safety and high-level constructs

For comprehensive Terraform coverage including advanced module patterns, workspace strategies, testing with Terratest, and state management, see Terraform Best Practices.

Further Reading