
AWS CLI and Automation

Using the AWS Command Line Interface for automation, scripting, and CI/CD pipeline integration.

Overview

The AWS CLI is a unified command-line tool for managing AWS services. It enables automation of deployment workflows, infrastructure operations, and debugging tasks that would be tedious through the AWS Console. This guide covers CLI fundamentals, CI/CD integration patterns, and practical automation scripts for common operations.

While the AWS SDK (covered in AWS SDK Integration) is preferred for application code, the CLI excels at scripting, CI/CD pipelines, and ad-hoc operations. Understanding when to use each tool is essential for efficient AWS automation.


Core Principles

  • Credential security: Never hardcode credentials; use IAM roles, OIDC federation, or AWS SSO
  • Idempotency: Scripts should safely run multiple times without unintended side effects
  • Error handling: Check exit codes and handle failures gracefully with retries where appropriate
  • Output parsing: Use --query and jq for reliable JSON parsing instead of regex on text output
  • Automation-first: Design CLI scripts to run unattended in CI/CD pipelines without manual intervention
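The idempotency principle above can be sketched with a check-before-mutate "ensure" pattern. This is a local sketch only: resource_exists and resource_create are stand-ins for real AWS calls (for example s3api head-bucket / s3api create-bucket).

```shell
#!/bin/bash
# Hypothetical "ensure" pattern: probe current state before mutating, so the
# script is safe to run any number of times. The two helpers are local
# stand-ins for real AWS CLI calls.
MARKER="/tmp/demo-resource"
rm -f "$MARKER"                      # start from a clean state for the demo

resource_exists() { [ -e "$MARKER" ]; }
resource_create() { touch "$MARKER"; }

ensure_resource() {
  if resource_exists; then
    echo "already exists, skipping"
  else
    resource_create
    echo "created"
  fi
}

ensure_resource   # first run creates
ensure_resource   # second run is a no-op
```

The same shape applies to real resources: describe/head first, create or update only when the observed state differs from the desired state.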

Installation and Configuration

Installing AWS CLI v2

AWS CLI v2 is the current version with improved performance and user experience over v1. Always use v2 for new projects.

Linux/macOS:

# Download and install
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

# Verify installation
aws --version
# aws-cli/2.15.30 Python/3.11.8 Linux/5.15.0-1047-aws exe/x86_64.ubuntu.22

Windows:

# Download MSI installer from https://awscli.amazonaws.com/AWSCLIV2.msi
# Run installer, then verify
aws --version

Docker (for CI/CD):

FROM amazon/aws-cli:2.15.30
# Pre-installed AWS CLI in container

Basic Configuration

The AWS CLI uses a configuration file (~/.aws/config) and credentials file (~/.aws/credentials) to store settings.

Configure interactively:

aws configure
# AWS Access Key ID: AKIAIOSFODNN7EXAMPLE
# AWS Secret Access Key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Default region name: us-east-1
# Default output format: json

This creates two files:

~/.aws/credentials:

[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

~/.aws/config:

[default]
region = us-east-1
output = json

Important: The interactive configuration approach stores long-term credentials in plaintext files. This is acceptable for local development by human users, but never use this for applications or CI/CD. Use IAM roles and temporary credentials instead (covered below).

Named Profiles

Profiles enable managing multiple AWS accounts or environments from the same machine.

Configure multiple profiles:

aws configure --profile production
aws configure --profile staging

~/.aws/config:

[default]
region = us-east-1

[profile production]
region = us-east-1
output = json

[profile staging]
region = us-west-2
output = json

Use specific profile:

# Specify profile per command
aws s3 ls --profile production

# Or set environment variable for session
export AWS_PROFILE=production
aws s3 ls # Uses production profile

For multi-account strategies and cross-account access patterns, see AWS IAM.

Credential Precedence

The AWS CLI searches for credentials in this order (highest priority first):

  1. Command line options: --profile, --region
  2. Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
  3. Web identity token: From environment variables (used by EKS pods via IRSA)
  4. Shared credentials file: ~/.aws/credentials
  5. Shared config file: ~/.aws/config
  6. Container credentials: ECS task role (from AWS_CONTAINER_CREDENTIALS_RELATIVE_URI)
  7. Instance metadata: EC2 instance profile (from IMDS)

This precedence chain enables secure credential management: local development uses ~/.aws/credentials, CI/CD uses environment variables or OIDC, and production services use IAM roles via instance profiles or task roles.

Understanding credential precedence helps debug authentication issues: if a script works locally but fails in CI/CD, it's often because the credential source changed (e.g., from credentials file to environment variables).
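When debugging such a mismatch, a quick first step is to see which credential sources are even present in the current environment. A small sketch (the variable names are the real ones the CLI reads; the reporting function itself is ours):

```shell
# Report which credential-related sources are visible in this shell; the
# precedence chain above decides which one the CLI actually uses.
credential_source_status() {
  for v in AWS_PROFILE AWS_ACCESS_KEY_ID AWS_SESSION_TOKEN \
           AWS_CONTAINER_CREDENTIALS_RELATIVE_URI; do
    if [ -n "${!v:-}" ]; then
      echo "$v: set"
    else
      echo "$v: unset"
    fi
  done
  [ -f "$HOME/.aws/credentials" ] && echo "~/.aws/credentials: present"
  return 0
}

credential_source_status
```

Combine this with aws sts get-caller-identity (covered below) to confirm which principal the winning source resolves to.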


AWS CLI vs AWS SDK

Both tools interact with AWS APIs, but they serve different purposes:

Aspect          | AWS CLI                                          | AWS SDK
Use case        | Scripting, automation, CI/CD, ad-hoc operations  | Application code, complex logic, production services
Language        | Bash/shell scripts                               | Java, JavaScript, Python, Go, etc.
Error handling  | Exit codes, stderr                               | Try-catch blocks, typed exceptions
Type safety     | None (strings only)                              | Full type checking (in typed languages)
Performance     | Process invocation overhead                      | In-process library calls
Debugging       | --debug flag                                     | Application logging, debugger
Best for        | Deployments, infrastructure ops, debugging       | Business logic, API integrations, data processing

Use CLI when:

  • Automating deployments in GitLab CI/CD pipelines
  • Writing operational scripts (backups, migrations, maintenance)
  • Ad-hoc debugging and troubleshooting
  • Rapid prototyping before SDK implementation

Use SDK when:

  • Implementing application business logic (see AWS SDK Integration)
  • Need type safety and IDE autocomplete
  • Complex error handling and retry logic
  • Performance-critical operations (avoid process overhead)

Common CLI Patterns

Output Formats and Filtering

The CLI supports three output formats: json, text, and table. JSON is the most reliable for scripting because it's machine-parseable.

JSON output (default):

aws ec2 describe-instances
{
  "Reservations": [
    {
      "Instances": [
        {
          "InstanceId": "i-1234567890abcdef0",
          "State": {"Name": "running"},
          "PrivateIpAddress": "10.0.1.5"
        }
      ]
    }
  ]
}

Table output (human-readable):

aws ec2 describe-instances --output table
-------------------------------------------------------
| DescribeInstances |
+-----------------------------------------------------+
|| Instances ||
|+----------------+----------------------------------+|
|| InstanceId | i-1234567890abcdef0 ||
|| State | running ||
|| PrivateIp | 10.0.1.5 ||
|+----------------+----------------------------------+|

Text output (space-delimited):

aws ec2 describe-instances --output text
RESERVATIONS r-1234567890abcdef0
INSTANCES i-1234567890abcdef0 running 10.0.1.5

Filtering with --query (JMESPath)

The --query parameter filters JSON output using JMESPath expressions. This is more reliable than piping to grep because it operates on parsed JSON structure.

Example: Get all running instance IDs

aws ec2 describe-instances \
--query 'Reservations[*].Instances[?State.Name==`running`].InstanceId' \
--output text

i-abc123 i-def456 i-ghi789

JMESPath basics:

  • [*] - flatten array (all elements)
  • [?condition] - filter array elements
  • .field - access object property
  • [0] - access first element

Example: Get instance ID and private IP

aws ec2 describe-instances \
--query 'Reservations[*].Instances[*].[InstanceId,PrivateIpAddress]' \
--output text

i-abc123 10.0.1.5
i-def456 10.0.1.6

Example: Filter by tag

aws ec2 describe-instances \
--query 'Reservations[*].Instances[?Tags[?Key==`Environment` && Value==`production`]].InstanceId' \
--output text

Parsing with jq

For complex JSON transformations beyond --query capabilities, pipe to jq:

Example: Extract nested data

aws ecs describe-services --cluster prod --services payment-api | \
jq -r '.services[0].deployments[] | "\(.status): \(.desiredCount) tasks"'

PRIMARY: 3 tasks

Example: Create CSV output

aws s3api list-objects --bucket my-bucket | \
jq -r '.Contents[] | [.Key, .Size, .LastModified] | @csv'

"file1.txt",1024,"2025-01-15T10:30:00Z"
"file2.txt",2048,"2025-01-15T11:00:00Z"

Example: Conditional formatting

aws cloudwatch describe-alarms | \
jq -r '.MetricAlarms[] | select(.StateValue == "ALARM") | .AlarmName'

For more jq patterns, see the jq manual.
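As a self-contained illustration of the select/format idioms above, the same jq expression can be exercised on canned JSON shaped like describe-services output (all values are made up):

```shell
# Canned JSON standing in for `aws ecs describe-services` output
SAMPLE='{"services":[{"serviceName":"payment-api","deployments":[
  {"status":"PRIMARY","desiredCount":3},
  {"status":"ACTIVE","desiredCount":2}]}]}'

# Pick the PRIMARY deployment and format a one-line summary
SUMMARY=$(echo "$SAMPLE" | \
  jq -r '.services[0].deployments[] | select(.status=="PRIMARY") | "\(.status): \(.desiredCount) tasks"')
echo "$SUMMARY"
```

Because the input is inline, snippets like this double as cheap regression tests for jq filters before pointing them at live API output.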

Pagination

Many AWS APIs return paginated results. The CLI paginates automatically by default, issuing follow-up requests until the full result set is returned; you only see a single page if you disable pagination (--no-paginate) or truncate the output.

Truncated output:

aws s3api list-objects --bucket my-bucket --max-items 100
# Returns the first 100 objects plus a NextToken for resuming

Tuned page size:

aws s3api list-objects --bucket my-bucket --page-size 100
# CLI still fetches all pages, requesting 100 items per API call

What's happening:

  • --max-items: Limits total output (the CLI stops after N items and emits a NextToken you can pass to --starting-token to resume)
  • --page-size: Controls API call size (fetches N items per request, continues until all retrieved)

For scripting, leave automatic pagination enabled so you process the complete result set. Smaller page sizes reduce per-request memory but increase total API calls.
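The token-driven loop behind NextToken / --starting-token can be exercised locally. In this sketch, fetch_page is a mock standing in for a real call such as aws s3api list-objects --starting-token "$TOKEN"; it serves two fixed pages so the loop logic is testable without AWS access:

```shell
# Token-driven pagination loop. fetch_page mocks a paginated API: page one
# returns a NextToken, page two ends the sequence with null.
fetch_page() {
  if [ -z "${1:-}" ]; then
    echo '{"Items":["a","b"],"NextToken":"page2"}'
  else
    echo '{"Items":["c"],"NextToken":null}'
  fi
}

ALL_ITEMS=""
TOKEN=""
while :; do
  PAGE=$(fetch_page "$TOKEN")
  ALL_ITEMS="$ALL_ITEMS $(echo "$PAGE" | jq -r '.Items | join(" ")')"
  TOKEN=$(echo "$PAGE" | jq -r '.NextToken // empty')  # empty string ends loop
  [ -z "$TOKEN" ] && break
done
ALL_ITEMS="${ALL_ITEMS# }"
echo "$ALL_ITEMS"
```

Swapping fetch_page for a real command with --starting-token "$TOKEN" gives a manual pagination loop for the rare APIs or situations where automatic pagination is unavailable.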

Error Handling

The CLI returns exit code 0 on success, non-zero on failure. Always check exit codes in scripts.

Basic error handling:

if aws s3 cp myfile.txt s3://my-bucket/; then
  echo "Upload successful"
else
  echo "Upload failed with exit code $?"
  exit 1
fi

Capture stderr:

ERROR_OUTPUT=$(aws s3 cp myfile.txt s3://my-bucket/ 2>&1)
if [ $? -ne 0 ]; then
  echo "Error: $ERROR_OUTPUT"
  exit 1
fi

Retry with exponential backoff:

retry_count=0
max_retries=3
until aws s3 cp myfile.txt s3://my-bucket/; do
  retry_count=$((retry_count + 1))
  if [ $retry_count -ge $max_retries ]; then
    echo "Failed after $max_retries attempts"
    exit 1
  fi
  sleep_time=$((2 ** retry_count)) # 2, 4, 8 seconds
  echo "Retry $retry_count after $sleep_time seconds"
  sleep $sleep_time
done
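The inline loop can be wrapped as a reusable function: retry N cmd... retries cmd with exponential backoff up to N attempts. Demonstrated here against a mock command that only succeeds on its third call (the real sleep is shortened so the demo runs fast):

```shell
# Generic retry wrapper with exponential backoff.
retry() {
  local max=$1 attempt=0
  shift
  until "$@"; do
    attempt=$((attempt + 1))
    if [ "$attempt" -ge "$max" ]; then
      echo "Failed after $max attempts" >&2
      return 1
    fi
    backoff=$((2 ** attempt))
    echo "Retry $attempt (would back off ${backoff}s)" >&2
    sleep 0.1   # in production: sleep "$backoff"
  done
}

# Mock command: fails on calls 1 and 2, succeeds from call 3 onward
CALLS_FILE=$(mktemp)
echo 0 > "$CALLS_FILE"
flaky() {
  local n=$(($(cat "$CALLS_FILE") + 1))
  echo "$n" > "$CALLS_FILE"
  [ "$n" -ge 3 ]
}

retry 5 flaky && echo "succeeded after $(cat "$CALLS_FILE") calls"
```

In a real script the mock becomes the AWS command, e.g. retry 3 aws s3 cp myfile.txt s3://my-bucket/.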

For resilience patterns, see Spring Boot Resilience.


GitLab CI/CD Integration

The CLI is commonly used in GitLab CI/CD pipelines for deploying to AWS. Secure authentication is critical - never hardcode credentials.

OpenID Connect (OIDC) federation enables GitLab to assume AWS IAM roles without long-term credentials. GitLab issues a JWT token that AWS trusts, exchanging it for temporary credentials.

Why OIDC:

  • No credentials stored in GitLab (eliminates credential leakage risk)
  • Temporary credentials (auto-expire after session)
  • Auditable (CloudTrail logs all AssumeRoleWithWebIdentity calls)
  • Principle of least privilege (role permissions scope exactly what pipeline needs)

Setup: Create OIDC identity provider in AWS

# Terraform configuration
resource "aws_iam_openid_connect_provider" "gitlab" {
  url = "https://gitlab.com"

  client_id_list = [
    "https://gitlab.com"
  ]

  # GitLab's TLS certificate thumbprint
  thumbprint_list = [
    "b3dd7606d2b5a8b4a13771dbecc9ee1cecafa38a" # GitLab.com thumbprint
  ]
}

resource "aws_iam_role" "gitlab_deploy" {
  name = "gitlab-deploy-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.gitlab.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "gitlab.com:sub" = "project_path:my-org/my-project:ref_type:branch:ref:main"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy" "deploy_permissions" {
  role = aws_iam_role.gitlab_deploy.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "ecr:GetAuthorizationToken",
          "ecr:BatchCheckLayerAvailability",
          "ecr:PutImage",
          "ecr:InitiateLayerUpload",
          "ecr:UploadLayerPart",
          "ecr:CompleteLayerUpload"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "ecs:UpdateService",
          "ecs:DescribeServices"
        ]
        Resource = "arn:aws:ecs:us-east-1:123456789012:service/prod-cluster/payment-api"
      }
    ]
  })
}

GitLab CI/CD pipeline using OIDC:

deploy:
  stage: deploy
  image:
    name: amazon/aws-cli:2.15.30
    entrypoint: [""]
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://gitlab.com
  before_script:
    # Assume AWS role using GitLab OIDC token
    - >
      export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s"
      $(aws sts assume-role-with-web-identity
      --role-arn arn:aws:iam::123456789012:role/gitlab-deploy-role
      --role-session-name "gitlab-${CI_PIPELINE_ID}"
      --web-identity-token ${GITLAB_OIDC_TOKEN}
      --duration-seconds 3600
      --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
      --output text))
  script:
    # Now AWS CLI uses temporary credentials from assumed role
    - aws sts get-caller-identity # Verify authentication
    - aws ecr get-login-password | docker login --username AWS --password-stdin $ECR_REGISTRY
    - docker push $ECR_REGISTRY/payment-api:${CI_COMMIT_SHA}
    - aws ecs update-service --cluster prod-cluster --service payment-api --force-new-deployment
  only:
    - main

What's happening:

  1. GitLab issues JWT token (GITLAB_OIDC_TOKEN) identifying the pipeline
  2. Pipeline calls sts assume-role-with-web-identity with the token
  3. AWS validates token against OIDC provider, returns temporary credentials
  4. Export credentials as environment variables
  5. All subsequent AWS CLI calls use temporary credentials

For IAM role patterns and policy design, see AWS IAM. For complete pipeline patterns, see CI/CD Pipelines.

Environment Variables (Alternative)

For simpler setups (or when OIDC isn't available), store IAM user credentials as GitLab CI/CD variables:

GitLab Settings → CI/CD → Variables:

  • AWS_ACCESS_KEY_ID (protected, masked)
  • AWS_SECRET_ACCESS_KEY (protected, masked)
  • AWS_DEFAULT_REGION

Pipeline configuration:

deploy:
  stage: deploy
  image: amazon/aws-cli:2.15.30
  script:
    - aws sts get-caller-identity # Verify credentials work
    - aws s3 sync ./build s3://my-bucket/
  only:
    - main

Important: This requires creating an IAM user with long-term credentials. OIDC is preferred because it eliminates credential storage and provides automatic rotation.


Service-Specific Operations

ECR (Elastic Container Registry)

ECR stores Docker images. Common operations include authentication, pushing images, and lifecycle management.

Authenticate Docker to ECR:

aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

Create repository:

aws ecr create-repository --repository-name payment-api --region us-east-1

Tag and push image:

docker tag payment-api:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/payment-api:v1.2.3
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/payment-api:v1.2.3

List images:

aws ecr describe-images --repository-name payment-api --region us-east-1 \
--query 'imageDetails[*].[imageTags[0],imagePushedAt]' --output table

Scan image for vulnerabilities:

# Trigger scan
aws ecr start-image-scan --repository-name payment-api --image-id imageTag=v1.2.3

# Wait for scan completion
aws ecr wait image-scan-complete --repository-name payment-api --image-id imageTag=v1.2.3

# Get scan results
aws ecr describe-image-scan-findings --repository-name payment-api --image-id imageTag=v1.2.3 \
--query 'imageScanFindings.findingSeverityCounts'

Delete old images (lifecycle management):

# Get images older than 30 days
aws ecr describe-images --repository-name payment-api \
--query 'imageDetails[?imagePushedAt<`'$(date -d '30 days ago' --iso-8601)'`].[imageDigest]' \
--output text | \
while read digest; do
aws ecr batch-delete-image --repository-name payment-api --image-ids imageDigest=$digest
done

For container image best practices, see Docker Guidelines.

ECS (Elastic Container Service)

ECS runs Docker containers. Common operations include updating services and checking deployment status.

Update service (force new deployment):

aws ecs update-service \
--cluster prod-cluster \
--service payment-api \
--force-new-deployment

Update service with new image:

# Get current task definition
TASK_DEF=$(aws ecs describe-services --cluster prod-cluster --services payment-api \
--query 'services[0].taskDefinition' --output text)

# Register new task definition with updated image
NEW_TASK_DEF=$(aws ecs describe-task-definition --task-definition $TASK_DEF \
--query 'taskDefinition' | \
jq '.containerDefinitions[0].image="123456789012.dkr.ecr.us-east-1.amazonaws.com/payment-api:v1.2.3"' | \
jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)' | \
aws ecs register-task-definition --cli-input-json file:///dev/stdin \
--query 'taskDefinition.taskDefinitionArn' --output text)

# Update service to use new task definition
aws ecs update-service --cluster prod-cluster --service payment-api --task-definition $NEW_TASK_DEF

Check deployment status:

aws ecs describe-services --cluster prod-cluster --services payment-api \
--query 'services[0].deployments[*].[status,desiredCount,runningCount]' \
--output table

Wait for deployment completion:

aws ecs wait services-stable --cluster prod-cluster --services payment-api
echo "Deployment completed successfully"

List running tasks:

aws ecs list-tasks --cluster prod-cluster --service-name payment-api \
--desired-status RUNNING --output text --query 'taskArns[*]'

Get task logs (via CloudWatch):

# Get task details to find log stream
# Identify a running task (its ID names the CloudWatch log stream)
TASK_ID=$(aws ecs list-tasks --cluster prod-cluster --service-name payment-api --desired-status RUNNING \
  --query 'taskArns[0]' --output text | awk -F/ '{print $NF}')

# Fetch recent logs from the service's log group
aws logs tail /ecs/prod-cluster/payment-api --since 10m --follow --format short

For ECS deployment patterns, see AWS Compute.

EKS (Elastic Kubernetes Service)

EKS runs Kubernetes clusters. The CLI primarily configures kubectl access.

Update kubeconfig for cluster access:

aws eks update-kubeconfig --name prod-cluster --region us-east-1

# Verify connectivity
kubectl get nodes

This command:

  • Fetches cluster endpoint and certificate
  • Creates/updates ~/.kube/config with cluster details
  • Configures aws eks get-token as authentication provider
  • Enables kubectl to authenticate via IAM

Get cluster status:

aws eks describe-cluster --name prod-cluster --query 'cluster.status'

List node groups:

aws eks list-nodegroups --cluster-name prod-cluster --output table

Scale node group:

aws eks update-nodegroup-config \
--cluster-name prod-cluster \
--nodegroup-name general-purpose \
--scaling-config desiredSize=5,minSize=3,maxSize=10

For Kubernetes operations, use kubectl after configuring access. See Kubernetes Guidelines and AWS EKS for cluster management patterns.

S3 (Simple Storage Service)

S3 stores objects (files). The CLI provides high-level commands (s3) and low-level API commands (s3api).

Upload file:

aws s3 cp myfile.txt s3://my-bucket/path/to/myfile.txt

Sync a directory to S3:

aws s3 sync ./dist s3://my-bucket/static/ --delete
# --delete removes files in S3 not present locally (keeps S3 in sync)

Download file:

aws s3 cp s3://my-bucket/data.json ./data.json

List bucket contents:

aws s3 ls s3://my-bucket/logs/ --recursive --human-readable --summarize

Generate presigned URL (temporary download link):

aws s3 presign s3://my-bucket/private-file.pdf --expires-in 3600
# Returns URL valid for 1 hour

Set object ACL:

aws s3api put-object-acl --bucket my-bucket --key file.txt --acl private

Enable versioning:

aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled

Lifecycle policy (delete old versions):

aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json

lifecycle.json:

{
  "Rules": [{
    "Id": "DeleteOldVersions",
    "Status": "Enabled",
    "NoncurrentVersionExpiration": {
      "NoncurrentDays": 90
    }
  }]
}
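In scripts, such policy documents can be generated with jq -n rather than hand-edited; a sketch mirroring the values above (the /tmp path is just for the demo):

```shell
# Build the lifecycle policy from a variable; jq -n guarantees valid
# quoting and JSON structure.
RETENTION_DAYS=90
jq -n --argjson days "$RETENTION_DAYS" '{
  Rules: [{
    Id: "DeleteOldVersions",
    Status: "Enabled",
    NoncurrentVersionExpiration: { NoncurrentDays: $days }
  }]
}' > /tmp/lifecycle.json

# Then apply it:
#   aws s3api put-bucket-lifecycle-configuration --bucket my-bucket \
#     --lifecycle-configuration file:///tmp/lifecycle.json
```

Generating the document in the script keeps the retention period in one variable instead of duplicated between shell and JSON.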

For comprehensive S3 patterns, see File Storage Guidelines.

RDS (Relational Database Service)

RDS manages relational databases. CLI operations include backups, snapshots, and instance management.

Create snapshot:

aws rds create-db-snapshot \
--db-instance-identifier prod-db \
--db-snapshot-identifier prod-db-$(date +%Y%m%d-%H%M%S)

Restore from snapshot:

aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier restored-db \
--db-snapshot-identifier prod-db-20250115-100000

Modify instance (change instance type):

aws rds modify-db-instance \
--db-instance-identifier prod-db \
--db-instance-class db.r6g.xlarge \
--apply-immediately

Get connection endpoint:

aws rds describe-db-instances --db-instance-identifier prod-db \
--query 'DBInstances[0].[Endpoint.Address,Endpoint.Port]' --output text

For database patterns, see AWS Databases.

Lambda

Lambda executes serverless functions. CLI operations include invocation and deployment.

Invoke function:

aws lambda invoke \
--function-name payment-processor \
--payload '{"customerId": "123", "amount": 100}' \
--cli-binary-format raw-in-base64-out \
response.json

cat response.json

Update function code:

# Package code
zip function.zip index.js

# Update Lambda function
aws lambda update-function-code \
--function-name payment-processor \
--zip-file fileb://function.zip

Update environment variables:

aws lambda update-function-configuration \
--function-name payment-processor \
--environment Variables="{DB_HOST=db.example.com,LOG_LEVEL=INFO}"

Get function logs:

aws logs tail /aws/lambda/payment-processor --follow

For Lambda patterns, see AWS Compute.


Automation Patterns

Deployment Script Template

deploy.sh:

#!/bin/bash
set -euo pipefail # Exit on error, undefined variable, pipe failure

# Configuration
CLUSTER="prod-cluster"
SERVICE="payment-api"
IMAGE_TAG="${1:-latest}" # Default to 'latest' if no argument
ECR_REGISTRY="123456789012.dkr.ecr.us-east-1.amazonaws.com"
IMAGE="${ECR_REGISTRY}/payment-api:${IMAGE_TAG}"

echo "Deploying ${SERVICE} with image ${IMAGE}"

# Authenticate to ECR
echo "Authenticating to ECR..."
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin "${ECR_REGISTRY}"

# Push image
echo "Pushing image..."
docker tag payment-api:latest "${IMAGE}"
docker push "${IMAGE}"

# Update task definition
echo "Updating task definition..."
TASK_FAMILY="${SERVICE}"
CURRENT_TASK_DEF=$(aws ecs describe-task-definition --task-definition "${TASK_FAMILY}" --query 'taskDefinition')

NEW_TASK_DEF=$(echo "${CURRENT_TASK_DEF}" | \
jq --arg IMAGE "${IMAGE}" '.containerDefinitions[0].image=$IMAGE' | \
jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)')

NEW_TASK_ARN=$(echo "${NEW_TASK_DEF}" | \
aws ecs register-task-definition --cli-input-json file:///dev/stdin \
--query 'taskDefinition.taskDefinitionArn' --output text)

echo "Registered new task definition: ${NEW_TASK_ARN}"

# Update service
echo "Updating service..."
aws ecs update-service \
--cluster "${CLUSTER}" \
--service "${SERVICE}" \
--task-definition "${NEW_TASK_ARN}" \
--query 'service.serviceName' \
--output text

# Wait for deployment
echo "Waiting for deployment to complete..."
aws ecs wait services-stable --cluster "${CLUSTER}" --services "${SERVICE}"

# Verify deployment
RUNNING_COUNT=$(aws ecs describe-services --cluster "${CLUSTER}" --services "${SERVICE}" \
--query 'services[0].runningCount' --output text)

DESIRED_COUNT=$(aws ecs describe-services --cluster "${CLUSTER}" --services "${SERVICE}" \
--query 'services[0].desiredCount' --output text)

if [ "${RUNNING_COUNT}" -eq "${DESIRED_COUNT}" ]; then
  echo "✓ Deployment successful: ${RUNNING_COUNT}/${DESIRED_COUNT} tasks running"
  exit 0
else
  echo "✗ Deployment verification failed: ${RUNNING_COUNT}/${DESIRED_COUNT} tasks running"
  exit 1
fi

Usage:

./deploy.sh v1.2.3

Backup Automation

backup-rds.sh:

#!/bin/bash
set -euo pipefail

DB_INSTANCE="prod-db"
SNAPSHOT_PREFIX="automated-backup"
RETENTION_DAYS=30

# Create snapshot
SNAPSHOT_ID="${SNAPSHOT_PREFIX}-$(date +%Y%m%d-%H%M%S)"
echo "Creating snapshot: ${SNAPSHOT_ID}"

aws rds create-db-snapshot \
--db-instance-identifier "${DB_INSTANCE}" \
--db-snapshot-identifier "${SNAPSHOT_ID}" \
--tags Key=Type,Value=AutomatedBackup Key=CreatedAt,Value="$(date --iso-8601=seconds)"

# Wait for snapshot completion
echo "Waiting for snapshot to complete..."
aws rds wait db-snapshot-completed --db-snapshot-identifier "${SNAPSHOT_ID}"

echo "✓ Snapshot created successfully"

# Delete old snapshots
echo "Cleaning up snapshots older than ${RETENTION_DAYS} days..."
CUTOFF_DATE=$(date -d "${RETENTION_DAYS} days ago" --iso-8601)

aws rds describe-db-snapshots --db-instance-identifier "${DB_INSTANCE}" \
--query "DBSnapshots[?starts_with(DBSnapshotIdentifier, '${SNAPSHOT_PREFIX}')].[DBSnapshotIdentifier,SnapshotCreateTime]" \
--output text | \
while read -r snapshot_id snapshot_time; do
  if [[ "${snapshot_time}" < "${CUTOFF_DATE}" ]]; then
    echo "Deleting old snapshot: ${snapshot_id} (${snapshot_time})"
    aws rds delete-db-snapshot --db-snapshot-identifier "${snapshot_id}"
  fi
done

echo "✓ Backup completed"

Run via cron:

# Run daily at 2 AM
0 2 * * * /path/to/backup-rds.sh >> /var/log/backup-rds.log 2>&1

Multi-Account Operations

cross-account-deploy.sh:

#!/bin/bash
set -euo pipefail

# Assume role in target account
TARGET_ROLE="arn:aws:iam::999999999999:role/DeployRole"
SESSION_NAME="deploy-session-$(date +%s)"

echo "Assuming role: ${TARGET_ROLE}"
CREDENTIALS=$(aws sts assume-role \
--role-arn "${TARGET_ROLE}" \
--role-session-name "${SESSION_NAME}" \
--query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
--output text)

# Export temporary credentials
export AWS_ACCESS_KEY_ID=$(echo "${CREDENTIALS}" | awk '{print $1}')
export AWS_SECRET_ACCESS_KEY=$(echo "${CREDENTIALS}" | awk '{print $2}')
export AWS_SESSION_TOKEN=$(echo "${CREDENTIALS}" | awk '{print $3}')

# Verify assumed identity
echo "Current identity:"
aws sts get-caller-identity

# Perform operations in target account
echo "Deploying to target account..."
aws ecs update-service --cluster prod-cluster --service payment-api --force-new-deployment

echo "✓ Cross-account deployment completed"
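The awk parsing above depends on field order in the text output; keyed access with jq is less fragile. A sketch against a canned response shaped like aws sts assume-role JSON output (all credential values are fake):

```shell
# Canned assume-role response; in the real script this would be
#   CREDS_JSON=$(aws sts assume-role ... --output json)
CREDS_JSON='{"Credentials":{"AccessKeyId":"ASIAEXAMPLE","SecretAccessKey":"secretEXAMPLE","SessionToken":"tokenEXAMPLE"}}'

# Keyed extraction is immune to field reordering in the response
export AWS_ACCESS_KEY_ID=$(echo "$CREDS_JSON" | jq -r '.Credentials.AccessKeyId')
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS_JSON" | jq -r '.Credentials.SecretAccessKey')
export AWS_SESSION_TOKEN=$(echo "$CREDS_JSON" | jq -r '.Credentials.SessionToken')

echo "$AWS_ACCESS_KEY_ID"
```

Either approach works; the jq form also fails loudly (empty variables) if the response shape changes, rather than silently exporting the wrong field.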

For cross-account IAM patterns, see AWS IAM.


Debugging and Troubleshooting

Debug Output

Enable verbose output to see API calls and responses:

aws s3 ls s3://my-bucket/ --debug 2> debug.log

What's included:

  • API endpoint URLs
  • Request headers and body
  • Response status and headers
  • Credential resolution steps
  • Retry attempts

Use cases:

  • Authentication failures (which credentials are being used?)
  • API errors (what's the exact error message?)
  • Performance issues (how many retries? network latency?)

Credential Verification

# Which credentials is CLI using?
aws sts get-caller-identity

# Output shows:
{
  "UserId": "AIDAI23HXW2EQ",
  "Account": "123456789012",
  "Arn": "arn:aws:iam::123456789012:user/developer"
}

This confirms:

  • Authentication is working
  • Which IAM principal is active
  • Which AWS account you're operating in

CloudTrail for Audit

Every AWS CLI operation creates a CloudTrail event. Use CloudTrail to debug permission issues or unexpected behavior.

Find who deleted an S3 object:

aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=DeleteObject \
--max-results 10 \
--query 'Events[*].[EventTime,Username,CloudTrailEvent]' \
--output text

Find all ECS service updates:

aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=UpdateService \
--start-time $(date -d '24 hours ago' --iso-8601) \
--query 'Events[*].[EventTime,Username]'

For observability patterns, see AWS Observability.

Common Errors

"Unable to locate credentials"

  • Solution: Configure credentials via aws configure, environment variables, or IAM role

"Access Denied"

  • Solution: Check IAM policy attached to your user/role has required permissions
  • Debug: Run with --debug to see exact API call being denied
  • Verify: Use aws sts get-caller-identity to confirm which principal is active

"Rate exceeded"

  • Solution: AWS API rate limits exceeded, implement exponential backoff retry
  • Prevention: Reduce API call frequency, use pagination properly

"InvalidParameterException"

  • Solution: Check parameter format (e.g., ARNs must be fully qualified)
  • Debug: Run with --debug to see request payload

Best Practices

Security Best Practices

  • Never hardcode credentials: Use IAM roles, OIDC federation, or environment variables
  • Principle of least privilege: Grant only permissions required for specific operations
  • Use temporary credentials: Prefer sts assume-role over long-term access keys
  • Rotate access keys regularly: If IAM users are unavoidable, rotate keys every 90 days
  • Enable MFA for sensitive operations: Require MFA for production deployments

Scripting Best Practices

  • Use set -euo pipefail: Exit on errors, undefined variables, pipe failures
  • Check exit codes: Always validate command success before proceeding
  • Use --query for filtering: More reliable than regex on text output
  • Implement idempotency: Scripts should safely run multiple times
  • Add logging: Echo progress messages for debugging
  • Handle pagination: Use --page-size to fetch all results

Performance Best Practices

  • Batch operations: Use batch-delete-image instead of individual deletes
  • Parallel execution: Use xargs -P or background jobs for independent operations
  • Cache credentials: Assumed role credentials valid for 1 hour (don't re-assume every command)
  • Use local filtering: Filter with --query before downloading large result sets
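The parallel-execution point can be demonstrated locally; here echo stands in for a per-item aws command such as batch-delete-image:

```shell
# Fan out independent operations across 4 workers with xargs -P. The worker
# is a plain echo here; in practice it would be an aws command, e.g.
#   ... | xargs -P 4 -n 1 -I {} aws ecr batch-delete-image \
#           --repository-name payment-api --image-ids imageDigest={}
OUT_FILE=$(mktemp)
printf '%s\n' one two three four | \
  xargs -P 4 -n 1 echo processed > "$OUT_FILE"

sort "$OUT_FILE"   # completion order is nondeterministic; sort for display
```

Only use this for operations that are truly independent; parallel calls against the same resource can trip the "Rate exceeded" errors discussed above.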

Anti-Patterns

Security Anti-Patterns

  • Hardcoded credentials in scripts: Credentials leak via version control, logs, error messages
  • Root account usage: Root has unrestricted access; use IAM users/roles instead
  • Overly permissive policies: "Action": "*" on "Resource": "*" is never acceptable
  • Long-term credentials in CI/CD: Use OIDC federation instead of storing access keys

Scripting Anti-Patterns

  • Ignoring errors: Not checking exit codes leads to cascade failures
  • Parsing text output: Use JSON output and --query or jq for reliability
  • Missing pagination: Only processing first page of results
  • No retry logic: Network failures and rate limits require exponential backoff
  • Synchronous operations: Running serial commands when parallel execution possible

Operational Anti-Patterns

  • Manual deployments: Automate via CI/CD pipelines for consistency and auditability
  • No rollback plan: Always know how to revert deployments
  • Deploying untested changes: Test in lower environments first
  • No deployment verification: Check service health after deployment completes

Summary

The AWS CLI is a powerful tool for automation, scripting, and CI/CD integration:

Key Capabilities:

  • Unified interface for all AWS services
  • Credential management with IAM roles and OIDC federation
  • JSON output parsing with --query and jq
  • Pagination handling for large result sets
  • GitLab CI/CD integration for automated deployments

Common Operations:

  • ECR: Docker image push, vulnerability scanning, lifecycle management
  • ECS: Service updates, deployment verification, task management
  • EKS: Kubeconfig setup, cluster access configuration
  • S3: File upload/download, sync, presigned URLs
  • RDS: Snapshots, backups, instance management
  • Lambda: Function invocation, code deployment, log tailing

Best Practices:

  • Use OIDC federation for CI/CD (no stored credentials)
  • Implement error handling with retries and exponential backoff
  • Filter output with --query before piping to other tools
  • Handle pagination properly to process all results
  • Add idempotency to scripts for safe re-execution

When to Use:

  • GitLab CI/CD pipelines for deployments
  • Operational scripts (backups, migrations, cleanup)
  • Ad-hoc debugging and troubleshooting
  • Rapid prototyping before SDK implementation

When to Use SDK Instead:

  • Application business logic (see AWS SDK Integration)
  • Complex error handling and retry logic
  • Type-safe operations with IDE support
  • Performance-critical operations
