AWS Database Services
Overview
AWS provides a comprehensive portfolio of managed database services designed to eliminate the operational burden of database administration while delivering high performance, availability, and scalability. Choosing the right database service requires understanding your data model, access patterns, consistency requirements, and operational constraints.
This guide covers AWS's primary database offerings: Relational Database Service (RDS) for traditional RDBMS workloads, Aurora for cloud-native relational databases, DynamoDB for NoSQL key-value and document storage, and ElastiCache for in-memory caching. Understanding when to use each service and how to configure them optimally is critical for building scalable, cost-effective data architectures.
The managed database landscape exists because operating databases at scale requires significant expertise - handling backups, replication, failover, patching, and performance tuning. AWS managed services handle these operational tasks while allowing you to focus on schema design, query optimization, and application logic.
Core Principles
- Match database type to access patterns - Relational for ACID transactions and complex queries, NoSQL for high-throughput key-value access, cache for frequently accessed data
- Design for high availability - Use Multi-AZ deployments, read replicas, and automated failover to ensure uptime
- Plan capacity proactively - Monitor performance metrics and scale before hitting resource limits
- Implement connection pooling - Database connections are expensive; reuse connections to maximize throughput
- Optimize costs - Use reserved instances, right-size instances, leverage Aurora Serverless for variable workloads
Database Service Selection
Selecting the appropriate database service begins with understanding your application's data model, consistency requirements, and scaling characteristics. AWS provides different database types optimized for specific use cases.
When to use each service:
RDS (Relational Database Service) - Managed relational databases for traditional workloads:
- Applications requiring ACID transactions and complex joins
- Existing applications using PostgreSQL, MySQL, MariaDB, Oracle, or SQL Server
- Workloads requiring specific database engine features
- Lift-and-shift migrations from on-premises databases
- Development and test environments
Aurora - Cloud-native relational database built for AWS:
- Applications requiring higher performance than standard RDS (up to 5x MySQL, 3x PostgreSQL)
- Global applications needing low-latency reads in multiple regions (Aurora Global Database)
- Variable workloads benefiting from automatic scaling (Aurora Serverless v2)
- Mission-critical applications requiring sub-minute failover
- Applications needing advanced features (backtrack, fast cloning, parallel query)
DynamoDB - Fully managed NoSQL database for high-scale applications:
- Applications requiring single-digit millisecond latency at any scale
- Workloads with simple key-value or document access patterns
- Gaming leaderboards, session storage, user profiles
- IoT applications with millions of writes per second
- Mobile and web applications with unpredictable traffic patterns
ElastiCache - Managed in-memory caching for performance optimization:
- Caching database query results and API responses
- Session storage for stateless applications
- Real-time analytics and leaderboards
- Message queuing and pub/sub patterns (Redis)
- Rate limiting and distributed locking (Redis)
Database Decision Matrix
| Requirement | RDS | Aurora | DynamoDB | ElastiCache |
|---|---|---|---|---|
| Data Model | Relational (tables) | Relational (tables) | Key-value, document | Key-value, data structures |
| Consistency | ACID, strong | ACID, strong | Eventually consistent (default) | Eventually consistent |
| Latency | 5-20ms | 2-10ms | 1-5ms | <1ms |
| Max Throughput | 50k-100k IOPS | 500k+ reads, 100k+ writes | Millions IOPS | Millions ops/sec |
| Max Size | 64 TiB | 128 TiB | Unlimited | Up to 6.1 TiB per node |
| Scalability | Vertical (instance size) | Auto-scaling storage, serverless compute | Automatic horizontal | Horizontal (add nodes) |
| Availability | 99.95% (Multi-AZ) | 99.99% | 99.99% (global tables) | 99.9% (cluster mode) |
| Backups | Automated, snapshots | Automated, continuous | Point-in-time recovery | Redis: manual snapshots |
| Cost | $0.017-$13.52/hr | $0.029-$14.59/hr | $0.25/GB-month + requests | $0.034-$6.38/hr |
Key architectural differences:
RDS vs Aurora: Aurora provides better performance, faster failover (30s vs 60-120s), auto-scaling storage, and global database capabilities. However, Aurora costs 20-30% more than RDS. Use RDS for cost-sensitive workloads where standard performance suffices; use Aurora for performance-critical production applications.
Relational vs DynamoDB: Relational databases excel at complex queries with joins, aggregations, and transactions across multiple tables. DynamoDB excels at simple key-value lookups at massive scale. If your access pattern is "get item by ID" or "query items by partition key," DynamoDB is often the better choice. If you need "find all customers who purchased product X in region Y with discount > 10%," a relational database is more appropriate.
Database vs Cache: Databases provide durability, complex queries, and strong consistency. Caches provide extreme performance for frequently accessed data. Use ElastiCache to reduce database load by caching read-heavy queries, not as a primary data store (with the exception of Redis with persistence for specific use cases).
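The cache-aside pattern implied here can be sketched without ElastiCache itself. In this minimal sketch, a plain `HashMap` stands in for Redis and a hypothetical loader function stands in for the real database query:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal cache-aside sketch: check the cache first, fall back to the
// "database" on a miss, and populate the cache for subsequent reads.
// The Map stands in for Redis; the loader stands in for a real query.
public class CacheAside {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> loader;
    private int databaseHits = 0;

    public CacheAside(Function<String, String> loader) {
        this.loader = loader;
    }

    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {               // cache miss
            databaseHits++;
            value = loader.apply(key);     // expensive database read
            cache.put(key, value);         // populate for next time
        }
        return value;
    }

    public int databaseHits() {
        return databaseHits;
    }

    public static void main(String[] args) {
        CacheAside cache = new CacheAside(key -> "value-for-" + key);
        cache.get("user:42");  // miss: hits the "database"
        cache.get("user:42");  // hit: served from cache
        System.out.println("database hits: " + cache.databaseHits()); // prints 1
    }
}
```

The second read never touches the database, which is exactly how a cache in front of a read-heavy query reduces load on the primary store.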
For database schema design principles applicable to RDS and Aurora, see Database Design.
RDS (Relational Database Service)
RDS provides managed relational databases, handling backups, patching, monitoring, and failover while you focus on schema design and application development. RDS supports PostgreSQL, MySQL, MariaDB, Oracle, and SQL Server, making it suitable for lift-and-shift migrations and applications requiring specific database features.
RDS abstracts infrastructure management - no SSH access to instances, no manual patching, no managing replication. AWS handles operational tasks through automated maintenance windows, but you retain control over database configuration through parameter groups and option groups.
Multi-AZ Deployments
Multi-AZ deployments provide high availability by maintaining a synchronous standby replica in a different Availability Zone. If the primary instance fails, RDS automatically fails over to the standby within 60-120 seconds without data loss.
How Multi-AZ failover works:
- RDS detects primary instance failure (health checks, database unresponsive)
- DNS record for the database endpoint automatically updates to point to the standby
- Standby is promoted to primary and begins accepting connections
- Applications automatically reconnect to the new primary (same endpoint)
- Total downtime: typically 60-120 seconds
Why Multi-AZ is critical: Single-AZ databases can experience hours of downtime during hardware failures, AZ outages, or OS patching. Multi-AZ reduces downtime to minutes and ensures zero data loss because replication is synchronous (writes are not acknowledged until both primary and standby have received them).
Multi-AZ doubles storage and instance costs (you pay for the standby), but this is a small price compared to the cost of production outages. All production databases should use Multi-AZ.
// Spring Boot DataSource configuration for RDS Multi-AZ
@Configuration
public class RDSDataSourceConfig {
@Value("${rds.endpoint}")
private String endpoint;
@Value("${rds.port:5432}")
private int port;
@Value("${rds.database}")
private String database;
@Value("${rds.username}")
private String username;
@Value("${rds.password}")
private String password;
@Bean
public DataSource dataSource() {
HikariConfig config = new HikariConfig();
// RDS endpoint remains stable during Multi-AZ failover
config.setJdbcUrl(String.format("jdbc:postgresql://%s:%d/%s",
endpoint, port, database));
config.setUsername(username);
config.setPassword(password);
// Connection pool configuration
config.setMaximumPoolSize(20);
config.setMinimumIdle(5);
config.setConnectionTimeout(30000); // 30 seconds
config.setIdleTimeout(600000); // 10 minutes
config.setMaxLifetime(1800000); // 30 minutes
// Enable automatic reconnection during failover
config.addDataSourceProperty("socketTimeout", "30");
config.addDataSourceProperty("connectTimeout", "10");
config.addDataSourceProperty("loginTimeout", "10");
// Connection test query
config.setConnectionTestQuery("SELECT 1");
return new HikariDataSource(config);
}
}
Key configuration notes:
- Use the RDS endpoint hostname (e.g., mydb.abc123.us-east-1.rds.amazonaws.com), never IP addresses
- Connection pools automatically reconnect after failover completes
- Set appropriate timeouts to detect failures quickly
- Test query (SELECT 1) validates connections before use
For comprehensive connection pooling patterns and transaction management, see Spring Boot Data Access and Database ORM.
Read Replicas
Read replicas provide horizontal scaling for read-heavy workloads by creating asynchronous copies of your database. Applications can distribute read queries across replicas while writes go to the primary instance.
Use cases for read replicas:
- Reporting and analytics queries that don't need real-time data
- Scaling read capacity beyond a single database instance
- Geographic distribution (place replicas in regions close to users)
- Development and testing against production-like data
Read replica characteristics:
- Asynchronous replication: Replicas lag behind primary by seconds to minutes
- Eventually consistent: Reads may return slightly stale data
- Read-only: Applications cannot write to replicas
- Up to 15 replicas per RDS instance (Aurora supports more)
- Cross-region: Replicas can be in different regions for geographic distribution
// Spring Boot configuration with read replica routing
@Configuration
public class ReadReplicaConfig {
@Bean
@Primary
public DataSource routingDataSource(
@Qualifier("primaryDataSource") DataSource primary,
@Qualifier("replicaDataSource") DataSource replica) {
RoutingDataSource routingDataSource = new RoutingDataSource();
Map<Object, Object> targetDataSources = new HashMap<>();
targetDataSources.put(DatabaseType.PRIMARY, primary);
targetDataSources.put(DatabaseType.REPLICA, replica);
routingDataSource.setTargetDataSources(targetDataSources);
routingDataSource.setDefaultTargetDataSource(primary);
return routingDataSource;
}
@Bean
@ConfigurationProperties("rds.primary")
public DataSourceProperties primaryProperties() {
return new DataSourceProperties();
}
@Bean
@ConfigurationProperties("rds.replica")
public DataSourceProperties replicaProperties() {
return new DataSourceProperties();
}
@Bean
public DataSource primaryDataSource() {
return primaryProperties()
.initializeDataSourceBuilder()
.type(HikariDataSource.class)
.build();
}
@Bean
public DataSource replicaDataSource() {
return replicaProperties()
.initializeDataSourceBuilder()
.type(HikariDataSource.class)
.build();
}
}
// Custom annotation to route queries to replicas
@Target({ElementType.METHOD, ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
public @interface ReadReplica {
}
// Aspect to intercept @ReadReplica methods
@Aspect
@Component
public class ReadReplicaRoutingAspect {
@Around("@annotation(readReplica)")
public Object routeToReplica(ProceedingJoinPoint joinPoint, ReadReplica readReplica)
throws Throwable {
DatabaseContextHolder.set(DatabaseType.REPLICA);
try {
return joinPoint.proceed();
} finally {
DatabaseContextHolder.clear();
}
}
}
// Service using read replica for queries
@Service
@Transactional(readOnly = true)
public class ReportingService {
private final OrderRepository orderRepository;
@ReadReplica
public List<OrderSummary> generateDailySalesReport(LocalDate date) {
// This query runs against the read replica
return orderRepository.findDailySales(date);
}
}
Replica lag monitoring is critical - applications must handle the eventual consistency model:
@Service
public class ReplicaMonitoringService {
private final CloudWatchClient cloudWatchClient;
private final MeterRegistry meterRegistry;
@Scheduled(fixedRate = 60000) // Every minute
public void checkReplicaLag() {
GetMetricStatisticsResponse response = cloudWatchClient.getMetricStatistics(
GetMetricStatisticsRequest.builder()
.namespace("AWS/RDS")
.metricName("ReplicaLag")
.dimensions(Dimension.builder()
.name("DBInstanceIdentifier")
.value("mydb-replica")
.build())
.startTime(Instant.now().minus(5, ChronoUnit.MINUTES))
.endTime(Instant.now())
.period(300)
.statistics(Statistic.AVERAGE, Statistic.MAXIMUM)
.build()
);
Double avgLag = response.datapoints().stream()
.mapToDouble(Datapoint::average)
.average()
.orElse(0);
Double maxLag = response.datapoints().stream()
.mapToDouble(Datapoint::maximum)
.max()
.orElse(0);
meterRegistry.gauge("rds.replica.lag.average", avgLag);
meterRegistry.gauge("rds.replica.lag.max", maxLag);
// Alert if replica lag exceeds 30 seconds
if (maxLag > 30) {
log.warn("Read replica lag is high: {} seconds", maxLag);
}
}
}
When NOT to use read replicas: If your application requires strong consistency (every read must reflect the most recent write), read replicas are not appropriate. Use Aurora with reader endpoints or scale vertically with a larger instance.
RDS Proxy
RDS Proxy is a fully managed database proxy that pools and shares connections, reducing the load on your database and improving application scalability. This is particularly valuable for serverless applications (Lambda) that create many short-lived connections.
Why RDS Proxy matters:
Problem: Databases have maximum connection limits (PostgreSQL defaults to 100, MySQL to 151). Each connection consumes memory on the database server. Applications that create many connections (especially Lambda functions that can scale to thousands of concurrent invocations) quickly exhaust connection limits, causing connection failures.
Solution: RDS Proxy maintains a connection pool and multiplexes application connections onto a smaller number of database connections. This allows thousands of Lambda functions to share 100 database connections efficiently.
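The multiplexing idea can be illustrated in plain Java. This is a toy model, not how RDS Proxy is actually implemented: many callers borrow from a small set of "connections" so the database never sees more than the pool size concurrently:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model of connection multiplexing: many callers share a handful of
// "connections". Each caller borrows one from the pool, uses it, and
// returns it, so the database never sees more than poolSize connections.
public class MultiplexDemo {
    public static int run(int callers, int poolSize) throws InterruptedException {
        BlockingQueue<Integer> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) pool.put(i); // the shared connections

        Thread[] threads = new Thread[callers];
        for (int i = 0; i < callers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    Integer conn = pool.take();   // borrow (blocks if all in use)
                    // ... execute a query on conn ...
                    pool.put(conn);               // return to the pool
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        return pool.size(); // all connections returned
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 callers, but at most 5 concurrent "database connections"
        System.out.println("connections back in pool: " + run(100, 5));
    }
}
```

This is the same shape as thousands of Lambda invocations sharing a fixed pool of database connections through the proxy.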
Benefits:
- Improved database efficiency: Reduce connection churn and overhead
- Automatic failover: 66% faster failover than direct connections
- IAM authentication: Eliminate hard-coded credentials in Lambda functions
- Connection reuse: Applications connect quickly to warm connections from the pool
// Spring Boot Lambda function using RDS Proxy with IAM auth
@Component
public class OrderProcessorHandler implements RequestHandler<SQSEvent, Void> {
private final DataSource dataSource;
public OrderProcessorHandler() {
HikariConfig config = new HikariConfig();
// Connect to RDS Proxy endpoint (not directly to RDS)
config.setJdbcUrl("jdbc:postgresql://mydb-proxy.proxy-abc123.us-east-1.rds.amazonaws.com:5432/orders");
// Use IAM authentication (no passwords)
config.setUsername("db_user");
config.addDataSourceProperty("sslmode", "require");
config.addDataSourceProperty("sslrootcert", "rds-ca-2019-root.pem");
// IAM auth tokens serve as the password; tokens expire after 15 minutes,
// so long-lived pools should refresh them (e.g., via a custom DataSource)
config.setPassword(generateIAMAuthToken());
// Lambda-specific connection pool settings
config.setMaximumPoolSize(2); // Small pool per Lambda
config.setMinimumIdle(0); // Release connections when idle
config.setConnectionTimeout(5000);
config.setIdleTimeout(60000);
this.dataSource = new HikariDataSource(config);
}
private String generateIAMAuthToken() {
RdsUtilities utilities = RdsUtilities.builder()
.region(Region.US_EAST_1)
.build();
GenerateAuthenticationTokenRequest request = GenerateAuthenticationTokenRequest.builder()
.hostname("mydb-proxy.proxy-abc123.us-east-1.rds.amazonaws.com")
.port(5432)
.username("db_user")
.build();
return utilities.generateAuthenticationToken(request);
}
@Override
public Void handleRequest(SQSEvent event, Context context) {
for (SQSEvent.SQSMessage message : event.getRecords()) {
processOrder(message.getBody());
}
return null;
}
private void processOrder(String orderData) {
try (Connection conn = dataSource.getConnection();
PreparedStatement stmt = conn.prepareStatement(
"INSERT INTO orders (id, customer_id, total, status) VALUES (?, ?, ?, ?)")) {
// Process order...
stmt.executeUpdate();
} catch (SQLException e) {
throw new RuntimeException("Failed to process order", e);
}
}
}
For IAM authentication patterns and security best practices, see AWS IAM and Authentication.
Automated Backups and Point-in-Time Recovery
RDS automatically backs up your database daily during a backup window you specify. Backups are stored in S3 and retained for 1-35 days. Additionally, RDS captures transaction logs every 5 minutes, enabling point-in-time recovery (PITR) to any second within the retention period.
Backup strategy:
- Automated daily backups: Full database snapshot during backup window
- Transaction logs: Continuous backup of transaction logs
- Manual snapshots: User-initiated snapshots retained until explicitly deleted
- Cross-region snapshot copies: Disaster recovery in different regions
// Automated snapshot management
@Service
public class RDSBackupService {
private final RdsClient rdsClient;
// Create manual snapshot before maintenance
public String createManualSnapshot(String dbInstanceId, String reason) {
String snapshotId = String.format("%s-manual-%s-%d",
dbInstanceId, reason, System.currentTimeMillis());
CreateDbSnapshotResponse response = rdsClient.createDBSnapshot(
CreateDbSnapshotRequest.builder()
.dbInstanceIdentifier(dbInstanceId)
.dbSnapshotIdentifier(snapshotId)
.tags(Tag.builder().key("Purpose").value(reason).build())
.build()
);
log.info("Created manual snapshot: {}", response.dbSnapshot().dbSnapshotIdentifier());
return response.dbSnapshot().dbSnapshotIdentifier();
}
// Copy snapshot to another region for disaster recovery
public String copySnapshotToRegion(String snapshotId, String targetRegion) {
String targetSnapshotId = snapshotId + "-" + targetRegion;
// Note: CopyDBSnapshot must be called with an RDS client configured for the
// TARGET region, referencing the source snapshot by its ARN
CopyDbSnapshotResponse response = rdsClient.copyDBSnapshot(
CopyDbSnapshotRequest.builder()
.sourceDBSnapshotIdentifier(
String.format("arn:aws:rds:us-east-1:123456789012:snapshot:%s", snapshotId))
.targetDBSnapshotIdentifier(targetSnapshotId)
.copyTags(true)
.kmsKeyId("arn:aws:kms:eu-west-1:123456789012:key/abc-123") // Encrypt in target region
.build()
);
log.info("Copying snapshot {} to region {}", snapshotId, targetRegion);
return response.dbSnapshot().dbSnapshotIdentifier();
}
// Cleanup old manual snapshots
@Scheduled(cron = "0 0 3 * * SUN") // Weekly on Sunday at 3 AM
public void cleanupOldSnapshots() {
DescribeDbSnapshotsResponse response = rdsClient.describeDBSnapshots(
DescribeDbSnapshotsRequest.builder()
.snapshotType("manual")
.build()
);
Instant cutoff = Instant.now().minus(90, ChronoUnit.DAYS);
for (DBSnapshot snapshot : response.dbSnapshots()) {
if (snapshot.snapshotCreateTime().isBefore(cutoff)) {
rdsClient.deleteDBSnapshot(
DeleteDbSnapshotRequest.builder()
.dbSnapshotIdentifier(snapshot.dbSnapshotIdentifier())
.build()
);
log.info("Deleted old snapshot: {} from {}",
snapshot.dbSnapshotIdentifier(),
snapshot.snapshotCreateTime());
}
}
}
}
Recovery Point Objective (RPO): With continuous transaction log backups, RDS provides RPO of 5 minutes - you can lose at most 5 minutes of data in a disaster. For zero data loss, use Multi-AZ deployments with synchronous replication.
Recovery Time Objective (RTO): Restoring from a snapshot takes 10-30 minutes depending on database size. Point-in-time recovery requires restoring a snapshot and then replaying transaction logs, adding 5-15 minutes.
For comprehensive backup and disaster recovery strategies, see Disaster Recovery.
Performance Monitoring and Optimization
RDS provides Performance Insights, a tool that visualizes database load and helps identify performance bottlenecks. Performance Insights shows which queries consume the most database time, enabling targeted optimization.
// Monitor RDS performance metrics
@Service
public class RDSPerformanceMonitorService {
private final CloudWatchClient cloudWatchClient;
private final MeterRegistry meterRegistry;
@Scheduled(fixedRate = 60000) // Every minute
public void collectPerformanceMetrics() {
String dbInstanceId = "mydb-production";
Instant endTime = Instant.now();
Instant startTime = endTime.minus(5, ChronoUnit.MINUTES);
// CPU utilization
Double cpuUtilization = getMetric(dbInstanceId, "CPUUtilization", startTime, endTime);
meterRegistry.gauge("rds.cpu.utilization", cpuUtilization);
// Database connections
Double dbConnections = getMetric(dbInstanceId, "DatabaseConnections", startTime, endTime);
meterRegistry.gauge("rds.connections", dbConnections);
// Read/Write IOPS
Double readIOPS = getMetric(dbInstanceId, "ReadIOPS", startTime, endTime);
Double writeIOPS = getMetric(dbInstanceId, "WriteIOPS", startTime, endTime);
meterRegistry.gauge("rds.read.iops", readIOPS);
meterRegistry.gauge("rds.write.iops", writeIOPS);
// Freeable memory
Double freeableMemory = getMetric(dbInstanceId, "FreeableMemory", startTime, endTime);
meterRegistry.gauge("rds.memory.freeable", freeableMemory);
// Alert on high CPU
if (cpuUtilization > 80) {
log.warn("RDS CPU utilization is high: {}%", cpuUtilization);
}
// Alert on connection saturation
int maxConnections = getMaxConnections(dbInstanceId);
if (dbConnections > maxConnections * 0.8) {
log.warn("RDS connections at {}% of maximum ({}/{})",
(dbConnections / maxConnections) * 100, dbConnections, maxConnections);
}
}
private Double getMetric(String dbInstanceId, String metricName,
Instant startTime, Instant endTime) {
GetMetricStatisticsResponse response = cloudWatchClient.getMetricStatistics(
GetMetricStatisticsRequest.builder()
.namespace("AWS/RDS")
.metricName(metricName)
.dimensions(Dimension.builder()
.name("DBInstanceIdentifier")
.value(dbInstanceId)
.build())
.startTime(startTime)
.endTime(endTime)
.period(300)
.statistics(Statistic.AVERAGE)
.build()
);
return response.datapoints().stream()
.mapToDouble(Datapoint::average)
.average()
.orElse(0.0);
}
}
Key RDS performance metrics:
- CPUUtilization: Should stay below 80% under normal load
- DatabaseConnections: Monitor for connection exhaustion
- ReadIOPS / WriteIOPS: Compare against provisioned IOPS limits
- FreeableMemory: Low memory causes disk swapping and performance degradation
- ReplicaLag: Monitor read replica lag for consistency requirements
For comprehensive observability patterns, see Observability and AWS Observability.
Aurora
Aurora is AWS's cloud-native relational database built from the ground up for the cloud. Aurora provides up to 5x the throughput of MySQL and 3x the throughput of PostgreSQL while maintaining compatibility with these engines.
Aurora separates compute and storage - the database engine runs on EC2-like compute instances while storage is a distributed, self-healing layer that automatically scales up to 128 TiB. This architecture enables features impossible with traditional databases: fast cloning, backtrack, and automatic storage scaling.
Why choose Aurora over RDS:
- Performance: 5x MySQL, 3x PostgreSQL throughput
- Availability: 99.99% SLA with Multi-AZ, faster failover (30s typical)
- Scalability: Storage auto-scales in 10 GiB increments, up to 15 read replicas
- Advanced features: Backtrack, fast cloning, Global Database, parallel query
- Cost at scale: Despite 20-30% higher hourly costs, better performance often reduces total cost
Aurora architecture:
Aurora stores 6 copies of data across 3 Availability Zones (2 copies per AZ). Writes require acknowledgment from 4 of 6 copies; reads can be served from any copy. This provides fault tolerance: Aurora can lose 2 copies without affecting write availability and 3 copies without affecting read availability.
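The 4-of-6 write quorum and 3-of-6 read quorum imply exactly the fault tolerance described above; a quick sketch of the arithmetic:

```java
// Aurora keeps 6 copies across 3 AZs (2 per AZ); writes need 4
// acknowledgments, reads need 3 available copies. These helpers
// encode that quorum math.
public class AuroraQuorum {
    static final int COPIES = 6, WRITE_QUORUM = 4, READ_QUORUM = 3;

    public static boolean writeAvailable(int copiesLost) {
        return COPIES - copiesLost >= WRITE_QUORUM;
    }

    public static boolean readAvailable(int copiesLost) {
        return COPIES - copiesLost >= READ_QUORUM;
    }

    public static void main(String[] args) {
        // Losing 2 copies (an entire AZ) leaves 4: writes still succeed
        System.out.println("lose 2, write ok: " + writeAvailable(2)); // true
        // Losing 3 copies leaves 3: writes fail, reads still succeed
        System.out.println("lose 3, write ok: " + writeAvailable(3)); // false
        System.out.println("lose 3, read ok:  " + readAvailable(3));  // true
    }
}
```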
Aurora Serverless v2
Aurora Serverless v2 automatically scales database capacity based on application load, eliminating the need to provision specific instance sizes. This is ideal for variable, intermittent, or unpredictable workloads.
How Aurora Serverless v2 works:
- Define minimum and maximum capacity (ACUs - Aurora Capacity Units)
- Aurora monitors database load (CPU, connections, queries)
- Scales up when approaching capacity limits (seconds)
- Scales down during idle periods
- You pay per second for consumed capacity
// Spring Boot with Aurora Serverless - no configuration changes needed
@Configuration
public class AuroraServerlessConfig {
@Bean
public DataSource dataSource(
@Value("${aurora.endpoint}") String endpoint,
@Value("${aurora.database}") String database,
@Value("${aurora.username}") String username,
@Value("${aurora.password}") String password) {
HikariConfig config = new HikariConfig();
// Aurora Serverless endpoint (same as provisioned Aurora)
config.setJdbcUrl(String.format("jdbc:postgresql://%s:5432/%s", endpoint, database));
config.setUsername(username);
config.setPassword(password);
// Connection pool sized for minimum ACUs
// Aurora Serverless scales capacity, not connection limits
config.setMaximumPoolSize(20);
config.setMinimumIdle(2);
// Longer timeouts for scaling events
config.setConnectionTimeout(10000); // 10 seconds
config.setIdleTimeout(300000); // 5 minutes
return new HikariDataSource(config);
}
}
Use cases for Aurora Serverless:
- Development and test databases (scale to zero when not in use)
- Infrequently used applications (scale down to minimum during idle periods)
- Variable workloads (scale up during business hours, down at night)
- New applications with unknown traffic patterns
Cost comparison: Aurora Serverless v2 costs $0.12 per ACU-hour. Running at 1 ACU around the clock costs $2.88/day; during idle periods capacity scales down to the configured minimum, and with a 0 ACU minimum the cluster pauses entirely (you pay only for storage). A db.t4g.medium provisioned instance costs $1.46/day regardless of utilization.
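The break-even in that comparison is simple arithmetic; a sketch using the per-ACU-hour and per-day rates quoted above:

```java
// Compares Aurora Serverless v2 cost (pay per ACU consumed) against a
// provisioned instance's flat daily price, using the rates quoted above.
public class AuroraCostCompare {
    static final double ACU_HOUR_RATE = 0.12;        // $/ACU-hour
    static final double PROVISIONED_PER_DAY = 1.46;  // db.t4g.medium $/day

    // Daily serverless compute cost for a given average ACU consumption
    // over the hours the database is actually active
    public static double serverlessDailyCost(double avgAcus, double activeHours) {
        return avgAcus * activeHours * ACU_HOUR_RATE;
    }

    public static void main(String[] args) {
        // 1 ACU around the clock: 1 * 24 * 0.12 = $2.88/day (provisioned wins)
        System.out.println(serverlessDailyCost(1, 24));
        // 1 ACU for 8 business hours: 1 * 8 * 0.12 = $0.96/day (serverless wins)
        System.out.println(serverlessDailyCost(1, 8));
    }
}
```

The crossover sits around 12 active ACU-hours per day: below that, serverless is cheaper than keeping the provisioned instance running.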
Aurora Global Database
Aurora Global Database enables a single database to span multiple AWS regions with sub-second replication latency. This provides disaster recovery and low-latency reads for globally distributed applications.
Global Database characteristics:
- Primary region: One region hosts the writer instance
- Secondary regions: Up to 5 secondary regions with read replicas
- Replication lag: Typically less than 1 second
- Failover: Promote secondary region to primary in under 1 minute (RTO < 1 minute)
- Read performance: Local reads in secondary regions (low latency)
Use cases for Global Database:
- Global applications requiring low-latency reads in multiple regions
- Disaster recovery with fast RTO (< 1 minute) across regions
- Geographic data distribution for compliance (keep writes in specific regions)
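Low-latency local reads usually come down to routing each request to the reader endpoint in the caller's own region, with writes pinned to the primary. A hypothetical sketch (the region names and endpoint hostnames are illustrative, not real resources):

```java
import java.util.Map;

// Routes reads to the Aurora reader endpoint in the caller's own region,
// falling back to the primary region when no local replica exists.
// All region names and endpoint hostnames here are illustrative.
public class GlobalDbRouter {
    private static final String PRIMARY_REGION = "us-east-1";
    private static final Map<String, String> READER_ENDPOINTS = Map.of(
        "us-east-1", "mydb.cluster-ro-abc.us-east-1.rds.amazonaws.com",
        "eu-west-1", "mydb.cluster-ro-def.eu-west-1.rds.amazonaws.com",
        "ap-southeast-1", "mydb.cluster-ro-ghi.ap-southeast-1.rds.amazonaws.com");

    // Reads: prefer the local region's reader endpoint for low latency
    public static String readerEndpoint(String callerRegion) {
        return READER_ENDPOINTS.getOrDefault(callerRegion,
                READER_ENDPOINTS.get(PRIMARY_REGION));
    }

    // Writes: always go to the primary region's writer endpoint
    public static String writerEndpoint() {
        return "mydb.cluster-abc." + PRIMARY_REGION + ".rds.amazonaws.com";
    }

    public static void main(String[] args) {
        System.out.println(readerEndpoint("eu-west-1"));  // local European reader
        System.out.println(readerEndpoint("sa-east-1"));  // no local replica: primary
    }
}
```

Remember that secondary-region reads are subject to the sub-second replication lag noted above, so read-your-writes flows must still go to the primary.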
Aurora Advanced Features
Backtrack: Rewind database to a previous point in time without restoring from backup. Useful for recovering from application errors (accidental DELETE, bad schema migration). Backtrack is instant - no restore delay.
Fast Cloning: Create a copy of an Aurora database in minutes without copying data. Clones use copy-on-write, sharing storage with the source database until data diverges. Perfect for creating test environments from production data.
Parallel Query: Offload analytical queries to Aurora storage layer, enabling faster execution for queries scanning millions of rows. Parallel query is transparent - no application changes required.
// Using Aurora advanced features
@Service
public class AuroraManagementService {
private final RdsClient rdsClient;
// Create fast clone for testing
public String createTestClone(String sourceClusterId) {
String cloneId = sourceClusterId + "-test-" + System.currentTimeMillis();
RestoreDbClusterToPointInTimeResponse response = rdsClient.restoreDBClusterToPointInTime(
RestoreDbClusterToPointInTimeRequest.builder()
.sourceDBClusterIdentifier(sourceClusterId)
.dbClusterIdentifier(cloneId)
.restoreType("copy-on-write") // Fast clone
.useLatestRestorableTime(true)
.build()
);
log.info("Created fast clone: {} from {}", cloneId, sourceClusterId);
return response.dbCluster().dbClusterIdentifier();
}
// Backtrack to recover from error
public void backtrackToTimestamp(String clusterId, Instant targetTime) {
BacktrackDbClusterResponse response = rdsClient.backtrackDBCluster(
BacktrackDbClusterRequest.builder()
.dbClusterIdentifier(clusterId)
.backtrackTo(targetTime)
.build()
);
log.info("Backtracked cluster {} to {}", clusterId, targetTime);
}
}
For comprehensive database design patterns applicable to Aurora, see Database Design and Database Migrations.
DynamoDB
DynamoDB is a fully managed NoSQL database providing single-digit millisecond performance at any scale. Unlike relational databases, DynamoDB is schemaless (except for keys), scales horizontally without limits, and optimizes for simple key-value and document access patterns.
DynamoDB's architecture is fundamentally different from relational databases. Data is distributed across partitions based on partition keys, enabling massive parallelism. However, this means complex queries (joins, aggregations across items) are not supported - you must design your data model around your access patterns.
Data Modeling Principles
DynamoDB data modeling is the reverse of relational design: you design your schema around how you will query the data, not around entity relationships. Poor partition key design leads to performance bottlenecks; good partition key design enables unlimited scale.
Key concepts:
Partition Key (PK): Required for every item. DynamoDB distributes items across partitions based on a hash of the partition key. Items with the same partition key are stored together and can be queried efficiently.
Sort Key (SK): Optional. Within a partition, items are sorted by sort key. This enables range queries and hierarchical data modeling.
Composite Keys: PK + SK together form the item's primary key. This enables patterns like "all orders for customer X" or "all products in category Y sorted by price."
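These composite keys are just string conventions. A minimal sketch of building and matching them, using the USER#/ORDER# prefix convention this section adopts:

```java
// Composite keys are string conventions: a prefix identifies the entity
// type, and hierarchical sort keys enable begins_with range queries.
public class CompositeKeys {
    public static String customerPk(String customerId) {
        return "USER#" + customerId;
    }

    public static String orderSk(String orderId) {
        return "ORDER#" + orderId;
    }

    public static String orderItemSk(String orderId, String itemId) {
        return "ORDER#" + orderId + "#ITEM#" + itemId;
    }

    // "All orders for customer X" = query PK = customerPk(X),
    // SK begins_with "ORDER#"
    public static boolean matchesOrderPrefix(String sk) {
        return sk.startsWith("ORDER#");
    }

    public static void main(String[] args) {
        System.out.println(customerPk("12345"));              // USER#12345
        System.out.println(orderItemSk("abc123", "xyz789"));  // ORDER#abc123#ITEM#xyz789
        System.out.println(matchesOrderPrefix(orderSk("abc123"))); // true
    }
}
```

Because the order header key is a prefix of every item key, one begins_with query retrieves an order and all of its line items together.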
Good partition keys have:
- High cardinality: Many unique values (user IDs, order IDs)
- Even distribution: No hot partitions (don't use "country" if 80% of users are in one country)
- Predictable access: You know the partition key for your queries
Bad partition keys:
- Low cardinality: "status" with only 3 values (active, inactive, pending)
- Uneven distribution: "date" where current date gets 90% of writes
- Unpredictable: Can't efficiently query "find all users who purchased product X"
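The cardinality point can be made concrete with a toy simulation: hashing high-cardinality keys spreads items across partitions, while a low-cardinality key concentrates them onto a few hot partitions. Here Java's hashCode stands in for DynamoDB's internal hash function:

```java
import java.util.HashMap;
import java.util.Map;

// Toy partition simulation: an item lands on partition hash(key) % N.
// High-cardinality keys (unique IDs) spread load across all partitions;
// low-cardinality keys (a 3-value "status") pile onto at most 3.
public class PartitionDemo {
    static final int PARTITIONS = 10;

    // Counts how many items land on each partition
    public static Map<Integer, Integer> distribute(String[] keys) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String key : keys) {
            int partition = Math.floorMod(key.hashCode(), PARTITIONS);
            counts.merge(partition, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // High cardinality: 1000 unique user IDs
        String[] userIds = new String[1000];
        for (int i = 0; i < 1000; i++) userIds[i] = "USER#" + i;
        System.out.println("unique IDs hit " + distribute(userIds).size() + " partitions");

        // Low cardinality: 1000 writes with only 3 distinct status values
        String[] values = {"active", "inactive", "pending"};
        String[] statuses = new String[1000];
        for (int i = 0; i < 1000; i++) statuses[i] = values[i % 3];
        System.out.println("statuses hit " + distribute(statuses).size() + " partitions");
    }
}
```

With the status key, every write lands on at most three partitions regardless of table size, which is exactly the hot-partition problem the guidance above warns about.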
// DynamoDB table design for e-commerce orders
@DynamoDbBean
public class OrderItem {
// Partition key: Customer ID for customer-centric access
private String pk; // PK: USER#12345
// Sort key: Hierarchical pattern for different entity types
private String sk; // SK: ORDER#abc123 (header) or ORDER#abc123#ITEM#xyz789 (line item)
private String entityType; // "ORDER" or "ORDER_ITEM"
private String orderId;
private String orderDate;
private Double totalAmount;
private String status;
// Item attributes
private String productId;
private Integer quantity;
private Double price;
// GSI keys for alternate access patterns
private String gsi1pk; // ORDER#abc123 (query by order ID)
private String gsi1sk; // ITEM#xyz789
@DynamoDbPartitionKey
@DynamoDbAttribute("PK")
public String getPk() {
return pk;
}
@DynamoDbSortKey
@DynamoDbAttribute("SK")
public String getSk() {
return sk;
}
// Additional getters and setters...
}
// Service for DynamoDB operations
@Service
public class OrderDynamoDBService {
private final DynamoDbClient dynamoDbClient;
private final DynamoDbEnhancedClient enhancedClient;
private static final String TABLE_NAME = "Orders";
// Query all orders for a customer
public List<OrderItem> getCustomerOrders(String customerId) {
DynamoDbTable<OrderItem> table = enhancedClient.table(TABLE_NAME,
TableSchema.fromBean(OrderItem.class));
// sortBeginsWith matches every item whose SK starts with "ORDER#"
QueryConditional condition = QueryConditional
.sortBeginsWith(Key.builder()
.partitionValue("USER#" + customerId)
.sortValue("ORDER#")
.build());
return table.query(condition).items().stream().toList();
}
// Get specific order with items
public Order getOrderWithItems(String customerId, String orderId) {
DynamoDbTable<OrderItem> table = enhancedClient.table(TABLE_NAME,
TableSchema.fromBean(OrderItem.class));
// begins_with on the sort key returns the order header and all its items in one request
QueryConditional condition = QueryConditional
.sortBeginsWith(Key.builder()
.partitionValue("USER#" + customerId)
.sortValue("ORDER#" + orderId)
.build());
List<OrderItem> results = table.query(condition).items().stream().toList();
if (results.isEmpty()) {
throw new OrderNotFoundException(orderId);
}
// The header SK ("ORDER#<id>") sorts before its item SKs ("ORDER#<id>#ITEM#...")
OrderItem orderHeader = results.get(0);
List<OrderItem> items = results.subList(1, results.size());
return new Order(orderHeader, items);
}
// Write order with items in transaction
public void createOrder(String customerId, Order order) {
List<TransactWriteItem> actions = new ArrayList<>();
// Order header
actions.add(TransactWriteItem.builder()
.put(Put.builder()
.tableName(TABLE_NAME)
.item(Map.of(
"PK", AttributeValue.builder().s("USER#" + customerId).build(),
"SK", AttributeValue.builder().s("ORDER#" + order.getOrderId()).build(),
"EntityType", AttributeValue.builder().s("ORDER").build(),
"TotalAmount", AttributeValue.builder().n(order.getTotal().toString()).build(),
"Status", AttributeValue.builder().s(order.getStatus()).build()
))
.build())
.build());
// Order items
for (OrderLineItem item : order.getItems()) {
actions.add(TransactWriteItem.builder()
.put(Put.builder()
.tableName(TABLE_NAME)
.item(Map.of(
"PK", AttributeValue.builder().s("USER#" + customerId).build(),
"SK", AttributeValue.builder().s("ORDER#" + order.getOrderId() + "#ITEM#" + item.getId()).build(),
"EntityType", AttributeValue.builder().s("ORDER_ITEM").build(),
"ProductId", AttributeValue.builder().s(item.getProductId()).build(),
"Quantity", AttributeValue.builder().n(String.valueOf(item.getQuantity())).build(),
"Price", AttributeValue.builder().n(item.getPrice().toString()).build()
))
.build())
.build());
}
// Execute transaction (all-or-nothing; up to 100 items per TransactWriteItems call)
dynamoDbClient.transactWriteItems(TransactWriteItemsRequest.builder()
.transactItems(actions)
.build());
}
}
Single-table design: Advanced DynamoDB pattern where multiple entity types share one table. This enables retrieving related entities in a single query (e.g., order + items) and reduces costs compared to multiple tables. However, it requires careful key design and is not always appropriate.
Global Secondary Indexes (GSI)
GSIs provide alternate query patterns by defining different partition and sort keys. Without GSIs, you can only query by the table's primary key. With GSIs, you can query by alternate attributes.
GSI characteristics:
- Eventually consistent: GSI reads may lag the base table, typically by milliseconds
- Projected attributes: Choose which attributes to copy into the index (all, keys-only, or a specific list)
- Separate capacity: In provisioned mode, each GSI has its own read/write capacity, provisioned independently of the base table
- Sparse indexes: Only items that have the GSI key attributes appear in the index
// Querying by GSI for alternate access pattern
public List<OrderItem> getOrderById(String orderId) {
// GSI allows querying by order ID instead of customer ID
DynamoDbTable<OrderItem> table = enhancedClient.table(TABLE_NAME,
TableSchema.fromBean(OrderItem.class));
// Query GSI1 where GSI1PK = ORDER#abc123
QueryConditional condition = QueryConditional
.keyEqualTo(Key.builder()
.partitionValue("ORDER#" + orderId)
.build());
return table.index("GSI1")
.query(condition)
.items()
.stream()
.toList();
}
When to use GSIs:
- You need to query by attributes other than the primary key
- You need multiple access patterns on the same data
- Your query patterns don't match your partition key design
GSI limitations:
- Maximum 20 GSIs per table
- GSIs increase storage costs (each GSI stores copies of projected attributes)
- GSIs consume write capacity for every write to the main table
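Because GSIs are sparse, you can control which items appear in an index simply by setting (or omitting) the GSI key attributes at write time. A minimal sketch of how the `gsi1pk`/`gsi1sk` attributes from the `OrderItem` bean above might be derived; the class, method names, and the `HEADER` sentinel value are illustrative assumptions, not part of the AWS SDK:

```java
import java.util.Optional;

// Sketch: deriving GSI1 key attributes for the single-table design above.
// Items written without these attributes are simply absent from the index.
public class Gsi1Keys {

    // Every row of an order gets GSI1PK = ORDER#<orderId>, so querying
    // GSI1 by order ID returns the header and items together.
    public static String partitionKey(String orderId) {
        return "ORDER#" + orderId;
    }

    // Sort key distinguishes the header row from its line items
    // ("HEADER" is an arbitrary sentinel chosen for this sketch).
    public static String sortKey(Optional<String> itemId) {
        return itemId.map(id -> "ITEM#" + id).orElse("HEADER");
    }
}
```

Populating these in the `createOrder` transaction would make the `getOrderById` GSI query above return the complete order.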
DynamoDB Capacity Modes
DynamoDB offers two capacity modes with different billing and scaling characteristics:
On-Demand: Pay per request with automatic scaling:
- No capacity planning required
- Instantly accommodates traffic spikes
- $1.25 per million writes, $0.25 per million reads (can be expensive at high volume)
- Ideal for unpredictable workloads, new applications, development/testing
Provisioned: Pre-provision read and write capacity units:
- $0.00065 per write capacity unit-hour, $0.00013 per read capacity unit-hour (much cheaper at scale)
- Auto-scaling adjusts capacity based on utilization
- Reserved capacity available for further discounts
- Ideal for predictable workloads with sustained traffic
// Monitor DynamoDB capacity utilization
@Service
public class DynamoDBCapacityMonitorService {
private final CloudWatchClient cloudWatchClient;
private final MeterRegistry meterRegistry;
@Scheduled(fixedRate = 60000) // Every minute
public void monitorCapacity() {
String tableName = "Orders";
Instant endTime = Instant.now();
Instant startTime = endTime.minus(5, ChronoUnit.MINUTES);
// Consumed read capacity
Double readCapacity = getMetric(tableName, "ConsumedReadCapacityUnits", startTime, endTime);
meterRegistry.gauge("dynamodb.read.consumed", readCapacity);
// Consumed write capacity
Double writeCapacity = getMetric(tableName, "ConsumedWriteCapacityUnits", startTime, endTime);
meterRegistry.gauge("dynamodb.write.consumed", writeCapacity);
// Throttled requests
Double throttledReads = getMetric(tableName, "ReadThrottleEvents", startTime, endTime);
Double throttledWrites = getMetric(tableName, "WriteThrottleEvents", startTime, endTime);
if (throttledReads > 0 || throttledWrites > 0) {
log.warn("DynamoDB throttling detected: {} read throttles, {} write throttles",
throttledReads, throttledWrites);
}
}
// getMetric(...) wraps CloudWatch GetMetricStatistics; helper omitted for brevity
}
Capacity mode selection: Use On-Demand for unpredictable traffic or if you're unsure of capacity needs. Switch to Provisioned once traffic patterns stabilize - at steady-state, Provisioned is 5-10x cheaper than On-Demand.
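The "5-10x cheaper" claim can be checked with back-of-envelope arithmetic using the list prices quoted above (us-east-1; verify current pricing before relying on these numbers):

```java
// Rough monthly cost comparison of DynamoDB capacity modes for a
// steady workload, using the per-request and per-hour prices above.
public class DynamoDbCostEstimator {
    static final double SECONDS_PER_MONTH = 30 * 24 * 3600.0; // ~30-day month
    static final double HOURS_PER_MONTH = 30 * 24.0;

    // On-Demand: pay per request ($1.25 / $0.25 per million writes/reads)
    public static double onDemandMonthly(double writesPerSec, double readsPerSec) {
        double writeMillions = writesPerSec * SECONDS_PER_MONTH / 1_000_000;
        double readMillions = readsPerSec * SECONDS_PER_MONTH / 1_000_000;
        return writeMillions * 1.25 + readMillions * 0.25;
    }

    // Provisioned: pay per capacity-unit-hour regardless of actual usage
    public static double provisionedMonthly(int wcu, int rcu) {
        return wcu * HOURS_PER_MONTH * 0.00065 + rcu * HOURS_PER_MONTH * 0.00013;
    }
}
```

At a steady 100 writes/s and 100 reads/s this works out to roughly $389/month On-Demand versus $56/month Provisioned (about 7x), consistent with the 5-10x figure. The gap narrows for spiky traffic, where Provisioned capacity sits idle between peaks.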
DynamoDB Streams and Global Tables
DynamoDB Streams capture change data (inserts, updates, deletes) for real-time processing. Streams enable event-driven architectures, data replication, and audit logging.
// Lambda function processing DynamoDB Stream
@Component
public class OrderStreamHandler implements RequestHandler<DynamodbEvent, Void> {
private final OrderNotificationService notificationService;
@Override
public Void handleRequest(DynamodbEvent event, Context context) {
for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
if (record.getEventName().equals("INSERT")) {
Map<String, AttributeValue> newImage = record.getDynamodb().getNewImage();
String entityType = newImage.get("EntityType").getS();
if (entityType.equals("ORDER")) {
String customerId = newImage.get("PK").getS().replace("USER#", "");
String orderId = newImage.get("SK").getS().replace("ORDER#", "");
// Send order confirmation
notificationService.sendOrderConfirmation(customerId, orderId);
}
}
}
return null;
}
}
Global Tables provide multi-region, multi-active replication for disaster recovery and low-latency global access. Writes in any region replicate to all other regions, typically within one second.
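With multi-active replication, the same item can be updated concurrently in two regions; Global Tables resolve such conflicts with last-writer-wins. A pure-Java illustration of those semantics (the class shape is illustrative, not an AWS API):

```java
import java.time.Instant;

// Last-writer-wins conflict resolution, as applied by Global Tables when
// the same item is updated concurrently in two regions: once replication
// converges, the write with the later timestamp wins everywhere and the
// other write is silently discarded.
public class LwwResolver {
    public static class RegionWrite {
        public final String region;
        public final Instant timestamp;
        public final String value;

        public RegionWrite(String region, Instant timestamp, String value) {
            this.region = region;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    public static RegionWrite resolve(RegionWrite a, RegionWrite b) {
        return a.timestamp.isAfter(b.timestamp) ? a : b;
    }
}
```

The practical consequence: applications using Global Tables should avoid designs where two regions routinely write the same item, since one of the writes will be lost.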
For event-driven architectures with DynamoDB Streams, see Event-Driven Architecture and AWS Messaging.
ElastiCache
ElastiCache provides managed in-memory caching with Redis or Memcached, delivering sub-millisecond latency for frequently accessed data. Caching reduces database load, improves response times, and enables applications to scale beyond database capacity limits.
Redis vs Memcached
Redis - Advanced data structures with persistence:
- Data structures: Strings, hashes, lists, sets, sorted sets, bitmaps, HyperLogLog, geospatial indexes
- Persistence: Optional disk persistence (RDB snapshots, AOF logs)
- Replication: Primary-replica replication with automatic failover
- Pub/Sub: Message broker capabilities
- Atomic operations: Complex atomic operations (increment counters, set operations)
- Use cases: Session storage, leaderboards, real-time analytics, distributed locks, rate limiting
Memcached - Simple, high-performance key-value cache:
- Data structures: Key-value pairs only (strings)
- No persistence: Cache is ephemeral; data lost on restart
- Multi-threaded: Better CPU utilization on multi-core instances
- Simplicity: Simpler mental model, easier to operate
- Use cases: Simple caching, database query results, API response caching
Recommendation: Use Redis unless you have a specific reason for Memcached. Redis provides more features while maintaining excellent performance. Memcached's simpler model rarely outweighs Redis's additional capabilities.
ElastiCache Redis Configuration
// Spring Boot Redis configuration with ElastiCache
@Configuration
@EnableCaching
public class RedisCacheConfig {
@Value("${elasticache.redis.endpoint}")
private String redisEndpoint;
@Value("${elasticache.redis.port:6379}")
private int redisPort;
@Bean
public LettuceConnectionFactory redisConnectionFactory() {
// Cluster mode configuration
RedisClusterConfiguration clusterConfig = new RedisClusterConfiguration()
.clusterNode(redisEndpoint, redisPort);
LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
.commandTimeout(Duration.ofSeconds(2))
.shutdownTimeout(Duration.ZERO)
.build();
return new LettuceConnectionFactory(clusterConfig, clientConfig);
}
@Bean
public RedisTemplate<String, Object> redisTemplate(
LettuceConnectionFactory connectionFactory) {
RedisTemplate<String, Object> template = new RedisTemplate<>();
template.setConnectionFactory(connectionFactory);
// JSON serialization for values
Jackson2JsonRedisSerializer<Object> serializer =
new Jackson2JsonRedisSerializer<>(Object.class);
ObjectMapper mapper = new ObjectMapper();
mapper.setVisibility(PropertyAccessor.ALL, JsonAutoDetect.Visibility.ANY);
mapper.activateDefaultTyping(
mapper.getPolymorphicTypeValidator(),
ObjectMapper.DefaultTyping.NON_FINAL);
serializer.setObjectMapper(mapper);
template.setKeySerializer(new StringRedisSerializer());
template.setValueSerializer(serializer);
template.setHashKeySerializer(new StringRedisSerializer());
template.setHashValueSerializer(serializer);
template.afterPropertiesSet();
return template;
}
@Bean
public CacheManager cacheManager(LettuceConnectionFactory connectionFactory) {
RedisCacheConfiguration config = RedisCacheConfiguration.defaultCacheConfig()
.entryTtl(Duration.ofHours(1)) // 1 hour TTL
.disableCachingNullValues()
.serializeKeysWith(RedisSerializationContext.SerializationPair
.fromSerializer(new StringRedisSerializer()))
.serializeValuesWith(RedisSerializationContext.SerializationPair
.fromSerializer(new GenericJackson2JsonRedisSerializer()));
return RedisCacheManager.builder(connectionFactory)
.cacheDefaults(config)
.build();
}
}
// Using Redis for caching
@Service
public class ProductService {
private final ProductRepository productRepository;
private final RedisTemplate<String, Object> redisTemplate;
@Cacheable(value = "products", key = "#productId")
public Product getProduct(String productId) {
// Cache miss - fetch from database
return productRepository.findById(productId)
.orElseThrow(() -> new ProductNotFoundException(productId));
}
@CacheEvict(value = "products", key = "#product.id")
public Product updateProduct(Product product) {
// @CacheEvict removes the cached entry after the update completes (default behavior)
return productRepository.save(product);
}
// Manual cache operations for complex scenarios
public List<Product> getPopularProducts() {
String cacheKey = "products:popular";
// Try cache first
List<Product> cached = (List<Product>) redisTemplate.opsForValue().get(cacheKey);
if (cached != null) {
return cached;
}
// Cache miss - compute and cache
List<Product> products = productRepository.findPopularProducts();
redisTemplate.opsForValue().set(cacheKey, products, Duration.ofMinutes(15));
return products;
}
// Distributed lock with Redis
public void updateInventory(String productId, int quantity) {
String lockKey = "lock:product:" + productId;
String lockToken = UUID.randomUUID().toString();
// Acquire distributed lock (SET NX with expiry)
Boolean acquired = redisTemplate.opsForValue()
.setIfAbsent(lockKey, lockToken, Duration.ofSeconds(10));
if (Boolean.TRUE.equals(acquired)) {
try {
// Critical section - update inventory
Product product = getProduct(productId);
product.setInventory(product.getInventory() + quantity);
updateProduct(product);
} finally {
// Release only if we still own the lock; a bare DELETE could remove
// a lock another client acquired after ours expired (a Lua script
// makes this check-and-delete atomic)
if (lockToken.equals(redisTemplate.opsForValue().get(lockKey))) {
redisTemplate.delete(lockKey);
}
}
} else {
throw new ConcurrentModificationException("Could not acquire lock for product " + productId);
}
}
}
For comprehensive caching strategies and patterns, see Caching.
Redis Cluster Mode
Redis Cluster mode distributes data across multiple shards for horizontal scaling and high availability. Each shard has a primary node and replica nodes for failover.
Cluster mode benefits:
- Horizontal scaling: Distribute data across multiple shards (up to 500 nodes)
- High availability: Automatic failover within shards
- Partitioning: Data automatically distributed using hash slots
Cluster mode trade-offs:
- Cannot use multi-key operations across shards
- Lua scripts must operate on keys in the same shard
- Slightly higher latency due to redirection for misrouted requests
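The multi-key restriction follows from how cluster mode partitions data: each key maps to one of 16,384 hash slots via CRC16, and hash tags (a substring in `{}`) let you force related keys into the same slot so multi-key operations and Lua scripts still work. A sketch reimplementing the slot calculation for illustration (client libraries do this internally):

```java
import java.nio.charset.StandardCharsets;

// How Redis Cluster assigns keys to one of 16,384 hash slots:
// CRC16-XMODEM of the key, or of the "hash tag" inside {} if present.
public class RedisHashSlot {

    // CRC16-XMODEM (polynomial 0x1021, initial value 0)
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    public static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) { // non-empty tag: hash only the tag
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }
}
```

Because `{user:42}:cart` and `{user:42}:profile` both hash "user:42", they land in the same slot, so an `MGET` or Lua script can touch both; without the tags, they would likely live on different shards.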
// Monitor ElastiCache Redis performance
@Service
public class RedisCacheMonitorService {
private final CloudWatchClient cloudWatchClient;
private final MeterRegistry meterRegistry;
@Scheduled(fixedRate = 60000) // Every minute
public void monitorRedisMetrics() {
String cacheClusterId = "my-redis-cluster";
Instant endTime = Instant.now();
Instant startTime = endTime.minus(5, ChronoUnit.MINUTES);
// Cache hit rate
Double cacheHits = getMetric(cacheClusterId, "CacheHits", startTime, endTime);
Double cacheMisses = getMetric(cacheClusterId, "CacheMisses", startTime, endTime);
double hitRate = (cacheHits + cacheMisses) > 0
? (cacheHits / (cacheHits + cacheMisses)) * 100
: 0;
meterRegistry.gauge("redis.cache.hit.rate", hitRate);
// Memory utilization
Double memoryUsed = getMetric(cacheClusterId, "DatabaseMemoryUsagePercentage", startTime, endTime);
meterRegistry.gauge("redis.memory.utilization", memoryUsed);
// Evictions
Double evictions = getMetric(cacheClusterId, "Evictions", startTime, endTime);
meterRegistry.gauge("redis.evictions", evictions);
// Alert on low hit rate
if (hitRate < 80) {
log.warn("Redis cache hit rate is low: {}%", String.format("%.1f", hitRate));
}
// Alert on high evictions
if (evictions > 1000) {
log.warn("Redis is evicting items: {} evictions in last 5 minutes", evictions);
}
// Alert on memory pressure
if (memoryUsed > 85) {
log.warn("Redis memory utilization is high: {}%", String.format("%.1f", memoryUsed));
}
}
// getMetric(...) wraps CloudWatch GetMetricStatistics; helper omitted for brevity
}
Database Migration Strategies
Migrating databases to AWS requires careful planning to minimize downtime and ensure data integrity. AWS provides tools and services to simplify migration.
AWS Database Migration Service (DMS)
DMS migrates databases to AWS with minimal downtime, supporting homogeneous migrations (Oracle to RDS Oracle) and heterogeneous migrations (Oracle to Aurora PostgreSQL).
DMS migration phases:
- Full load: Copy all existing data from source to target
- Change data capture (CDC): Continuously replicate ongoing changes during migration
- Cutover: Switch applications to target database after catching up
Zero-downtime migration strategy:
- Set up DMS replication from source to target
- Full load completes (database fully copied)
- CDC replicates ongoing changes (source and target in sync)
- Monitor replication lag until it's minimal (<1 second)
- Cutover: redirect applications to target database
- Monitor for issues; can fail back to source if needed
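The "monitor replication lag until minimal" step can be sketched as a gate that only signals cutover readiness after several consecutive low-lag observations (DMS exposes lag via the CDCLatencySource/CDCLatencyTarget CloudWatch metrics); the threshold and sample count here are illustrative choices:

```java
// Cutover gate for a DMS migration: declare the target ready only after
// N consecutive lag samples under the threshold, so a single quiet
// moment doesn't trigger a premature switch.
public class CutoverGate {
    private final double maxLagSeconds;
    private final int requiredSamples;
    private int consecutive = 0;

    public CutoverGate(double maxLagSeconds, int requiredSamples) {
        this.maxLagSeconds = maxLagSeconds;
        this.requiredSamples = requiredSamples;
    }

    // Feed one lag observation (e.g. a CDCLatencyTarget datapoint);
    // any spike over the threshold resets the streak.
    public boolean observe(double lagSeconds) {
        consecutive = (lagSeconds < maxLagSeconds) ? consecutive + 1 : 0;
        return consecutive >= requiredSamples;
    }
}
```

A polling loop would call `observe(...)` each minute and initiate cutover (pause writes, wait for lag to hit zero, redirect applications) only once the gate opens.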
For schema migration patterns and version control, see Database Migrations.
Schema Conversion Tool (SCT)
For heterogeneous migrations (e.g., Oracle to PostgreSQL), SCT converts schemas, stored procedures, and application code to be compatible with the target database.
SCT process:
- Analyze source database schema and code
- Generate conversion report showing compatibility and required changes
- Automatically convert compatible objects
- Flag incompatible objects requiring manual conversion
- Generate converted schema for target database
Common conversion challenges:
- Proprietary features (Oracle packages, SQL Server T-SQL)
- Different data types (Oracle NUMBER vs PostgreSQL NUMERIC)
- Stored procedures requiring rewrite
- Application SQL that depends on database-specific features
Cost Optimization
Database costs can be significant. Implementing cost optimization strategies reduces expenses without sacrificing performance or availability.
RDS/Aurora cost optimization:
- Reserved Instances: 1-year or 3-year commitments for 30-60% discounts
- Right-size instances: Monitor CPU/memory usage and downsize overprovisioned instances
- Delete unused snapshots: Snapshots accumulate; implement retention policies
- Aurora Serverless: Variable workloads benefit from auto-scaling capacity
- Stop non-production databases: Stop dev/test databases outside business hours
DynamoDB cost optimization:
- On-Demand to Provisioned: Switch to provisioned capacity once traffic stabilizes (5-10x cheaper)
- Auto-scaling: Automatically adjust provisioned capacity based on utilization
- Delete unused GSIs: Each GSI increases storage and write costs
- Enable TTL: Automatically delete expired items to reduce storage costs
- Compress large items: DynamoDB charges per KB; compress data before storing
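On the TTL point: DynamoDB TTL expects a Number attribute holding a Unix epoch timestamp in seconds, and deletes items some time (typically within a couple of days) after that moment passes. A minimal sketch of computing that attribute value; the class and method names are illustrative, and the TTL attribute itself is whatever name you configure via UpdateTimeToLive:

```java
import java.time.Duration;
import java.time.Instant;

// Computes the epoch-seconds value to store in a table's TTL attribute
// so an item expires after a given retention period.
public class TtlAttribute {
    public static long expiresAtEpochSeconds(Instant createdAt, Duration retention) {
        return createdAt.plus(retention).getEpochSecond();
    }
}
```

Writing `expiresAtEpochSeconds(Instant.now(), Duration.ofDays(90))` into the configured attribute lets DynamoDB delete the item for free instead of charging for a delete request and ongoing storage.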
ElastiCache cost optimization:
- Reserved nodes: 1-year or 3-year commitments for 30-50% discounts
- Right-size nodes: Start with smaller nodes and scale up based on metrics
- Disable snapshots for ephemeral data: Snapshots cost money; disable for truly ephemeral caches
- Use Memcached for simple caching: Memcached is cheaper than Redis when advanced features aren't needed
// Automate RDS instance rightsizing recommendations
@Service
public class RDSCostOptimizationService {
private final CloudWatchClient cloudWatchClient;
private final RdsClient rdsClient;
public List<RightsizingRecommendation> analyzeRDSUtilization() {
List<RightsizingRecommendation> recommendations = new ArrayList<>();
// Get all RDS instances
DescribeDbInstancesResponse instances = rdsClient.describeDBInstances();
for (DBInstance instance : instances.dbInstances()) {
String instanceId = instance.dbInstanceIdentifier();
String currentClass = instance.dbInstanceClass();
// Check average CPU over last 7 days
double avgCpu = getAverageCPU(instanceId, Duration.ofDays(7));
// Recommend downsizing if CPU is consistently low
if (avgCpu < 20) {
String recommendedClass = suggestSmallerInstance(currentClass);
double monthlySavings = calculateSavings(currentClass, recommendedClass);
recommendations.add(new RightsizingRecommendation(
instanceId,
currentClass,
recommendedClass,
avgCpu,
monthlySavings,
"CPU utilization is consistently low; consider downsizing instance"
));
}
}
return recommendations;
}
private double getAverageCPU(String instanceId, Duration lookback) {
Instant endTime = Instant.now();
Instant startTime = endTime.minus(lookback);
GetMetricStatisticsResponse response = cloudWatchClient.getMetricStatistics(
GetMetricStatisticsRequest.builder()
.namespace("AWS/RDS")
.metricName("CPUUtilization")
.dimensions(Dimension.builder()
.name("DBInstanceIdentifier")
.value(instanceId)
.build())
.startTime(startTime)
.endTime(endTime)
.period(86400) // Daily
.statistics(Statistic.AVERAGE)
.build()
);
return response.datapoints().stream()
.mapToDouble(Datapoint::average)
.average()
.orElse(0);
}
}
For comprehensive cost optimization strategies across all AWS services, see AWS Cost Optimization.
Anti-Patterns
Single-AZ RDS in production: Single-AZ databases have no automatic failover. Hardware failures or AZ outages cause hours of downtime. Always use Multi-AZ for production.
Not using connection pooling: Opening a new database connection for every request is slow and exhausts connection limits. Use connection pools (HikariCP) to reuse connections efficiently.
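A hypothetical starting point for HikariCP in a Spring Boot service backed by RDS (property names are standard Spring Boot/HikariCP keys; the values are illustrative and should be tuned to your instance class and its max_connections limit):

```properties
# Cap the pool well below the database's connection limit,
# accounting for the number of application instances sharing it
spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.minimum-idle=5
# Fail fast rather than queueing indefinitely for a connection
spring.datasource.hikari.connection-timeout=3000
# Retire connections before any RDS or proxy idle timeout closes them
spring.datasource.hikari.max-lifetime=900000
spring.datasource.hikari.keepalive-time=300000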
Poor DynamoDB partition key design: Low-cardinality partition keys or uneven distributions create hot partitions, limiting throughput. Design partition keys for high cardinality and even distribution.
Querying entire table without indexes: Full table scans are slow and expensive. Create appropriate indexes (RDS/Aurora) or GSIs (DynamoDB) for query patterns.
Not monitoring database performance: Database issues often manifest gradually. Monitor CPU, connections, IOPS, and query performance to identify problems before they impact users.
Using DynamoDB for complex queries: DynamoDB doesn't support joins or complex aggregations. If your access patterns require these, use a relational database instead.
Over-caching: Caching everything wastes memory and introduces staleness. Cache only frequently accessed, read-heavy data. Writes should update or invalidate caches immediately.
Not testing failover: Multi-AZ and Aurora failover work in theory, but applications must handle connection failures gracefully. Test failover scenarios before production incidents.
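Failover drills surface transient connection errors, and clients should retry with exponential backoff plus jitter rather than hammering the new primary in lockstep. A sketch of the delay calculation ("full jitter" style; class name and parameters are illustrative):

```java
import java.util.Random;

// Exponential backoff with full jitter: each retry waits a uniformly
// random delay in [0, min(cap, base * 2^attempt)], which spreads
// reconnection attempts out after a failover.
public class BackoffPolicy {
    private final long baseMillis;
    private final long capMillis;
    private final Random random;

    public BackoffPolicy(long baseMillis, long capMillis, Random random) {
        this.baseMillis = baseMillis;
        this.capMillis = capMillis;
        this.random = random;
    }

    public long delayMillis(int attempt) {
        // Clamp the shift so the exponential term can't overflow
        long exp = Math.min(capMillis, baseMillis << Math.min(attempt, 20));
        return (long) (random.nextDouble() * exp);
    }
}
```

A retry loop around connection acquisition would sleep for `delayMillis(attempt)` between attempts, giving the failed-over endpoint time to become reachable without a thundering herd.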
Ignoring replication lag: Read replicas and DynamoDB Streams have replication lag. Applications must handle eventual consistency - don't assume writes are immediately visible on replicas.
Manual database operations: Use automation (DMS, SCT, Terraform, CloudFormation) for migrations, backups, and configuration changes. Manual operations are error-prone and don't scale.
Related Guidelines
- Database Design - Schema design, normalization, indexing strategies
- Database ORM - ORM patterns, JPA/Hibernate best practices
- Database Migrations - Schema evolution, Flyway, Liquibase
- Caching - Caching strategies, invalidation patterns
- Spring Boot Data Access - Transaction management, connection pooling
- AWS Networking - VPC configuration, database subnet groups
- AWS IAM - Database authentication with IAM
- AWS Observability - Monitoring database performance
- AWS Storage - Storage for database backups
- Event-Driven Architecture - Using DynamoDB Streams
- Disaster Recovery - Backup and recovery strategies