AWS Database Services
Overview
AWS provides a comprehensive portfolio of managed database services designed to eliminate the operational burden of database administration while delivering high performance, availability, and scalability. Choosing the right database service requires understanding your data model, access patterns, consistency requirements, and operational constraints.
This guide covers AWS's primary database offerings: Relational Database Service (RDS) for traditional RDBMS workloads, Aurora for cloud-native relational databases, DynamoDB for NoSQL key-value and document storage, and ElastiCache for in-memory caching. Understanding when to use each service and how to configure them optimally is critical for building scalable, cost-effective data architectures.
The managed database landscape exists because operating databases at scale requires significant expertise - handling backups, replication, failover, patching, and performance tuning. AWS managed services handle these operational tasks while allowing you to focus on schema design, query optimization, and application logic.
Core Principles
- Match database type to access patterns - Relational for ACID transactions and complex queries, NoSQL for high-throughput key-value access, cache for frequently accessed data
- Design for high availability - Use Multi-AZ deployments, read replicas, and automated failover to ensure uptime
- Plan capacity proactively - Monitor performance metrics and scale before hitting resource limits
- Implement connection pooling - Database connections are expensive; reuse connections to maximize throughput
- Optimize costs - Use reserved instances, right-size instances, leverage Aurora Serverless for variable workloads
Database Service Selection
Selecting the appropriate database service begins with understanding your application's data model, consistency requirements, and scaling characteristics. AWS provides different database types optimized for specific use cases.
When to use each service:
RDS (Relational Database Service) - Managed relational databases for traditional workloads:
- Applications requiring ACID transactions and complex joins
- Existing applications using PostgreSQL, MySQL, MariaDB, Oracle, or SQL Server
- Workloads requiring specific database engine features
- Lift-and-shift migrations from on-premises databases
- Development and test environments
Aurora - Cloud-native relational database built for AWS:
- Applications requiring higher performance than standard RDS (up to 5x MySQL, 3x PostgreSQL)
- Global applications needing low-latency reads in multiple regions (Aurora Global Database)
- Variable workloads benefiting from automatic scaling (Aurora Serverless v2)
- Mission-critical applications requiring sub-minute failover
- Applications needing advanced features (backtrack, fast cloning, parallel query)
DynamoDB - Fully managed NoSQL database for high-scale applications:
- Applications requiring single-digit millisecond latency at any scale
- Workloads with simple key-value or document access patterns
- Gaming leaderboards, session storage, user profiles
- IoT applications with millions of writes per second
- Mobile and web applications with unpredictable traffic patterns
ElastiCache - Managed in-memory caching for performance optimization:
- Caching database query results and API responses
- Session storage for stateless applications
- Real-time analytics and leaderboards
- Message queuing and pub/sub patterns (Redis)
- Rate limiting and distributed locking (Redis)
Database Decision Matrix
| Requirement | RDS | Aurora | DynamoDB | ElastiCache |
|---|---|---|---|---|
| Data Model | Relational (tables) | Relational (tables) | Key-value, document | Key-value, data structures |
| Consistency | ACID, strong | ACID, strong | Eventually consistent (default) | Eventually consistent |
| Latency | 5-20ms | 2-10ms | 1-5ms | <1ms |
| Max Throughput | 50k-100k IOPS | 500k+ reads, 100k+ writes | Millions IOPS | Millions ops/sec |
| Max Size | 64 TiB | 128 TiB | Unlimited | Up to 6.1 TiB per node |
| Scalability | Vertical (instance size) | Auto-scaling storage, serverless compute | Automatic horizontal | Horizontal (add nodes) |
| Availability | 99.95% (Multi-AZ) | 99.99% | 99.99% (global tables) | 99.9% (cluster mode) |
| Backups | Automated, snapshots | Automated, continuous | Point-in-time recovery | Redis: manual snapshots |
| Cost | $0.017-$13.52/hr | $0.029-$14.59/hr | $0.25/GB-month + requests | $0.034-$6.38/hr |
Key architectural differences:
RDS vs Aurora: Aurora provides better performance, faster failover (30s vs 60-120s), auto-scaling storage, and global database capabilities. However, Aurora costs 20-30% more than RDS. Use RDS for cost-sensitive workloads where standard performance suffices; use Aurora for performance-critical production applications.
Relational vs DynamoDB: Relational databases excel at complex queries with joins, aggregations, and transactions across multiple tables. DynamoDB excels at simple key-value lookups at massive scale. If your access pattern is "get item by ID" or "query items by partition key," DynamoDB is often the better choice. If you need "find all customers who purchased product X in region Y with discount > 10%," a relational database is more appropriate.
Database vs Cache: Databases provide durability, complex queries, and strong consistency. Caches provide extreme performance for frequently accessed data. Use ElastiCache to reduce database load by caching read-heavy queries, not as a primary data store (with the exception of Redis with persistence for specific use cases).
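The cache-aside pattern implied here can be sketched without ElastiCache itself. In this minimal sketch, a plain `HashMap` stands in for Redis and a hypothetical loader function stands in for the real database query:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal cache-aside sketch: check the cache first, fall back to the
// "database" on a miss, and populate the cache for subsequent reads.
// The Map stands in for Redis; the loader stands in for a real query.
public class CacheAside {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> loader;
    private int databaseHits = 0;

    public CacheAside(Function<String, String> loader) {
        this.loader = loader;
    }

    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {               // cache miss
            databaseHits++;
            value = loader.apply(key);     // expensive database read
            cache.put(key, value);         // populate for next time
        }
        return value;
    }

    public int databaseHits() {
        return databaseHits;
    }

    public static void main(String[] args) {
        CacheAside cache = new CacheAside(key -> "value-for-" + key);
        cache.get("user:42");  // miss: hits the "database"
        cache.get("user:42");  // hit: served from cache
        System.out.println("database hits: " + cache.databaseHits()); // prints 1
    }
}
```

The second read never touches the database, which is exactly how a cache in front of a read-heavy query reduces load on the primary store.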
For database schema design principles applicable to RDS and Aurora, see Database Design.
RDS (Relational Database Service)
RDS provides managed relational databases, handling backups, patching, monitoring, and failover while you focus on schema design and application development. RDS supports PostgreSQL, MySQL, MariaDB, Oracle, and SQL Server, making it suitable for lift-and-shift migrations and applications requiring specific database features.
RDS abstracts infrastructure management - no SSH access to instances, no manual patching, no managing replication. AWS handles operational tasks through automated maintenance windows, but you retain control over database configuration through parameter groups and option groups.
Multi-AZ Deployments
Multi-AZ deployments provide high availability by maintaining a synchronous standby replica in a different Availability Zone. If the primary instance fails, RDS automatically fails over to the standby within 60-120 seconds without data loss.
How Multi-AZ failover works:
- RDS detects primary instance failure (health checks, database unresponsive)
- DNS record for the database endpoint automatically updates to point to the standby
- Standby is promoted to primary and begins accepting connections
- Applications automatically reconnect to the new primary (same endpoint)
- Total downtime: typically 60-120 seconds
Why Multi-AZ is critical: Single-AZ databases can experience hours of downtime during hardware failures, AZ outages, or OS patching. Multi-AZ reduces downtime to minutes and ensures zero data loss because replication is synchronous (writes are not acknowledged until both primary and standby have received them).
Multi-AZ doubles storage and instance costs (you pay for the standby), but this is a small price compared to the cost of production outages. All production databases should use Multi-AZ.
// Spring Boot DataSource configuration for RDS Multi-AZ
@Configuration
public class RDSDataSourceConfig {
@Value("${rds.endpoint}")
private String endpoint;
@Value("${rds.port:5432}")
private int port;
@Value("${rds.database}")
private String database;
@Value("${rds.username}")
private String username;
@Value("${rds.password}")
private String password;
@Bean
public DataSource dataSource() {
HikariConfig config = new HikariConfig();
// RDS endpoint remains stable during Multi-AZ failover
config.setJdbcUrl(String.format("jdbc:postgresql://%s:%d/%s",
endpoint, port, database));
config.setUsername(username);
config.setPassword(password);
// Connection pool configuration
config.setMaximumPoolSize(20);
config.setMinimumIdle(5);
config.setConnectionTimeout(30000); // 30 seconds
config.setIdleTimeout(600000); // 10 minutes
config.setMaxLifetime(1800000); // 30 minutes
// Enable automatic reconnection during failover
config.addDataSourceProperty("socketTimeout", "30");
config.addDataSourceProperty("connectTimeout", "10");
config.addDataSourceProperty("loginTimeout", "10");
// Connection test query
config.setConnectionTestQuery("SELECT 1");
return new HikariDataSource(config);
}
}
Key configuration notes:
- Use the RDS endpoint hostname (e.g., mydb.abc123.us-east-1.rds.amazonaws.com), never IP addresses
- Connection pools automatically reconnect after failover completes
- Set appropriate timeouts to detect failures quickly
- Test query (SELECT 1) validates connections before use
For comprehensive connection pooling patterns and transaction management, see Spring Boot Data Access and Database ORM.
Read Replicas
Read replicas provide horizontal scaling for read-heavy workloads by creating asynchronous copies of your database. Applications can distribute read queries across replicas while writes go to the primary instance.
Use cases for read replicas:
- Reporting and analytics queries that don't need real-time data
- Scaling read capacity beyond a single database instance
- Geographic distribution (place replicas in regions close to users)
- Development and testing against production-like data
Read replica characteristics:
- Asynchronous replication: Replicas lag behind primary by seconds to minutes
- Eventually consistent: Reads may return slightly stale data
- Read-only: Applications cannot write to replicas
- Up to 15 replicas per RDS instance (Aurora supports more)
- Cross-region: Replicas can be in different regions for geographic distribution
// Spring Boot configuration with read replica routing
@Configuration
public class ReadReplicaConfig {
@Bean
@Primary
public DataSource routingDataSource(
@Qualifier("primaryDataSource") DataSource primary,
@Qualifier("replicaDataSource") DataSource replica) {
RoutingDataSource routingDataSource = new RoutingDataSource();
Map<Object, Object> targetDataSources = new HashMap<>();
targetDataSources.put(DatabaseType.PRIMARY, primary);
targetDataSources.put(DatabaseType.REPLICA, replica);
routingDataSource.setTargetDataSources(targetDataSources);
routingDataSource.setDefaultTargetDataSource(primary);
return routingDataSource;
}
@Bean
@ConfigurationProperties("rds.primary")
public DataSourceProperties primaryProperties() {
return new DataSourceProperties();
}
@Bean
@ConfigurationProperties("rds.replica")
public DataSourceProperties replicaProperties() {
return new DataSourceProperties();
}
@Bean
public DataSource primaryDataSource() {
return primaryProperties()
.initializeDataSourceBuilder()
.type(HikariDataSource.class)
.build();
}
@Bean
public DataSource replicaDataSource() {
return replicaProperties()
.initializeDataSourceBuilder()
.type(HikariDataSource.class)
.build();
}
}
// Custom annotation to route queries to replicas
@Target({ElementType.METHOD, ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
public @interface ReadReplica {
}
// Aspect to intercept @ReadReplica methods
@Aspect
@Component
public class ReadReplicaRoutingAspect {
@Around("@annotation(readReplica)")
public Object routeToReplica(ProceedingJoinPoint joinPoint, ReadReplica readReplica)
throws Throwable {
DatabaseContextHolder.set(DatabaseType.REPLICA);
try {
return joinPoint.proceed();
} finally {
DatabaseContextHolder.clear();
}
}
}
// Service using read replica for queries
@Service
@Transactional(readOnly = true)
public class ReportingService {
private final OrderRepository orderRepository;
@ReadReplica
public List<OrderSummary> generateDailySalesReport(LocalDate date) {
// This query runs against the read replica
return orderRepository.findDailySales(date);
}
}
Replica lag monitoring is critical - applications must handle the eventual consistency model:
@Service
public class ReplicaMonitoringService {
private final CloudWatchClient cloudWatchClient;
private final MeterRegistry meterRegistry;
@Scheduled(fixedRate = 60000) // Every minute
public void checkReplicaLag() {
GetMetricStatisticsResponse response = cloudWatchClient.getMetricStatistics(
GetMetricStatisticsRequest.builder()
.namespace("AWS/RDS")
.metricName("ReplicaLag")
.dimensions(Dimension.builder()
.name("DBInstanceIdentifier")
.value("mydb-replica")
.build())
.startTime(Instant.now().minus(5, ChronoUnit.MINUTES))
.endTime(Instant.now())
.period(300)
.statistics(Statistic.AVERAGE, Statistic.MAXIMUM)
.build()
);
Double avgLag = response.datapoints().stream()
.mapToDouble(Datapoint::average)
.average()
.orElse(0);
Double maxLag = response.datapoints().stream()
.mapToDouble(Datapoint::maximum)
.max()
.orElse(0);
meterRegistry.gauge("rds.replica.lag.average", avgLag);
meterRegistry.gauge("rds.replica.lag.max", maxLag);
// Alert if replica lag exceeds 30 seconds
if (maxLag > 30) {
log.warn("Read replica lag is high: {} seconds", maxLag);
}
}
}
When NOT to use read replicas: If your application requires strong consistency (every read must reflect the most recent write), read replicas are not appropriate. Use Aurora with reader endpoints or scale vertically with a larger instance.
RDS Proxy
RDS Proxy is a fully managed database proxy that pools and shares connections, reducing the load on your database and improving application scalability. This is particularly valuable for serverless applications (Lambda) that create many short-lived connections.
Why RDS Proxy matters:
Problem: Databases have maximum connection limits (PostgreSQL defaults to 100, MySQL to 151). Each connection consumes memory on the database server. Applications that create many connections (especially Lambda functions that can scale to thousands of concurrent invocations) quickly exhaust connection limits, causing connection failures.
Solution: RDS Proxy maintains a connection pool and multiplexes application connections onto a smaller number of database connections. This allows thousands of Lambda functions to share 100 database connections efficiently.
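The multiplexing idea can be illustrated in plain Java. This is a toy model, not how RDS Proxy is actually implemented: many callers borrow from a small set of "connections" so the database never sees more than the pool size concurrently:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model of connection multiplexing: many callers share a handful of
// "connections". Each caller borrows one from the pool, uses it, and
// returns it, so the database never sees more than poolSize connections.
public class MultiplexDemo {
    public static int run(int callers, int poolSize) throws InterruptedException {
        BlockingQueue<Integer> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) pool.put(i); // the shared connections

        Thread[] threads = new Thread[callers];
        for (int i = 0; i < callers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    Integer conn = pool.take();   // borrow (blocks if all in use)
                    // ... execute a query on conn ...
                    pool.put(conn);               // return to the pool
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        return pool.size(); // all connections returned
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 callers, but at most 5 concurrent "database connections"
        System.out.println("connections back in pool: " + run(100, 5));
    }
}
```

This is the same shape as thousands of Lambda invocations sharing a fixed pool of database connections through the proxy.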
Benefits:
- Improved database efficiency: Reduce connection churn and overhead
- Automatic failover: 66% faster failover than direct connections
- IAM authentication: Eliminate hard-coded credentials in Lambda functions
- Connection reuse: Applications connect quickly to warm connections from the pool
// Spring Boot Lambda function using RDS Proxy with IAM auth
@Component
public class OrderProcessorHandler implements RequestHandler<SQSEvent, Void> {
private final DataSource dataSource;
public OrderProcessorHandler() {
HikariConfig config = new HikariConfig();
// Connect to RDS Proxy endpoint (not directly to RDS)
config.setJdbcUrl("jdbc:postgresql://mydb-proxy.proxy-abc123.us-east-1.rds.amazonaws.com:5432/orders");
// Use IAM authentication (no passwords)
config.setUsername("db_user");
config.addDataSourceProperty("sslmode", "require");
config.addDataSourceProperty("sslrootcert", "rds-ca-2019-root.pem");
// IAM auth tokens serve as the password; tokens expire after 15 minutes,
// so long-lived pools should refresh them (e.g., via a custom DataSource)
config.setPassword(generateIAMAuthToken());
// Lambda-specific connection pool settings
config.setMaximumPoolSize(2); // Small pool per Lambda
config.setMinimumIdle(0); // Release connections when idle
config.setConnectionTimeout(5000);
config.setIdleTimeout(60000);
this.dataSource = new HikariDataSource(config);
}
private String generateIAMAuthToken() {
RdsUtilities utilities = RdsUtilities.builder()
.region(Region.US_EAST_1)
.build();
GenerateAuthenticationTokenRequest request = GenerateAuthenticationTokenRequest.builder()
.hostname("mydb-proxy.proxy-abc123.us-east-1.rds.amazonaws.com")
.port(5432)
.username("db_user")
.build();
return utilities.generateAuthenticationToken(request);
}
@Override
public Void handleRequest(SQSEvent event, Context context) {
for (SQSEvent.SQSMessage message : event.getRecords()) {
processOrder(message.getBody());
}
return null;
}
private void processOrder(String orderData) {
try (Connection conn = dataSource.getConnection();
PreparedStatement stmt = conn.prepareStatement(
"INSERT INTO orders (id, customer_id, total, status) VALUES (?, ?, ?, ?)")) {
// Process order...
stmt.executeUpdate();
} catch (SQLException e) {
throw new RuntimeException("Failed to process order", e);
}
}
}
For IAM authentication patterns and security best practices, see AWS IAM and Authentication.
Automated Backups and Point-in-Time Recovery
RDS automatically backs up your database daily during a backup window you specify. Backups are stored in S3 and retained for 1-35 days. Additionally, RDS captures transaction logs every 5 minutes, enabling point-in-time recovery (PITR) to any second within the retention period.
Backup strategy:
- Automated daily backups: Full database snapshot during backup window
- Transaction logs: Continuous backup of transaction logs
- Manual snapshots: User-initiated snapshots retained until explicitly deleted
- Cross-region snapshot copies: Disaster recovery in different regions
// Automated snapshot management
@Service
public class RDSBackupService {
private final RdsClient rdsClient;
// Create manual snapshot before maintenance
public String createManualSnapshot(String dbInstanceId, String reason) {
String snapshotId = String.format("%s-manual-%s-%d",
dbInstanceId, reason, System.currentTimeMillis());
CreateDbSnapshotResponse response = rdsClient.createDBSnapshot(
CreateDbSnapshotRequest.builder()
.dbInstanceIdentifier(dbInstanceId)
.dbSnapshotIdentifier(snapshotId)
.tags(Tag.builder().key("Purpose").value(reason).build())
.build()
);
log.info("Created manual snapshot: {}", response.dbSnapshot().dbSnapshotIdentifier());
return response.dbSnapshot().dbSnapshotIdentifier();
}
// Copy snapshot to another region for disaster recovery
public String copySnapshotToRegion(String snapshotId, String targetRegion) {
String targetSnapshotId = snapshotId + "-" + targetRegion;
// Note: CopyDBSnapshot must be called with an RDS client configured for the
// TARGET region, referencing the source snapshot by its ARN
CopyDbSnapshotResponse response = rdsClient.copyDBSnapshot(
CopyDbSnapshotRequest.builder()
.sourceDBSnapshotIdentifier(
String.format("arn:aws:rds:us-east-1:123456789012:snapshot:%s", snapshotId))
.targetDBSnapshotIdentifier(targetSnapshotId)
.copyTags(true)
.kmsKeyId("arn:aws:kms:eu-west-1:123456789012:key/abc-123") // Encrypt in target region
.build()
);
log.info("Copying snapshot {} to region {}", snapshotId, targetRegion);
return response.dbSnapshot().dbSnapshotIdentifier();
}
// Cleanup old manual snapshots
@Scheduled(cron = "0 0 3 * * SUN") // Weekly on Sunday at 3 AM
public void cleanupOldSnapshots() {
DescribeDbSnapshotsResponse response = rdsClient.describeDBSnapshots(
DescribeDbSnapshotsRequest.builder()
.snapshotType("manual")
.build()
);
Instant cutoff = Instant.now().minus(90, ChronoUnit.DAYS);
for (DBSnapshot snapshot : response.dbSnapshots()) {
if (snapshot.snapshotCreateTime().isBefore(cutoff)) {
rdsClient.deleteDBSnapshot(
DeleteDbSnapshotRequest.builder()
.dbSnapshotIdentifier(snapshot.dbSnapshotIdentifier())
.build()
);
log.info("Deleted old snapshot: {} from {}",
snapshot.dbSnapshotIdentifier(),
snapshot.snapshotCreateTime());
}
}
}
}
Recovery Point Objective (RPO): With continuous transaction log backups, RDS provides RPO of 5 minutes - you can lose at most 5 minutes of data in a disaster. For zero data loss, use Multi-AZ deployments with synchronous replication.
Recovery Time Objective (RTO): Restoring from a snapshot takes 10-30 minutes depending on database size. Point-in-time recovery requires restoring a snapshot and then replaying transaction logs, adding 5-15 minutes.
For comprehensive backup and disaster recovery strategies, see Disaster Recovery.
Performance Monitoring and Optimization
RDS provides Performance Insights, a tool that visualizes database load and helps identify performance bottlenecks. Performance Insights shows which queries consume the most database time, enabling targeted optimization.
// Monitor RDS performance metrics
@Service
public class RDSPerformanceMonitorService {
private final CloudWatchClient cloudWatchClient;
private final MeterRegistry meterRegistry;
@Scheduled(fixedRate = 60000) // Every minute
public void collectPerformanceMetrics() {
String dbInstanceId = "mydb-production";
Instant endTime = Instant.now();
Instant startTime = endTime.minus(5, ChronoUnit.MINUTES);
// CPU utilization
Double cpuUtilization = getMetric(dbInstanceId, "CPUUtilization", startTime, endTime);
meterRegistry.gauge("rds.cpu.utilization", cpuUtilization);
// Database connections
Double dbConnections = getMetric(dbInstanceId, "DatabaseConnections", startTime, endTime);
meterRegistry.gauge("rds.connections", dbConnections);
// Read/Write IOPS
Double readIOPS = getMetric(dbInstanceId, "ReadIOPS", startTime, endTime);
Double writeIOPS = getMetric(dbInstanceId, "WriteIOPS", startTime, endTime);
meterRegistry.gauge("rds.read.iops", readIOPS);
meterRegistry.gauge("rds.write.iops", writeIOPS);
// Freeable memory
Double freeableMemory = getMetric(dbInstanceId, "FreeableMemory", startTime, endTime);
meterRegistry.gauge("rds.memory.freeable", freeableMemory);
// Alert on high CPU
if (cpuUtilization > 80) {
log.warn("RDS CPU utilization is high: {}%", cpuUtilization);
}
// Alert on connection saturation
int maxConnections = getMaxConnections(dbInstanceId);
if (dbConnections > maxConnections * 0.8) {
log.warn("RDS connections at {}% of maximum ({}/{})",
(dbConnections / maxConnections) * 100, dbConnections, maxConnections);
}
}
private Double getMetric(String dbInstanceId, String metricName,
Instant startTime, Instant endTime) {
GetMetricStatisticsResponse response = cloudWatchClient.getMetricStatistics(
GetMetricStatisticsRequest.builder()
.namespace("AWS/RDS")
.metricName(metricName)
.dimensions(Dimension.builder()
.name("DBInstanceIdentifier")
.value(dbInstanceId)
.build())
.startTime(startTime)
.endTime(endTime)
.period(300)
.statistics(Statistic.AVERAGE)
.build()
);
return response.datapoints().stream()
.mapToDouble(Datapoint::average)
.average()
.orElse(0.0);
}
}
Key RDS performance metrics:
- CPUUtilization: Should stay below 80% under normal load
- DatabaseConnections: Monitor for connection exhaustion
- ReadIOPS / WriteIOPS: Compare against provisioned IOPS limits
- FreeableMemory: Low memory causes disk swapping and performance degradation
- ReplicaLag: Monitor read replica lag for consistency requirements
For comprehensive observability patterns, see Observability and AWS Observability.
Aurora
Aurora is AWS's cloud-native relational database built from the ground up for the cloud. Aurora provides up to 5x the throughput of MySQL and 3x the throughput of PostgreSQL while maintaining compatibility with these engines.
Aurora separates compute and storage - the database engine runs on EC2-like compute instances while storage is a distributed, self-healing layer that automatically scales up to 128 TiB. This architecture enables features impossible with traditional databases: fast cloning, backtrack, and automatic storage scaling.
Why choose Aurora over RDS:
- Performance: 5x MySQL, 3x PostgreSQL throughput
- Availability: 99.99% SLA with Multi-AZ, faster failover (30s typical)
- Scalability: Storage auto-scales in 10 GiB increments, up to 15 read replicas
- Advanced features: Backtrack, fast cloning, Global Database, parallel query
- Cost at scale: Despite 20-30% higher hourly costs, better performance often reduces total cost
Aurora architecture:
Aurora stores 6 copies of data across 3 Availability Zones (2 copies per AZ). Writes require acknowledgment from 4 of 6 copies; reads can be served from any copy. This provides fault tolerance: Aurora can lose 2 copies without affecting write availability and 3 copies without affecting read availability.
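The 4-of-6 write quorum and 3-of-6 read quorum imply exactly the fault tolerance described above; a quick sketch of the arithmetic:

```java
// Aurora keeps 6 copies across 3 AZs (2 per AZ); writes need 4
// acknowledgments, reads need 3 available copies. These helpers
// encode that quorum math.
public class AuroraQuorum {
    static final int COPIES = 6, WRITE_QUORUM = 4, READ_QUORUM = 3;

    public static boolean writeAvailable(int copiesLost) {
        return COPIES - copiesLost >= WRITE_QUORUM;
    }

    public static boolean readAvailable(int copiesLost) {
        return COPIES - copiesLost >= READ_QUORUM;
    }

    public static void main(String[] args) {
        // Losing 2 copies (an entire AZ) leaves 4: writes still succeed
        System.out.println("lose 2, write ok: " + writeAvailable(2)); // true
        // Losing 3 copies leaves 3: writes fail, reads still succeed
        System.out.println("lose 3, write ok: " + writeAvailable(3)); // false
        System.out.println("lose 3, read ok:  " + readAvailable(3));  // true
    }
}
```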
Aurora Serverless v2
Aurora Serverless v2 automatically scales database capacity based on application load, eliminating the need to provision specific instance sizes. This is ideal for variable, intermittent, or unpredictable workloads.
How Aurora Serverless v2 works:
- Define minimum and maximum capacity (ACUs - Aurora Capacity Units)
- Aurora monitors database load (CPU, connections, queries)
- Scales up when approaching capacity limits (seconds)
- Scales down during idle periods
- You pay per second for consumed capacity
// Spring Boot with Aurora Serverless - no configuration changes needed
@Configuration
public class AuroraServerlessConfig {
@Bean
public DataSource dataSource(
@Value("${aurora.endpoint}") String endpoint,
@Value("${aurora.database}") String database,
@Value("${aurora.username}") String username,
@Value("${aurora.password}") String password) {
HikariConfig config = new HikariConfig();
// Aurora Serverless endpoint (same as provisioned Aurora)
config.setJdbcUrl(String.format("jdbc:postgresql://%s:5432/%s", endpoint, database));
config.setUsername(username);
config.setPassword(password);
// Connection pool sized for minimum ACUs
// Aurora Serverless scales capacity, not connection limits
config.setMaximumPoolSize(20);
config.setMinimumIdle(2);
// Longer timeouts for scaling events
config.setConnectionTimeout(10000); // 10 seconds
config.setIdleTimeout(300000); // 5 minutes
return new HikariDataSource(config);
}
}
Use cases for Aurora Serverless:
- Development and test databases (scale to zero when not in use)
- Infrequently used applications (scale down to minimum during idle periods)
- Variable workloads (scale up during business hours, down at night)
- New applications with unknown traffic patterns
Cost comparison: Aurora Serverless v2 costs $0.12 per ACU-hour. Running at 1 ACU around the clock costs $2.88/day; during idle periods capacity scales down to the configured minimum, and with a 0 ACU minimum the cluster pauses entirely (you pay only for storage). A db.t4g.medium provisioned instance costs $1.46/day regardless of utilization.
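The break-even in that comparison is simple arithmetic; a sketch using the per-ACU-hour and per-day rates quoted above:

```java
// Compares Aurora Serverless v2 cost (pay per ACU consumed) against a
// provisioned instance's flat daily price, using the rates quoted above.
public class AuroraCostCompare {
    static final double ACU_HOUR_RATE = 0.12;        // $/ACU-hour
    static final double PROVISIONED_PER_DAY = 1.46;  // db.t4g.medium $/day

    // Daily serverless compute cost for a given average ACU consumption
    // over the hours the database is actually active
    public static double serverlessDailyCost(double avgAcus, double activeHours) {
        return avgAcus * activeHours * ACU_HOUR_RATE;
    }

    public static void main(String[] args) {
        // 1 ACU around the clock: 1 * 24 * 0.12 = $2.88/day (provisioned wins)
        System.out.println(serverlessDailyCost(1, 24));
        // 1 ACU for 8 business hours: 1 * 8 * 0.12 = $0.96/day (serverless wins)
        System.out.println(serverlessDailyCost(1, 8));
    }
}
```

The crossover sits around 12 active ACU-hours per day: below that, serverless is cheaper than keeping the provisioned instance running.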
Aurora Global Database
Aurora Global Database enables a single database to span multiple AWS regions with sub-second replication latency. This provides disaster recovery and low-latency reads for globally distributed applications.
Global Database characteristics:
- Primary region: One region hosts the writer instance
- Secondary regions: Up to 5 secondary regions with read replicas
- Replication lag: Typically less than 1 second
- Failover: Promote secondary region to primary in under 1 minute (RTO < 1 minute)
- Read performance: Local reads in secondary regions (low latency)
Use cases for Global Database:
- Global applications requiring low-latency reads in multiple regions
- Disaster recovery with fast RTO (< 1 minute) across regions
- Geographic data distribution for compliance (keep writes in specific regions)
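Low-latency local reads usually come down to routing each request to the reader endpoint in the caller's own region, with writes pinned to the primary. A hypothetical sketch (the region names and endpoint hostnames are illustrative, not real resources):

```java
import java.util.Map;

// Routes reads to the Aurora reader endpoint in the caller's own region,
// falling back to the primary region when no local replica exists.
// All region names and endpoint hostnames here are illustrative.
public class GlobalDbRouter {
    private static final String PRIMARY_REGION = "us-east-1";
    private static final Map<String, String> READER_ENDPOINTS = Map.of(
        "us-east-1", "mydb.cluster-ro-abc.us-east-1.rds.amazonaws.com",
        "eu-west-1", "mydb.cluster-ro-def.eu-west-1.rds.amazonaws.com",
        "ap-southeast-1", "mydb.cluster-ro-ghi.ap-southeast-1.rds.amazonaws.com");

    // Reads: prefer the local region's reader endpoint for low latency
    public static String readerEndpoint(String callerRegion) {
        return READER_ENDPOINTS.getOrDefault(callerRegion,
                READER_ENDPOINTS.get(PRIMARY_REGION));
    }

    // Writes: always go to the primary region's writer endpoint
    public static String writerEndpoint() {
        return "mydb.cluster-abc." + PRIMARY_REGION + ".rds.amazonaws.com";
    }

    public static void main(String[] args) {
        System.out.println(readerEndpoint("eu-west-1"));  // local European reader
        System.out.println(readerEndpoint("sa-east-1"));  // no local replica: primary
    }
}
```

Remember that secondary-region reads are subject to the sub-second replication lag noted above, so read-your-writes flows must still go to the primary.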
Aurora Advanced Features
Backtrack: Rewind database to a previous point in time without restoring from backup. Useful for recovering from application errors (accidental DELETE, bad schema migration). Backtrack is instant - no restore delay.
Fast Cloning: Create a copy of an Aurora database in minutes without copying data. Clones use copy-on-write, sharing storage with the source database until data diverges. Perfect for creating test environments from production data.
Parallel Query: Offload analytical queries to Aurora storage layer, enabling faster execution for queries scanning millions of rows. Parallel query is transparent - no application changes required.
// Using Aurora advanced features
@Service
public class AuroraManagementService {
private final RdsClient rdsClient;
// Create fast clone for testing
public String createTestClone(String sourceClusterId) {
String cloneId = sourceClusterId + "-test-" + System.currentTimeMillis();
RestoreDbClusterToPointInTimeResponse response = rdsClient.restoreDBClusterToPointInTime(
RestoreDbClusterToPointInTimeRequest.builder()
.sourceDBClusterIdentifier(sourceClusterId)
.dbClusterIdentifier(cloneId)
.restoreType("copy-on-write") // Fast clone
.useLatestRestorableTime(true)
.build()
);
log.info("Created fast clone: {} from {}", cloneId, sourceClusterId);
return response.dbCluster().dbClusterIdentifier();
}
// Backtrack to recover from error
public void backtrackToTimestamp(String clusterId, Instant targetTime) {
BacktrackDbClusterResponse response = rdsClient.backtrackDBCluster(
BacktrackDbClusterRequest.builder()
.dbClusterIdentifier(clusterId)
.backtrackTo(targetTime)
.build()
);
log.info("Backtracked cluster {} to {}", clusterId, targetTime);
}
}
For comprehensive database design patterns applicable to Aurora, see Database Design and Database Migrations.
DynamoDB
DynamoDB is a fully managed NoSQL database providing single-digit millisecond performance at any scale. Unlike relational databases, DynamoDB is schemaless (except for keys), scales horizontally without limits, and optimizes for simple key-value and document access patterns.
DynamoDB's architecture is fundamentally different from relational databases. Data is distributed across partitions based on partition keys, enabling massive parallelism. However, this means complex queries (joins, aggregations across items) are not supported - you must design your data model around your access patterns.
Data Modeling Principles
DynamoDB data modeling is the reverse of relational design: you design your schema around how you will query the data, not around entity relationships. Poor partition key design leads to performance bottlenecks; good partition key design enables unlimited scale.
Key concepts:
Partition Key (PK): Required for every item. DynamoDB distributes items across partitions based on a hash of the partition key. Items with the same partition key are stored together and can be queried efficiently.
Sort Key (SK): Optional. Within a partition, items are sorted by sort key. This enables range queries and hierarchical data modeling.
Composite Keys: PK + SK together form the item's primary key. This enables patterns like "all orders for customer X" or "all products in category Y sorted by price."
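These composite keys are just string conventions. A minimal sketch of building and matching them, using the USER#/ORDER# prefix convention this section adopts:

```java
// Composite keys are string conventions: a prefix identifies the entity
// type, and hierarchical sort keys enable begins_with range queries.
public class CompositeKeys {
    public static String customerPk(String customerId) {
        return "USER#" + customerId;
    }

    public static String orderSk(String orderId) {
        return "ORDER#" + orderId;
    }

    public static String orderItemSk(String orderId, String itemId) {
        return "ORDER#" + orderId + "#ITEM#" + itemId;
    }

    // "All orders for customer X" = query PK = customerPk(X),
    // SK begins_with "ORDER#"
    public static boolean matchesOrderPrefix(String sk) {
        return sk.startsWith("ORDER#");
    }

    public static void main(String[] args) {
        System.out.println(customerPk("12345"));              // USER#12345
        System.out.println(orderItemSk("abc123", "xyz789"));  // ORDER#abc123#ITEM#xyz789
        System.out.println(matchesOrderPrefix(orderSk("abc123"))); // true
    }
}
```

Because the order header key is a prefix of every item key, one begins_with query retrieves an order and all of its line items together.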
Good partition keys have:
- High cardinality: Many unique values (user IDs, order IDs)
- Even distribution: No hot partitions (don't use "country" if 80% of users are in one country)
- Predictable access: You know the partition key for your queries
Bad partition keys:
- Low cardinality: "status" with only 3 values (active, inactive, pending)
- Uneven distribution: "date" where current date gets 90% of writes
- Unpredictable: Can't efficiently query "find all users who purchased product X"
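The cardinality point can be made concrete with a toy simulation: hashing high-cardinality keys spreads items across partitions, while a low-cardinality key concentrates them onto a few hot partitions. Here Java's hashCode stands in for DynamoDB's internal hash function:

```java
import java.util.HashMap;
import java.util.Map;

// Toy partition simulation: an item lands on partition hash(key) % N.
// High-cardinality keys (unique IDs) spread load across all partitions;
// low-cardinality keys (a 3-value "status") pile onto at most 3.
public class PartitionDemo {
    static final int PARTITIONS = 10;

    // Counts how many items land on each partition
    public static Map<Integer, Integer> distribute(String[] keys) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String key : keys) {
            int partition = Math.floorMod(key.hashCode(), PARTITIONS);
            counts.merge(partition, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // High cardinality: 1000 unique user IDs
        String[] userIds = new String[1000];
        for (int i = 0; i < 1000; i++) userIds[i] = "USER#" + i;
        System.out.println("unique IDs hit " + distribute(userIds).size() + " partitions");

        // Low cardinality: 1000 writes with only 3 distinct status values
        String[] values = {"active", "inactive", "pending"};
        String[] statuses = new String[1000];
        for (int i = 0; i < 1000; i++) statuses[i] = values[i % 3];
        System.out.println("statuses hit " + distribute(statuses).size() + " partitions");
    }
}
```

With the status key, every write lands on at most three partitions regardless of table size, which is exactly the hot-partition problem the guidance above warns about.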
// DynamoDB table design for e-commerce orders
@DynamoDbBean
public class OrderItem {
// Partition key: Customer ID for customer-centric access
private String pk; // PK: USER#12345
// Sort key: Hierarchical pattern for different entity types
private String sk; // SK: ORDER#abc123 (header) or ORDER#abc123#ITEM#xyz789 (line item)
private String entityType; // "ORDER" or "ORDER_ITEM"
private String orderId;
private String orderDate;
private Double totalAmount;
private String status;
// Item attributes
private String productId;
private Integer quantity;
private Double price;
// GSI keys for alternate access patterns
private String gsi1pk; // ORDER#abc123 (query by order ID)
private String gsi1sk; // ITEM#xyz789
@DynamoDbPartitionKey
@DynamoDbAttribute("PK")
public String getPk() {
return pk;
}
@DynamoDbSortKey
@DynamoDbAttribute("SK")
public String getSk() {
return sk;
}
// Additional getters and setters...
}
// Service for DynamoDB operations
@Service
public class OrderDynamoDBService {
private final DynamoDbClient dynamoDbClient;
private final DynamoDbEnhancedClient enhancedClient;
private static final String TABLE_NAME = "Orders";
// Query all orders for a customer
public List<OrderItem> getCustomerOrders(String customerId) {
DynamoDbTable<OrderItem> table = enhancedClient.table(TABLE_NAME,
TableSchema.fromBean(OrderItem.class));
// sortBeginsWith matches every item whose SK starts with "ORDER#"
QueryConditional condition = QueryConditional
.sortBeginsWith(Key.builder()
.partitionValue("USER#" + customerId)
.sortValue("ORDER#")
.build());
return table.query(condition).items().stream().toList();
}
// Get specific order with items
public Order getOrderWithItems(String customerId, String orderId) {
DynamoDbTable<OrderItem> table = enhancedClient.table(TABLE_NAME,
TableSchema.fromBean(OrderItem.class));
// begins_with on the sort key returns the order header and all its items in one request
QueryConditional condition = QueryConditional
.sortBeginsWith(Key.builder()
.partitionValue("USER#" + customerId)
.sortValue("ORDER#" + orderId)
.build());
List<OrderItem> results = table.query(condition).items().stream().toList();
if (results.isEmpty()) {
throw new OrderNotFoundException(orderId);
}
// The header SK ("ORDER#<id>") sorts before its item SKs ("ORDER#<id>#ITEM#...")
OrderItem orderHeader = results.get(0);
List<OrderItem> items = results.subList(1, results.size());
return new Order(orderHeader, items);
}
// Write order with items in transaction
public void createOrder(String customerId, Order order) {
List<TransactWriteItem> actions = new ArrayList<>();
// Order header
actions.add(TransactWriteItem.builder()
.put(Put.builder()
.tableName(TABLE_NAME)
.item(Map.of(
"PK", AttributeValue.builder().s("USER#" + customerId).build(),
"SK", AttributeValue.builder().s("ORDER#" + order.getOrderId()).build(),
"EntityType", AttributeValue.builder().s("ORDER").build(),
"TotalAmount", AttributeValue.builder().n(order.getTotal().toString()).build(),
"Status", AttributeValue.builder().s(order.getStatus()).build()
))
.build())
.build());
// Order items
for (OrderLineItem item : order.getItems()) {
actions.add(TransactWriteItem.builder()
.put(Put.builder()
.tableName(TABLE_NAME)
.item(Map.of(
"PK", AttributeValue.builder().s("USER#" + customerId).build(),
"SK", AttributeValue.builder().s("ORDER#" + order.getOrderId() + "#ITEM#" + item.getId()).build(),
"EntityType", AttributeValue.builder().s("ORDER_ITEM").build(),
"ProductId", AttributeValue.builder().s(item.getProductId()).build(),
"Quantity", AttributeValue.builder().n(String.valueOf(item.getQuantity())).build(),
"Price", AttributeValue.builder().n(item.getPrice().toString()).build()
))
.build())
.build());
}
// Execute transaction (all-or-nothing; up to 100 items per TransactWriteItems call)
dynamoDbClient.transactWriteItems(TransactWriteItemsRequest.builder()
.transactItems(actions)
.build());
}
}
Single-table design: Advanced DynamoDB pattern where multiple entity types share one table. This enables retrieving related entities in a single query (e.g., order + items) and reduces costs compared to multiple tables. However, it requires careful key design and is not always appropriate.
Global Secondary Indexes (GSI)
GSIs provide alternate query patterns by defining different partition and sort keys. Without GSIs, you can only query by the table's primary key. With GSIs, you can query by alternate attributes.
GSI characteristics:
- Eventually consistent: GSI reads may lag the base table, typically by milliseconds
- Projected attributes: Choose which attributes to copy into the index (all, keys-only, or a specific list)
- Separate capacity: In provisioned mode, each GSI has its own read/write capacity, provisioned independently of the base table
- Sparse indexes: Only items that have the GSI key attributes appear in the index
// Querying by GSI for alternate access pattern
public List<OrderItem> getOrderById(String orderId) {
// GSI allows querying by order ID instead of customer ID
DynamoDbTable<OrderItem> table = enhancedClient.table(TABLE_NAME,
TableSchema.fromBean(OrderItem.class));
// Query GSI1 where GSI1PK = ORDER#abc123
QueryConditional condition = QueryConditional
.keyEqualTo(Key.builder()
.partitionValue("ORDER#" + orderId)
.build());
return table.index("GSI1")
.query(condition)
.items()
.stream()
.toList();
}
When to use GSIs:
- You need to query by attributes other than the primary key
- You need multiple access patterns on the same data
- Your query patterns don't match your partition key design
GSI limitations:
- Maximum 20 GSIs per table
- GSIs increase storage costs (each GSI stores copies of projected attributes)
- GSIs consume write capacity for every write to the main table
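Because GSIs are sparse, you can control which items appear in an index simply by setting (or omitting) the GSI key attributes at write time. A minimal sketch of how the `gsi1pk`/`gsi1sk` attributes from the `OrderItem` bean above might be derived; the class, method names, and the `HEADER` sentinel value are illustrative assumptions, not part of the AWS SDK:

```java
import java.util.Optional;

// Sketch: deriving GSI1 key attributes for the single-table design above.
// Items written without these attributes are simply absent from the index.
public class Gsi1Keys {

    // Every row of an order gets GSI1PK = ORDER#<orderId>, so querying
    // GSI1 by order ID returns the header and items together.
    public static String partitionKey(String orderId) {
        return "ORDER#" + orderId;
    }

    // Sort key distinguishes the header row from its line items
    // ("HEADER" is an arbitrary sentinel chosen for this sketch).
    public static String sortKey(Optional<String> itemId) {
        return itemId.map(id -> "ITEM#" + id).orElse("HEADER");
    }
}
```

Populating these in the `createOrder` transaction would make the `getOrderById` GSI query above return the complete order.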
DynamoDB Capacity Modes
DynamoDB offers two capacity modes with different billing and scaling characteristics:
On-Demand: Pay per request with automatic scaling:
- No capacity planning required
- Instantly accommodates traffic spikes
- $1.25 per million writes, $0.25 per million reads (can be expensive at high volume)
- Ideal for unpredictable workloads, new applications, development/testing
Provisioned: Pre-provision read and write capacity units:
- $0.00065 per write capacity unit-hour, $0.00013 per read capacity unit-hour (much cheaper at scale)
- Auto-scaling adjusts capacity based on utilization
- Reserved capacity available for further discounts
- Ideal for predictable workloads with sustained traffic
// Monitor DynamoDB capacity utilization
@Service
public class DynamoDBCapacityMonitorService {
private final CloudWatchClient cloudWatchClient;
private final MeterRegistry meterRegistry;
@Scheduled(fixedRate = 60000) // Every minute
public void monitorCapacity() {
String tableName = "Orders";
Instant endTime = Instant.now();
Instant startTime = endTime.minus(5, ChronoUnit.MINUTES);
// Consumed read capacity
Double readCapacity = getMetric(tableName, "ConsumedReadCapacityUnits", startTime, endTime);
meterRegistry.gauge("dynamodb.read.consumed", readCapacity);
// Consumed write capacity
Double writeCapacity = getMetric(tableName, "ConsumedWriteCapacityUnits", startTime, endTime);
meterRegistry.gauge("dynamodb.write.consumed", writeCapacity);
// Throttled requests
Double throttledReads = getMetric(tableName, "ReadThrottleEvents", startTime, endTime);
Double throttledWrites = getMetric(tableName, "WriteThrottleEvents", startTime, endTime);
if (throttledReads > 0 || throttledWrites > 0) {
log.warn("DynamoDB throttling detected: {} read throttles, {} write throttles",
throttledReads, throttledWrites);
}
}
// getMetric(...) wraps CloudWatch GetMetricStatistics; helper omitted for brevity
}
Capacity mode selection: Use On-Demand for unpredictable traffic or if you're unsure of capacity needs. Switch to Provisioned once traffic patterns stabilize - at steady-state, Provisioned is 5-10x cheaper than On-Demand.
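The "5-10x cheaper" claim can be checked with back-of-envelope arithmetic using the list prices quoted above (us-east-1; verify current pricing before relying on these numbers):

```java
// Rough monthly cost comparison of DynamoDB capacity modes for a
// steady workload, using the per-request and per-hour prices above.
public class DynamoDbCostEstimator {
    static final double SECONDS_PER_MONTH = 30 * 24 * 3600.0; // ~30-day month
    static final double HOURS_PER_MONTH = 30 * 24.0;

    // On-Demand: pay per request ($1.25 / $0.25 per million writes/reads)
    public static double onDemandMonthly(double writesPerSec, double readsPerSec) {
        double writeMillions = writesPerSec * SECONDS_PER_MONTH / 1_000_000;
        double readMillions = readsPerSec * SECONDS_PER_MONTH / 1_000_000;
        return writeMillions * 1.25 + readMillions * 0.25;
    }

    // Provisioned: pay per capacity-unit-hour regardless of actual usage
    public static double provisionedMonthly(int wcu, int rcu) {
        return wcu * HOURS_PER_MONTH * 0.00065 + rcu * HOURS_PER_MONTH * 0.00013;
    }
}
```

At a steady 100 writes/s and 100 reads/s this works out to roughly $389/month On-Demand versus $56/month Provisioned (about 7x), consistent with the 5-10x figure. The gap narrows for spiky traffic, where Provisioned capacity sits idle between peaks.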
DynamoDB Streams and Global Tables
DynamoDB Streams capture change data (inserts, updates, deletes) for real-time processing. Streams enable event-driven architectures, data replication, and audit logging.
// Lambda function processing DynamoDB Stream
@Component
public class OrderStreamHandler implements RequestHandler<DynamodbEvent, Void> {
private final OrderNotificationService notificationService;
@Override
public Void handleRequest(DynamodbEvent event, Context context) {
for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
if (record.getEventName().equals("INSERT")) {
Map<String, AttributeValue> newImage = record.getDynamodb().getNewImage();
String entityType = newImage.get("EntityType").getS();
if (entityType.equals("ORDER")) {
String customerId = newImage.get("PK").getS().replace("USER#", "");
String orderId = newImage.get("SK").getS().replace("ORDER#", "");
// Send order confirmation
notificationService.sendOrderConfirmation(customerId, orderId);
}
}
}
return null;
}
}
Global Tables provide multi-region, multi-active replication for disaster recovery and low-latency global access. Writes in any region replicate to all other regions, typically within one second.
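With multi-active replication, the same item can be updated concurrently in two regions; Global Tables resolve such conflicts with last-writer-wins. A pure-Java illustration of those semantics (the class shape is illustrative, not an AWS API):

```java
import java.time.Instant;

// Last-writer-wins conflict resolution, as applied by Global Tables when
// the same item is updated concurrently in two regions: once replication
// converges, the write with the later timestamp wins everywhere and the
// other write is silently discarded.
public class LwwResolver {
    public static class RegionWrite {
        public final String region;
        public final Instant timestamp;
        public final String value;

        public RegionWrite(String region, Instant timestamp, String value) {
            this.region = region;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    public static RegionWrite resolve(RegionWrite a, RegionWrite b) {
        return a.timestamp.isAfter(b.timestamp) ? a : b;
    }
}
```

The practical consequence: applications using Global Tables should avoid designs where two regions routinely write the same item, since one of the writes will be lost.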
For event-driven architectures with DynamoDB Streams, see Event-Driven Architecture and AWS Messaging.
ElastiCache
ElastiCache provides managed in-memory caching with Redis or Memcached, delivering sub-millisecond latency for frequently accessed data. Caching reduces database load, improves response times, and enables applications to scale beyond database capacity limits.
Redis vs Memcached
Redis - Advanced data structures with persistence:
- Data structures: Strings, hashes, lists, sets, sorted sets, bitmaps, HyperLogLog, geospatial indexes
- Persistence: Optional disk persistence (RDB snapshots, AOF logs)
- Replication: Primary-replica replication with automatic failover
- Pub/Sub: Message broker capabilities
- Atomic operations: Complex atomic operations (increment counters, set operations)
- Use cases: Session storage, leaderboards, real-time analytics, distributed locks, rate limiting
Memcached - Simple, high-performance key-value cache:
- Data structures: Key-value pairs only (strings)
- No persistence: Cache is ephemeral; data lost on restart
- Multi-threaded: Better CPU utilization on multi-core instances
- Simplicity: Simpler mental model, easier to operate
- Use cases: Simple caching, database query results, API response caching
Recommendation: Use Redis unless you have a specific reason for Memcached. Redis provides more features while maintaining excellent performance. Memcached's simpler model rarely outweighs Redis's additional capabilities.
ElastiCache Redis Configuration
// Spring Boot Redis configuration with ElastiCache
@Configuration
@EnableCaching
public class RedisCacheConfig {
@Value("${elasticache.redis.endpoint}")
private String redisEndpoint;
@Value("${elasticache.redis.port:6379}")
private int redisPort;
@Bean
public LettuceConnectionFactory redisConnectionFactory() {
// Cluster mode configuration
RedisClusterConfiguration clusterConfig = new RedisClusterConfiguration()
.clusterNode(redisEndpoint, redisPort);
LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
.commandTimeout(Duration.ofSeconds(2))
.shutdownTimeout(Duration.ZERO)
.build();
return new LettuceConnectionFactory(clusterConfig, clientConfig);
}
@Bean
public RedisTemplate<String, Object> redisTemplate(
LettuceConnectionFactory connectionFactory) {
RedisTemplate<String, Object> template = new RedisTemplate<>();
template.setConnectionFactory(connectionFactory);
// JSON serialization for values
Jackson2JsonRedisSerializer<Object> serializer =
new Jackson2JsonRedisSerializer<>(Object.class);
ObjectMapper mapper = new ObjectMapper();
mapper.setVisibility(PropertyAccessor.ALL, JsonAutoDetect.Visibility.ANY);
mapper.activateDefaultTyping(
mapper.getPolymorphicTypeValidator(),
ObjectMapper.DefaultTyping.NON_FINAL);
serializer.setObjectMapper(mapper);
template.setKeySerializer(new StringRedisSerializer());
template.setValueSerializer(serializer);
template.setHashKeySerializer(new StringRedisSerializer());
template.setHashValueSerializer(serializer);
template.afterPropertiesSet();
return template;
}
@Bean
public CacheManager cacheManager(LettuceConnectionFactory connectionFactory) {
RedisCacheConfiguration config = RedisCacheConfiguration.defaultCacheConfig()
.entryTtl(Duration.ofHours(1)) // 1 hour TTL
.disableCachingNullValues()
.serializeKeysWith(RedisSerializationContext.SerializationPair
.fromSerializer(new StringRedisSerializer()))
.serializeValuesWith(RedisSerializationContext.SerializationPair
.fromSerializer(new GenericJackson2JsonRedisSerializer()));
return RedisCacheManager.builder(connectionFactory)
.cacheDefaults(config)
.build();
}
}
// Using Redis for caching
@Service
public class ProductService {
private final ProductRepository productRepository;
private final RedisTemplate<String, Object> redisTemplate;
@Cacheable(value = "products", key = "#productId")
public Product getProduct(String productId) {
// Cache miss - fetch from database
return productRepository.findById(productId)
.orElseThrow(() -> new ProductNotFoundException(productId));
}
@CacheEvict(value = "products", key = "#product.id")
public Product updateProduct(Product product) {
// @CacheEvict removes the cached entry after the update completes (default behavior)
return productRepository.save(product);
}
// Manual cache operations for complex scenarios
public List<Product> getPopularProducts() {
String cacheKey = "products:popular";
// Try cache first
List<Product> cached = (List<Product>) redisTemplate.opsForValue().get(cacheKey);
if (cached != null) {
return cached;
}
// Cache miss - compute and cache
List<Product> products = productRepository.findPopularProducts();
redisTemplate.opsForValue().set(cacheKey, products, Duration.ofMinutes(15));
return products;
}
// Distributed lock with Redis
public void updateInventory(String productId, int quantity) {
String lockKey = "lock:product:" + productId;
String lockToken = UUID.randomUUID().toString();
// Acquire distributed lock (SET NX with expiry)
Boolean acquired = redisTemplate.opsForValue()
.setIfAbsent(lockKey, lockToken, Duration.ofSeconds(10));
if (Boolean.TRUE.equals(acquired)) {
try {
// Critical section - update inventory
Product product = getProduct(productId);
product.setInventory(product.getInventory() + quantity);
updateProduct(product);
} finally {
// Release only if we still own the lock; a bare DELETE could remove
// a lock another client acquired after ours expired (a Lua script
// makes this check-and-delete atomic)
if (lockToken.equals(redisTemplate.opsForValue().get(lockKey))) {
redisTemplate.delete(lockKey);
}
}
} else {
throw new ConcurrentModificationException("Could not acquire lock for product " + productId);
}
}
}
For comprehensive caching strategies and patterns, see Caching.
Redis Cluster Mode
Redis Cluster mode distributes data across multiple shards for horizontal scaling and high availability. Each shard has a primary node and replica nodes for failover.
Cluster mode benefits:
- Horizontal scaling: Distribute data across multiple shards (up to 500 nodes)
- High availability: Automatic failover within shards
- Partitioning: Data automatically distributed using hash slots
Cluster mode trade-offs:
- Cannot use multi-key operations across shards
- Lua scripts must operate on keys in the same shard
- Slightly higher latency due to redirection for misrouted requests
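The multi-key restriction follows from how cluster mode partitions data: each key maps to one of 16,384 hash slots via CRC16, and hash tags (a substring in `{}`) let you force related keys into the same slot so multi-key operations and Lua scripts still work. A sketch reimplementing the slot calculation for illustration (client libraries do this internally):

```java
import java.nio.charset.StandardCharsets;

// How Redis Cluster assigns keys to one of 16,384 hash slots:
// CRC16-XMODEM of the key, or of the "hash tag" inside {} if present.
public class RedisHashSlot {

    // CRC16-XMODEM (polynomial 0x1021, initial value 0)
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    public static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) { // non-empty tag: hash only the tag
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }
}
```

Because `{user:42}:cart` and `{user:42}:profile` both hash "user:42", they land in the same slot, so an `MGET` or Lua script can touch both; without the tags, they would likely live on different shards.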
// Monitor ElastiCache Redis performance
@Service
public class RedisCacheMonitorService {
private final CloudWatchClient cloudWatchClient;
private final MeterRegistry meterRegistry;
@Scheduled(fixedRate = 60000) // Every minute
public void monitorRedisMetrics() {
String cacheClusterId = "my-redis-cluster";
Instant endTime = Instant.now();
Instant startTime = endTime.minus(5, ChronoUnit.MINUTES);
// Cache hit rate
Double cacheHits = getMetric(cacheClusterId, "CacheHits", startTime, endTime);
Double cacheMisses = getMetric(cacheClusterId, "CacheMisses", startTime, endTime);
double hitRate = (cacheHits + cacheMisses) > 0
? (cacheHits / (cacheHits + cacheMisses)) * 100
: 0;
meterRegistry.gauge("redis.cache.hit.rate", hitRate);
// Memory utilization
Double memoryUsed = getMetric(cacheClusterId, "DatabaseMemoryUsagePercentage", startTime, endTime);
meterRegistry.gauge("redis.memory.utilization", memoryUsed);
// Evictions
Double evictions = getMetric(cacheClusterId, "Evictions", startTime, endTime);
meterRegistry.gauge("redis.evictions", evictions);
// Alert on low hit rate
if (hitRate < 80) {
log.warn("Redis cache hit rate is low: {}%", String.format("%.1f", hitRate));
}
// Alert on high evictions
if (evictions > 1000) {
log.warn("Redis is evicting items: {} evictions in last 5 minutes", evictions);
}
// Alert on memory pressure
if (memoryUsed > 85) {
log.warn("Redis memory utilization is high: {}%", String.format("%.1f", memoryUsed));
}
}
// getMetric(...) wraps CloudWatch GetMetricStatistics; helper omitted for brevity
}
Database Migration Strategies
Migrating databases to AWS requires careful planning to minimize downtime and ensure data integrity. AWS provides tools and services to simplify migration.
AWS Database Migration Service (DMS)
DMS migrates databases to AWS with minimal downtime, supporting homogeneous migrations (Oracle to RDS Oracle) and heterogeneous migrations (Oracle to Aurora PostgreSQL).
DMS migration phases:
- Full load: Copy all existing data from source to target
- Change data capture (CDC): Continuously replicate ongoing changes during migration
- Cutover: Switch applications to target database after catching up
Zero-downtime migration strategy:
- Set up DMS replication from source to target
- Full load completes (database fully copied)
- CDC replicates ongoing changes (source and target in sync)
- Monitor replication lag until it's minimal (<1 second)
- Cutover: redirect applications to target database
- Monitor for issues; can fail back to source if needed
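The "monitor replication lag until minimal" step can be sketched as a gate that only signals cutover readiness after several consecutive low-lag observations (DMS exposes lag via the CDCLatencySource/CDCLatencyTarget CloudWatch metrics); the threshold and sample count here are illustrative choices:

```java
// Cutover gate for a DMS migration: declare the target ready only after
// N consecutive lag samples under the threshold, so a single quiet
// moment doesn't trigger a premature switch.
public class CutoverGate {
    private final double maxLagSeconds;
    private final int requiredSamples;
    private int consecutive = 0;

    public CutoverGate(double maxLagSeconds, int requiredSamples) {
        this.maxLagSeconds = maxLagSeconds;
        this.requiredSamples = requiredSamples;
    }

    // Feed one lag observation (e.g. a CDCLatencyTarget datapoint);
    // any spike over the threshold resets the streak.
    public boolean observe(double lagSeconds) {
        consecutive = (lagSeconds < maxLagSeconds) ? consecutive + 1 : 0;
        return consecutive >= requiredSamples;
    }
}
```

A polling loop would call `observe(...)` each minute and initiate cutover (pause writes, wait for lag to hit zero, redirect applications) only once the gate opens.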
For schema migration patterns and version control, see Database Migrations.
Schema Conversion Tool (SCT)
For heterogeneous migrations (e.g., Oracle to PostgreSQL), SCT converts schemas, stored procedures, and application code to be compatible with the target database.
SCT process:
- Analyze source database schema and code
- Generate conversion report showing compatibility and required changes
- Automatically convert compatible objects
- Flag incompatible objects requiring manual conversion
- Generate converted schema for target database
Common conversion challenges:
- Proprietary features (Oracle packages, SQL Server T-SQL)
- Different data types (Oracle NUMBER vs PostgreSQL NUMERIC)
- Stored procedures requiring rewrite
- Application SQL that depends on database-specific features
Cost Optimization
Database costs can be significant. Implementing cost optimization strategies reduces expenses without sacrificing performance or availability.
RDS/Aurora cost optimization:
- Reserved Instances: 1-year or 3-year commitments for 30-60% discounts
- Right-size instances: Monitor CPU/memory usage and downsize overprovisioned instances
- Delete unused snapshots: Snapshots accumulate; implement retention policies
- Aurora Serverless: Variable workloads benefit from auto-scaling capacity
- Stop non-production databases: Stop dev/test databases outside business hours
DynamoDB cost optimization:
- On-Demand to Provisioned: Switch to provisioned capacity once traffic stabilizes (5-10x cheaper)
- Auto-scaling: Automatically adjust provisioned capacity based on utilization
- Delete unused GSIs: Each GSI increases storage and write costs
- Enable TTL: Automatically delete expired items to reduce storage costs
- Compress large items: DynamoDB charges per KB; compress data before storing
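On the TTL point: DynamoDB TTL expects a Number attribute holding a Unix epoch timestamp in seconds, and deletes items some time (typically within a couple of days) after that moment passes. A minimal sketch of computing that attribute value; the class and method names are illustrative, and the TTL attribute itself is whatever name you configure via UpdateTimeToLive:

```java
import java.time.Duration;
import java.time.Instant;

// Computes the epoch-seconds value to store in a table's TTL attribute
// so an item expires after a given retention period.
public class TtlAttribute {
    public static long expiresAtEpochSeconds(Instant createdAt, Duration retention) {
        return createdAt.plus(retention).getEpochSecond();
    }
}
```

Writing `expiresAtEpochSeconds(Instant.now(), Duration.ofDays(90))` into the configured attribute lets DynamoDB delete the item for free instead of charging for a delete request and ongoing storage.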
ElastiCache cost optimization:
- Reserved nodes: 1-year or 3-year commitments for 30-50% discounts
- Right-size nodes: Start with smaller nodes and scale up based on metrics
- Disable snapshots for ephemeral data: Snapshots cost money; disable for truly ephemeral caches
- Use Memcached for simple caching: Memcached is cheaper than Redis when advanced features aren't needed
// Automate RDS instance rightsizing recommendations
@Service
public class RDSCostOptimizationService {
private final CloudWatchClient cloudWatchClient;
private final RdsClient rdsClient;
public List<RightsizingRecommendation> analyzeRDSUtilization() {
List<RightsizingRecommendation> recommendations = new ArrayList<>();
// Get all RDS instances
DescribeDbInstancesResponse instances = rdsClient.describeDBInstances();
for (DBInstance instance : instances.dbInstances()) {
String instanceId = instance.dbInstanceIdentifier();
String currentClass = instance.dbInstanceClass();
// Check average CPU over last 7 days
double avgCpu = getAverageCPU(instanceId, Duration.ofDays(7));
// Recommend downsizing if CPU is consistently low
if (avgCpu < 20) {
String recommendedClass = suggestSmallerInstance(currentClass);
double monthlySavings = calculateSavings(currentClass, recommendedClass);
recommendations.add(new RightsizingRecommendation(
instanceId,
currentClass,
recommendedClass,
avgCpu,
monthlySavings,
"CPU utilization is consistently low; consider downsizing instance"
));
}
}
return recommendations;
}
private double getAverageCPU(String instanceId, Duration lookback) {
Instant endTime = Instant.now();
Instant startTime = endTime.minus(lookback);
GetMetricStatisticsResponse response = cloudWatchClient.getMetricStatistics(
GetMetricStatisticsRequest.builder()
.namespace("AWS/RDS")
.metricName("CPUUtilization")
.dimensions(Dimension.builder()
.name("DBInstanceIdentifier")
.value(instanceId)
.build())
.startTime(startTime)
.endTime(endTime)
.period(86400) // Daily
.statistics(Statistic.AVERAGE)
.build()
);
return response.datapoints().stream()
.mapToDouble(Datapoint::average)
.average()
.orElse(0);
}
}
For comprehensive cost optimization strategies across all AWS services, see AWS Cost Optimization.
Anti-Patterns
Single-AZ RDS in production: Single-AZ databases have no automatic failover. Hardware failures or AZ outages cause hours of downtime. Always use Multi-AZ for production.
Not using connection pooling: Opening a new database connection for every request is slow and exhausts connection limits. Use connection pools (HikariCP) to reuse connections efficiently.
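A hypothetical starting point for HikariCP in a Spring Boot service backed by RDS (property names are standard Spring Boot/HikariCP keys; the values are illustrative and should be tuned to your instance class and its max_connections limit):

```properties
# Cap the pool well below the database's connection limit,
# accounting for the number of application instances sharing it
spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.minimum-idle=5
# Fail fast rather than queueing indefinitely for a connection
spring.datasource.hikari.connection-timeout=3000
# Retire connections before any RDS or proxy idle timeout closes them
spring.datasource.hikari.max-lifetime=900000
spring.datasource.hikari.keepalive-time=300000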
Poor DynamoDB partition key design: Low-cardinality partition keys or uneven distributions create hot partitions, limiting throughput. Design partition keys for high cardinality and even distribution.
Querying entire table without indexes: Full table scans are slow and expensive. Create appropriate indexes (RDS/Aurora) or GSIs (DynamoDB) for query patterns.
Not monitoring database performance: Database issues often manifest gradually. Monitor CPU, connections, IOPS, and query performance to identify problems before they impact users.
Using DynamoDB for complex queries: DynamoDB doesn't support joins or complex aggregations. If your access patterns require these, use a relational database instead.
Over-caching: Caching everything wastes memory and introduces staleness. Cache only frequently accessed, read-heavy data. Writes should update or invalidate caches immediately.
Not testing failover: Multi-AZ and Aurora failover work in theory, but applications must handle connection failures gracefully. Test failover scenarios before production incidents.
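Failover drills surface transient connection errors, and clients should retry with exponential backoff plus jitter rather than hammering the new primary in lockstep. A sketch of the delay calculation ("full jitter" style; class name and parameters are illustrative):

```java
import java.util.Random;

// Exponential backoff with full jitter: each retry waits a uniformly
// random delay in [0, min(cap, base * 2^attempt)], which spreads
// reconnection attempts out after a failover.
public class BackoffPolicy {
    private final long baseMillis;
    private final long capMillis;
    private final Random random;

    public BackoffPolicy(long baseMillis, long capMillis, Random random) {
        this.baseMillis = baseMillis;
        this.capMillis = capMillis;
        this.random = random;
    }

    public long delayMillis(int attempt) {
        // Clamp the shift so the exponential term can't overflow
        long exp = Math.min(capMillis, baseMillis << Math.min(attempt, 20));
        return (long) (random.nextDouble() * exp);
    }
}
```

A retry loop around connection acquisition would sleep for `delayMillis(attempt)` between attempts, giving the failed-over endpoint time to become reachable without a thundering herd.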
Ignoring replication lag: Read replicas and DynamoDB Streams have replication lag. Applications must handle eventual consistency - don't assume writes are immediately visible on replicas.
Manual database operations: Use automation (DMS, SCT, Terraform, CloudFormation) for migrations, backups, and configuration changes. Manual operations are error-prone and don't scale.
Related Guidelines
- Database Design - Schema design, normalization, indexing strategies
- Database ORM - ORM patterns, JPA/Hibernate best practices
- Database Migrations - Schema evolution, Flyway, Liquibase
- Caching - Caching strategies, invalidation patterns
- Spring Boot Data Access - Transaction management, connection pooling
- AWS Networking - VPC configuration, database subnet groups
- AWS IAM - Database authentication with IAM
- AWS Observability - Monitoring database performance
- AWS Storage - Storage for database backups
- Event-Driven Architecture - Using DynamoDB Streams
- Disaster Recovery - Backup and recovery strategies