Route 53 and DNS
Amazon Route 53 is AWS's highly available and scalable Domain Name System (DNS) web service. It provides domain registration, DNS routing, health checking, and traffic management capabilities that are essential for building resilient, globally distributed applications.
Overview
Route 53 derives its name from the DNS port (TCP/UDP port 53). The service combines three key capabilities: domain registration for purchasing and managing domains, DNS management for translating domain names to IP addresses or other resources, and traffic routing for directing users to the best endpoint based on configurable policies.
Understanding DNS fundamentals is crucial before leveraging Route 53's advanced features. DNS operates as a hierarchical distributed naming system, where queries traverse from root name servers through top-level domain (TLD) servers to authoritative name servers. Route 53 acts as an authoritative DNS server for your domains, answering queries with authoritative responses based on the records you configure.
Core Principles
Global availability: Route 53 is designed for 100% availability using a globally distributed network of DNS servers across multiple AWS regions and edge locations. This ensures DNS queries are answered even during regional outages.
Health-based routing: Integrate health checks with routing policies to automatically route traffic away from unhealthy endpoints. This provides automated failover without manual intervention, critical for maintaining application availability.
Performance optimization: Use routing policies that consider geographic location, latency, or endpoint weights to direct users to the most appropriate endpoint. This improves user experience by reducing response times.
Security integration: Use DNSSEC signing so resolvers can verify that responses are authentic, and private hosted zones so internal names and private IP addresses are never published in public DNS.
Cost-effective DNS management: Route 53 charges per hosted zone and per query. Avoid unnecessary hosted zones, prefer alias records (queries to alias records that point to AWS resources are free), and choose TTLs deliberately: longer TTLs mean fewer queries reach Route 53, but DNS changes take longer to reach clients.
DNS Fundamentals
DNS translates human-readable domain names (like api.example.com) into machine-readable IP addresses (like 203.0.113.42) or other resources. When a client queries a domain name, the DNS resolver follows a resolution path: checking local cache, querying recursive resolvers, then authoritative name servers.
Route 53 operates as an authoritative DNS server, meaning it provides definitive answers for domains you control. When you create a hosted zone for example.com, AWS assigns four name servers (e.g., ns-123.awsdns-45.com) that become the authoritative source for all DNS records within that zone.
Time to Live (TTL) is a critical DNS concept that determines how long DNS resolvers cache a record before querying Route 53 again. A TTL of 300 seconds (5 minutes) means clients cache the response for 5 minutes, reducing query volume but delaying propagation of DNS changes. Balance caching benefits against the need for rapid updates when setting TTLs.
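The effect of a TTL on change propagation can be sketched with a toy resolver cache. The Java below is a simulation under simplifying assumptions, not a DNS client: the ResolverCache class is hypothetical, a plain map stands in for the records stored in Route 53, time is an integer number of seconds, and every answer is cached for one fixed TTL. It shows a record change staying invisible to a client until the cached copy expires.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a caching recursive resolver (not a real DNS client). Every answer is
// cached for a fixed TTL; `authoritative` stands in for the records stored in Route 53.
final class ResolverCache {
    private record Entry(String value, long expiresAt) {}

    private final Map<String, String> authoritative;
    private final long ttlSeconds;
    private final Map<String, Entry> cache = new HashMap<>();
    private int upstreamQueries = 0;

    ResolverCache(Map<String, String> authoritative, long ttlSeconds) {
        this.authoritative = authoritative;
        this.ttlSeconds = ttlSeconds;
    }

    // Returns the answer a client sees at time `nowSeconds`.
    String resolve(String name, long nowSeconds) {
        Entry cached = cache.get(name);
        if (cached != null && nowSeconds < cached.expiresAt()) {
            return cached.value(); // cache hit: authoritative changes are not visible yet
        }
        upstreamQueries++; // miss or expired entry: ask the authoritative server again
        String value = authoritative.get(name);
        cache.put(name, new Entry(value, nowSeconds + ttlSeconds));
        return value;
    }

    int upstreamQueries() {
        return upstreamQueries;
    }
}
```

With a 300-second TTL, a client that resolved the name at t = 0 keeps receiving the old address until t = 300, even if the record changed at t = 1.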
DNS supports multiple record types, each serving a specific purpose:
- A records map domain names to IPv4 addresses
- AAAA records map to IPv6 addresses
- CNAME records create aliases to other domain names (cannot be used at the zone apex)
- MX records define mail servers for email delivery
- TXT records store arbitrary text data, often used for domain verification and SPF/DKIM email authentication
- SRV records specify service locations for specific protocols
Hosted Zones
A hosted zone is a container for DNS records that defines how to route traffic for a specific domain and its subdomains. Route 53 supports two types of hosted zones: public and private.
Public Hosted Zones
Public hosted zones respond to DNS queries from the internet. When you create a public hosted zone for example.com, Route 53 assigns four name servers. You must update your domain registrar to use these name servers, delegating DNS authority to Route 53.
Each public hosted zone costs $0.50/month, plus query charges. A hosted zone covers a domain and all of its subdomains, so avoid creating separate zones for subdomains unless you need to delegate them (for example, to another team or account). Rather than creating separate hosted zones for api.example.com and app.example.com, manage both as records within the example.com hosted zone.
# Terraform: Creating a public hosted zone
resource "aws_route53_zone" "public" {
  name    = "example.com"
  comment = "Public DNS zone for example.com"
  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}
# Output the name servers for domain registrar configuration
output "name_servers" {
  value       = aws_route53_zone.public.name_servers
  description = "Configure these at your domain registrar"
}
After creating a hosted zone, you must configure your domain registrar to use Route 53's name servers. This delegation typically takes 24-48 hours to fully propagate globally, though changes often appear within minutes. During this propagation period, some users may receive responses from the old DNS provider and others from Route 53.
Private Hosted Zones
Private hosted zones respond only to DNS queries from within specified VPCs, enabling internal DNS resolution for resources that shouldn't be publicly accessible. This is essential for microservices architectures where services communicate via internal DNS names rather than IP addresses.
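Private hosted zones work by associating the zone with one or more VPCs, which must have both DNS resolution (enableDnsSupport) and DNS hostnames (enableDnsHostnames) enabled. DNS queries from EC2 instances, VPC-attached Lambda functions, and other resources within those VPCs receive answers from the private zone. Queries from anywhere else never reach the private zone: they are resolved through public DNS, so they receive a public hosted zone's answers if one exists for the same domain, or NXDOMAIN otherwise.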
# Terraform: Private hosted zone for internal services
resource "aws_route53_zone" "private" {
  name    = "internal.example.com"
  comment = "Private DNS for VPC resources"
  vpc {
    vpc_id = aws_vpc.main.id
  }
  # Additional VPCs are attached with aws_route53_zone_association below; ignoring the
  # inline vpc block prevents Terraform from removing those associations on every apply
  lifecycle {
    ignore_changes = [vpc]
  }
  tags = {
    Environment = "production"
    Visibility  = "private"
  }
}
# Associate with additional VPCs as needed
resource "aws_route53_zone_association" "secondary" {
  zone_id = aws_route53_zone.private.id
  vpc_id  = aws_vpc.secondary.id
}
# Internal service record
resource "aws_route53_record" "payment_service" {
  zone_id = aws_route53_zone.private.id
  name    = "payment-api.internal.example.com"
  type    = "A"
  ttl     = 60
  records = [aws_instance.payment_api.private_ip]
}
A common pattern is split-view DNS, where the same domain name (api.example.com) exists in both public and private hosted zones. External clients resolve to public IP addresses or load balancers, while internal VPC resources resolve to private IP addresses or internal load balancers. This improves security by keeping internal IPs private and can reduce data transfer costs by avoiding internet gateways.
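The split-view behavior can be modeled in a few lines of Java. This is a hypothetical sketch, not Route 53 Resolver's implementation: the SplitViewResolver class, the VPC IDs, and the addresses are illustrative, and both zones are assumed to cover the same domain. It also encodes a detail worth knowing before adopting split-view DNS: AWS documents that when a private hosted zone matches the queried domain but has no matching record, the VPC resolver returns NXDOMAIN instead of falling back to the public zone, so every public name that internal clients need must also exist in the private zone.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy model of split-view DNS for one domain (not Route 53 Resolver's implementation).
// Queries from associated VPCs use the private zone; all others use the public zone.
final class SplitViewResolver {
    private final Set<String> associatedVpcIds;
    private final Map<String, String> privateZone;
    private final Map<String, String> publicZone;

    SplitViewResolver(Set<String> associatedVpcIds, Map<String, String> privateZone,
                      Map<String, String> publicZone) {
        this.associatedVpcIds = associatedVpcIds;
        this.privateZone = privateZone;
        this.publicZone = publicZone;
    }

    // sourceVpcId is null for queries arriving from the internet.
    Optional<String> resolve(String name, String sourceVpcId) {
        if (sourceVpcId != null && associatedVpcIds.contains(sourceVpcId)) {
            // The private zone answers exclusively: a missing name is NXDOMAIN (empty here),
            // with no fallback to the public zone.
            return Optional.ofNullable(privateZone.get(name));
        }
        return Optional.ofNullable(publicZone.get(name));
    }
}
```

Inside vpc-main, api.example.com resolves to the private address, while www.example.com, defined only in the public zone, does not resolve at all in this model.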
For cross-account VPC associations, first authorize the association in the zone account, then create the association from the VPC account. This enables centralized DNS management across multiple AWS accounts. See our VPC networking guide for multi-account VPC patterns.
DNS Record Types and Configuration
Route 53 supports standard DNS record types plus AWS-specific alias records that provide enhanced functionality for AWS resources.
A and AAAA Records
A records map domain names to IPv4 addresses, while AAAA records map to IPv6 addresses. These are the fundamental record types for directing traffic to web servers, application servers, and other networked resources.
# Simple A record pointing to an EC2 instance
# (an auto-assigned public_ip changes when the instance stops; use an Elastic IP in production)
resource "aws_route53_record" "web_server" {
  zone_id = aws_route53_zone.public.id
  name    = "www.example.com"
  type    = "A"
  ttl     = 300
  records = [aws_instance.web.public_ip]
}
# IPv6 AAAA record (ipv6_addresses is already a list of addresses)
resource "aws_route53_record" "web_server_ipv6" {
  zone_id = aws_route53_zone.public.id
  name    = "www.example.com"
  type    = "AAAA"
  ttl     = 300
  records = aws_instance.web.ipv6_addresses
}
TTL selection involves tradeoffs: lower TTLs (60-300 seconds) enable faster DNS changes but increase query volume and costs. Higher TTLs (3600+ seconds) reduce costs and query load but slow down infrastructure changes. Use lower TTLs during deployments or migrations, then increase them once changes are stable.
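A quick way to quantify the tradeoff is to bound how often a single caching resolver can return to Route 53 for one record. The hypothetical TtlMath helper below assumes resolvers honor the TTL exactly and that the name is requested continuously, so each resolver misses its cache at most once per TTL window.

```java
// Back-of-the-envelope model: the most queries one caching resolver can send to
// Route 53 for a single record per day, assuming it honors the record's TTL.
final class TtlMath {
    private static final long SECONDS_PER_DAY = 86_400;

    static long maxQueriesPerResolverPerDay(long ttlSeconds) {
        if (ttlSeconds <= 0) {
            throw new IllegalArgumentException("TTL must be positive");
        }
        // At most one cache miss per TTL window: ceiling(86400 / ttl)
        return (SECONDS_PER_DAY + ttlSeconds - 1) / ttlSeconds;
    }

    // How many times more queries a lower TTL can generate than a higher one.
    static double queryVolumeRatio(long lowTtlSeconds, long highTtlSeconds) {
        return (double) maxQueriesPerResolverPerDay(lowTtlSeconds)
                / maxQueriesPerResolverPerDay(highTtlSeconds);
    }
}
```

Under these assumptions a 60-second TTL allows up to 1,440 queries per resolver per day versus 24 for a one-hour TTL, a 60-fold difference in the worst case. Real volume also depends on how many distinct resolvers ask for the name.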
CNAME Records
CNAME (Canonical Name) records create aliases from one domain name to another. The DNS resolver follows CNAME records recursively until reaching an A or AAAA record. CNAMEs cannot be created at the zone apex (the root domain like example.com), only for subdomains.
# CNAME pointing subdomain to another domain
resource "aws_route53_record" "blog" {
  zone_id = aws_route53_zone.public.id
  name    = "blog.example.com"
  type    = "CNAME"
  ttl     = 300
  records = ["example.hashnode.dev"]
}
With the CNAME above, a client resolving blog.example.com first receives a CNAME answer pointing to example.hashnode.dev, then must resolve example.hashnode.dev to obtain the final IP address. This extra step can add latency compared to a direct A record, but it lets the target provider change its IP addresses without any update to your zone.
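The two-step lookup can be illustrated with a toy resolver that follows CNAME chains and counts lookups. The CnameResolver class, the maps that stand in for DNS zones, and the 192.0.2.x addresses are hypothetical. In practice a recursive resolver may receive both records in one response when a single server is authoritative for both names, so the lookup count here is a worst-case view.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy resolver that follows CNAME chains and counts lookups (illustrative only).
final class CnameResolver {
    record Result(String address, int lookups) {}

    private final Map<String, String> cnames;   // name -> canonical name
    private final Map<String, String> aRecords; // name -> IPv4 address

    CnameResolver(Map<String, String> cnames, Map<String, String> aRecords) {
        this.cnames = cnames;
        this.aRecords = aRecords;
    }

    Result resolve(String name) {
        Set<String> seen = new HashSet<>();
        int lookups = 0;
        String current = name;
        while (true) {
            lookups++;
            if (!seen.add(current)) {
                throw new IllegalStateException("CNAME loop at " + current);
            }
            String address = aRecords.get(current);
            if (address != null) {
                return new Result(address, lookups); // reached the terminal A record
            }
            String target = cnames.get(current);
            if (target == null) {
                throw new IllegalStateException("No record for " + current);
            }
            current = target; // follow the alias and query again
        }
    }
}
```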
Alias Records
Alias records are an AWS extension to standard DNS that overcome CNAME limitations. Unlike CNAMEs, alias records can be created at the zone apex and do not add resolution latency. Route 53 automatically resolves alias records to the current IP addresses of the target resource.
Alias records integrate with AWS services such as CloudFront distributions, Application Load Balancers, API Gateway endpoints, and S3 website buckets. Route 53 does not charge for queries to alias records that point to these AWS resources, whereas queries to standard records are billed.
# Alias record at zone apex pointing to ALB
resource "aws_route53_record" "apex" {
  zone_id = aws_route53_zone.public.id
  name    = "example.com"
  type    = "A"
  alias {
    name                   = aws_lb.main.dns_name
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true
  }
}
# Alias record for CloudFront distribution
resource "aws_route53_record" "cdn" {
  zone_id = aws_route53_zone.public.id
  name    = "cdn.example.com"
  type    = "A"
  alias {
    name    = aws_cloudfront_distribution.main.domain_name
    zone_id = aws_cloudfront_distribution.main.hosted_zone_id
    # Route 53 cannot evaluate target health for CloudFront alias targets
    evaluate_target_health = false
  }
}
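The evaluate_target_health parameter enables automatic health-based routing. When set to true, Route 53 checks the health of the alias target (for an Application Load Balancer, whether it has healthy registered targets) and treats the alias record as unhealthy when the target is not. In a routing policy that has alternatives (failover, weighted, latency, and so on), Route 53 then answers with a healthy alternative instead. CloudFront distributions do not support target health evaluation, which is why the CloudFront example sets it to false.
Prefer alias records over CNAMEs for AWS resources. Alias records avoid the extra CNAME lookup, incur no query charges when they point to AWS resources, and work at the zone apex. Because Route 53 answers alias queries with ordinary A or AAAA records, resolvers need no special support for them. Use CNAMEs for subdomains that point to non-AWS hostnames, such as a third-party hosting provider.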
TXT Records
TXT records store arbitrary text data, commonly used for domain ownership verification, email authentication (SPF, DKIM, DMARC), and security policies.
# SPF record for email authentication
resource "aws_route53_record" "spf" {
  zone_id = aws_route53_zone.public.id
  name    = "example.com"
  type    = "TXT"
  ttl     = 3600
  records = ["v=spf1 include:_spf.google.com ~all"]
}
# Domain verification for third-party services
resource "aws_route53_record" "verification" {
  zone_id = aws_route53_zone.public.id
  name    = "_github-challenge.example.com"
  type    = "TXT"
  ttl     = 300
  records = ["a7b8c9d0e1f2"]
}
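A single TXT string is limited to 255 characters, and a record may contain several strings. Route 53 does not split longer values, such as 2048-bit DKIM keys, automatically: enter them as multiple quoted strings of up to 255 characters each, which SPF and DKIM verifiers concatenate. With Terraform's aws_route53_record, insert \"\" at each split point inside the value (for example "first-255-characters\"\"remaining-characters"). Client libraries differ in whether they return such a record as one concatenated string or as separate strings.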
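The splitting step can be sketched as a small helper. TxtSplitter is hypothetical and assumes printable ASCII values without quotes or backslashes (typical of SPF and DKIM), so it skips the escaping that arbitrary text would need. Its toQuotedStrings method produces the "part1" "part2" form used when entering multi-string values in the Route 53 console or API; for Terraform, the provider documentation instead suggests writing \"\" at each split point inside a single string.

```java
import java.util.ArrayList;
import java.util.List;

// Splits a long TXT value into DNS character-strings of at most 255 characters.
// Assumes printable ASCII without quotes or backslashes; escaping is out of scope.
final class TxtSplitter {
    static final int MAX_STRING_LENGTH = 255;

    static List<String> split(String value) {
        boolean unsupported = value.chars()
                .anyMatch(c -> c < 32 || c > 126 || c == '"' || c == '\\');
        if (unsupported) {
            throw new IllegalArgumentException("value needs escaping, not handled in this sketch");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < value.length(); start += MAX_STRING_LENGTH) {
            int end = Math.min(value.length(), start + MAX_STRING_LENGTH);
            chunks.add(value.substring(start, end));
        }
        return chunks;
    }

    // Produces "chunk1" "chunk2" ...: each chunk quoted, separated by a single space.
    static String toQuotedStrings(String value) {
        List<String> quoted = new ArrayList<>();
        for (String chunk : split(value)) {
            quoted.add("\"" + chunk + "\"");
        }
        return String.join(" ", quoted);
    }
}
```

A 600-character key becomes three strings of 255, 255, and 90 characters; concatenating them restores the original value.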
Routing Policies
Routing policies determine how Route 53 responds to DNS queries. Unlike simple DNS servers that always return the same answer, Route 53 can route traffic intelligently based on health, geography, latency, or weights.
Simple Routing
Simple routing returns all values for a record in random order. Use simple routing when you have a single resource serving traffic or when you want basic round-robin behavior without health checks.
# Simple routing to multiple IP addresses
resource "aws_route53_record" "simple" {
  zone_id = aws_route53_zone.public.id
  name    = "app.example.com"
  type    = "A"
  ttl     = 60
  records = [
    "203.0.113.1",
    "203.0.113.2",
    "203.0.113.3"
  ]
}
When a record has multiple values, Route 53 returns all values in the response. The client (usually the application's DNS resolver) selects one, often randomly or round-robin. This provides basic load distribution but no health checking - unhealthy endpoints continue receiving traffic.
Weighted Routing
Weighted routing distributes traffic across multiple resources according to specified weights. This enables gradual traffic shifting for blue-green deployments, canary releases, or A/B testing.
Weights can range from 0 to 255. Route 53 calculates the probability of selecting each record as weight / sum_of_all_weights. A weight of 0 prevents traffic except when all other records are unhealthy (when health checks are enabled).
# Canary deployment: 5% to new version, 95% to stable
# Load balancers are targeted with alias records: an A record cannot hold a DNS name
resource "aws_route53_record" "stable" {
  zone_id        = aws_route53_zone.public.id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "stable-v1.2.3"
  weighted_routing_policy {
    weight = 95
  }
  alias {
    name                   = aws_lb.stable.dns_name
    zone_id                = aws_lb.stable.zone_id
    evaluate_target_health = true
  }
  health_check_id = aws_route53_health_check.stable.id
}
resource "aws_route53_record" "canary" {
  zone_id        = aws_route53_zone.public.id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "canary-v1.3.0"
  weighted_routing_policy {
    weight = 5
  }
  alias {
    name                   = aws_lb.canary.dns_name
    zone_id                = aws_lb.canary.zone_id
    evaluate_target_health = true
  }
  health_check_id = aws_route53_health_check.canary.id
}
Each record requires a unique set_identifier to distinguish between records with the same name and type. This identifier is metadata for Route 53 and never appears in DNS responses.
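The selection rule for weighted records (probability = weight / sum_of_all_weights, with zero-weight records used only when every nonzero record is unhealthy) can be modeled as follows. This is an illustrative sketch, not Route 53's code: WeightedRouting is hypothetical, and the caller supplies a uniform random draw u in [0, 1) so the outcome is deterministic. The fallback to all records when every record is unhealthy mirrors the fail-open behavior AWS documents for record groups.

```java
import java.util.List;

// Model of weighted record selection (illustrative, not Route 53's implementation).
final class WeightedRouting {
    record WeightedRecord(String setIdentifier, int weight, boolean healthy) {}

    // Picks a record for a uniform random draw u in [0, 1).
    static String select(List<WeightedRecord> records, double u) {
        List<WeightedRecord> pool = records.stream().filter(WeightedRecord::healthy).toList();
        if (pool.isEmpty()) {
            pool = records; // every record unhealthy: fail open and consider all of them
        }
        List<WeightedRecord> positive = pool.stream().filter(r -> r.weight() > 0).toList();
        if (positive.isEmpty()) {
            // Only zero-weight records remain: choose among them with equal probability
            return pool.get((int) (u * pool.size())).setIdentifier();
        }
        int total = positive.stream().mapToInt(WeightedRecord::weight).sum();
        double threshold = u * total; // P(record) = weight / total
        int cumulative = 0;
        for (WeightedRecord r : positive) {
            cumulative += r.weight();
            if (threshold < cumulative) {
                return r.setIdentifier();
            }
        }
        return positive.get(positive.size() - 1).setIdentifier(); // guard for u close to 1
    }
}
```

With the 95/5 split above, draws below 0.95 select the stable record and the rest select the canary; if the canary's health check fails, every draw selects the stable record.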
During a canary deployment, gradually increase the canary weight and decrease the stable weight: start with 5/95, then 10/90, 25/75, 50/50, and finally 100/0 once confidence is established. The low TTL (60 seconds) ensures traffic distribution updates quickly as weights change, though some clients may cache longer depending on resolver behavior.
Weighted routing excels for gradual rollouts where you need control over traffic distribution percentages. For geographic routing or latency optimization, consider geolocation or latency-based policies instead. See our deployment strategies guide for integration with CI/CD pipelines.
Latency-Based Routing
Latency-based routing directs users to the AWS region that provides the lowest network latency. Route 53 measures latency between users and AWS regions, then routes traffic to the region with the best performance for that user's location.
# Multi-region deployment with latency-based routing
# Route 53 does not accept a TTL on alias records, so no ttl argument is set
resource "aws_route53_record" "us_east" {
  zone_id        = aws_route53_zone.public.id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "us-east-1"
  latency_routing_policy {
    region = "us-east-1"
  }
  alias {
    name                   = aws_lb.us_east.dns_name
    zone_id                = aws_lb.us_east.zone_id
    evaluate_target_health = true
  }
}
resource "aws_route53_record" "eu_west" {
  zone_id        = aws_route53_zone.public.id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "eu-west-1"
  latency_routing_policy {
    region = "eu-west-1"
  }
  alias {
    name                   = aws_lb.eu_west.dns_name
    zone_id                = aws_lb.eu_west.zone_id
    evaluate_target_health = true
  }
}
resource "aws_route53_record" "ap_southeast" {
  zone_id        = aws_route53_zone.public.id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "ap-southeast-2"
  latency_routing_policy {
    region = "ap-southeast-2"
  }
  alias {
    name                   = aws_lb.ap_southeast.dns_name
    zone_id                = aws_lb.ap_southeast.zone_id
    evaluate_target_health = true
  }
}
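Route 53 maintains continuously updated latency measurements between networks around the world and AWS regions. When a query arrives, Route 53 estimates the user's location from the DNS resolver's IP address (or from the client subnet, when the resolver supports the EDNS Client Subnet extension), then returns the record for the region with the lowest measured latency from that location. Because these measurements change over time, the same user may be routed to different regions at different times.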
Latency-based routing differs from geolocation routing: latency routing optimizes for actual network performance, while geolocation routing enforces specific geographic mappings. A user in Australia might be routed to a Singapore region via latency routing if Singapore provides better network performance, whereas geolocation routing would enforce Australia-specific routing regardless of performance.
Combine latency routing with health checks to automatically fail over to the next-best region when the lowest-latency region becomes unhealthy. This provides both performance optimization and resilience. For multi-region architecture patterns, see our cell-based architecture guide.
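Selecting the lowest-latency healthy region, and falling back to the next-best one, can be modeled in a few lines. The LatencyRouting class and the latency figures in the test are hypothetical; Route 53 uses its own measurements, and when every record is unhealthy it applies additional fallback rules that this sketch does not model (it simply returns an empty result).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Model of latency-based selection with health-aware fallback (illustrative only).
final class LatencyRouting {
    // measuredLatencyMs: hypothetical latency from the querying resolver's location
    record RegionalRecord(String region, double measuredLatencyMs, boolean healthy) {}

    static Optional<String> select(List<RegionalRecord> records) {
        return records.stream()
                .filter(RegionalRecord::healthy) // unhealthy regions are skipped
                .min(Comparator.comparingDouble(RegionalRecord::measuredLatencyMs))
                .map(RegionalRecord::region);
    }
}
```

For a resolver near Sydney with hypothetical latencies of 15 ms to ap-southeast-2, 200 ms to us-east-1, and 280 ms to eu-west-1, the sketch picks ap-southeast-2, and picks us-east-1 once ap-southeast-2 is marked unhealthy.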
Failover Routing
Failover routing creates active-passive configurations where traffic routes to a primary resource when healthy, and automatically fails over to a secondary resource when the primary is unhealthy.
# Primary active resource
resource "aws_route53_record" "primary" {
  zone_id        = aws_route53_zone.public.id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "primary-us-east-1"
  failover_routing_policy {
    type = "PRIMARY"
  }
  alias {
    name                   = aws_lb.primary.dns_name
    zone_id                = aws_lb.primary.zone_id
    evaluate_target_health = true
  }
  health_check_id = aws_route53_health_check.primary.id
}
# Secondary passive resource
resource "aws_route53_record" "secondary" {
  zone_id        = aws_route53_zone.public.id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "secondary-us-west-2"
  failover_routing_policy {
    type = "SECONDARY"
  }
  alias {
    name                   = aws_lb.secondary.dns_name
    zone_id                = aws_lb.secondary.zone_id
    evaluate_target_health = true
  }
  health_check_id = aws_route53_health_check.secondary.id
}
Route 53 continuously monitors the primary endpoint's health. While the primary is healthy, all traffic routes to it. Once its health check reaches the failure threshold and Route 53 marks it unhealthy, Route 53 starts returning the secondary record. When the primary recovers and its health checks pass again, traffic automatically returns to the primary.
Failover routing needs a way to judge the primary's health: attach a health check (health_check_id) or, for alias records, set evaluate_target_health = true. Without either, Route 53 always considers the primary healthy and never fails over. The secondary record can optionally have a health check; if both records are unhealthy, Route 53 returns the primary record, on the assumption that answering with an endpoint is better than returning no answer.
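The failover decision rules reduce to a small function. FailoverRouting is a hypothetical illustration of the behavior described above, not Route 53 code; a secondary record without a health check should be passed in as healthy.

```java
// Decision logic for a PRIMARY/SECONDARY failover pair (illustrative sketch).
final class FailoverRouting {
    enum Answer { PRIMARY, SECONDARY }

    // secondaryHealthy: pass true when the secondary has no health check attached.
    static Answer select(boolean primaryHealthy, boolean secondaryHealthy) {
        if (primaryHealthy) {
            return Answer.PRIMARY; // normal operation
        }
        if (secondaryHealthy) {
            return Answer.SECONDARY; // primary failed, standby takes over
        }
        return Answer.PRIMARY; // both unhealthy: Route 53 still answers with the primary
    }
}
```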
The TTL value significantly impacts failover speed. With a 60-second TTL, clients may continue using cached primary DNS responses for up to 60 seconds after Route 53 detects a failure. Some resolvers ignore TTLs or cache longer, potentially delaying failover. Balance rapid failover (low TTL) against DNS query volume and costs (higher with low TTL).
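For multi-region disaster recovery, combine failover routing with latency or geolocation routing in a nested configuration, built from alias records that point to other records in the same hosted zone. For example, use latency records to choose a region, and point each latency record at a regional failover pair (primary: that region's load balancer; secondary: a standby or static maintenance site). See disaster recovery patterns for comprehensive DR strategies.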
Geolocation Routing
Geolocation routing routes traffic based on the geographic location of users. You can specify routing by continent, country, or US state. This enables localization (serving region-specific content), compliance with data residency requirements, or traffic balancing across regions.
# Europe routes to EU region
resource "aws_route53_record" "europe" {
  zone_id        = aws_route53_zone.public.id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "europe"
  geolocation_routing_policy {
    continent = "EU"
  }
  alias {
    name                   = aws_lb.eu_west.dns_name
    zone_id                = aws_lb.eu_west.zone_id
    evaluate_target_health = true
  }
}
# Specific country overrides continent
resource "aws_route53_record" "germany" {
  zone_id        = aws_route53_zone.public.id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "germany"
  geolocation_routing_policy {
    country = "DE"
  }
  alias {
    name                   = aws_lb.eu_central.dns_name
    zone_id                = aws_lb.eu_central.zone_id
    evaluate_target_health = true
  }
}
# Default record for unmatched locations
resource "aws_route53_record" "default" {
  zone_id        = aws_route53_zone.public.id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "default"
  geolocation_routing_policy {
    country = "*"
  }
  alias {
    name                   = aws_lb.us_east.dns_name
    zone_id                = aws_lb.us_east.zone_id
    evaluate_target_health = true
  }
}
Route 53 evaluates geolocation records from most specific to least specific: US state → country → continent → default. A user in Bavaria, Germany matches the Germany country record (not the Europe continent record). A user in France matches the Europe continent record. A user in Australia matches the default record.
Always create a default geolocation record (country = "*") to handle locations without specific rules. Without a default record, users from unmatched locations receive a "no answer" DNS response, effectively making your application inaccessible to them.
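The precedence order and the missing-default pitfall can be captured in a short model. GeolocationRouting and its key format (subdivision:US-CA, country:DE, continent:EU, and * for the default) are this sketch's own conventions, not Route 53 identifiers, and health checks are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Models geolocation precedence: subdivision (US state), country, continent, default.
final class GeolocationRouting {
    // subdivision may be null when it is unknown or not relevant
    record Location(String continent, String country, String subdivision) {}

    static Optional<String> select(Map<String, String> recordsByLocation, Location user) {
        List<String> candidates = new ArrayList<>(); // most specific first
        if (user.subdivision() != null) {
            candidates.add("subdivision:" + user.country() + "-" + user.subdivision());
        }
        candidates.add("country:" + user.country());
        candidates.add("continent:" + user.continent());
        candidates.add("*"); // default record (country = "*" in Terraform)
        for (String key : candidates) {
            String setIdentifier = recordsByLocation.get(key);
            if (setIdentifier != null) {
                return Optional.of(setIdentifier);
            }
        }
        return Optional.empty(); // no match and no default record: a "no answer" response
    }
}
```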
Route 53 determines user location from the DNS resolver's IP address, not the end user's actual IP, unless the resolver sends the EDNS Client Subnet (ECS) extension, in which case Route 53 uses the truncated client subnet it contains. Users behind VPNs, or using public resolvers that do not send ECS, may therefore appear to originate from a different location than their actual one. For applications requiring precise geographic control, consider implementing application-level geolocation detection and routing.
Geolocation routing suits compliance scenarios where traffic should stay within specific jurisdictions, for example routing European users to EU-based infrastructure as part of a GDPR data residency strategy. Because location is inferred from resolvers, treat DNS routing as one control rather than the sole enforcement mechanism. For performance optimization without geographic constraints, latency-based routing is typically more effective.
Geoproximity Routing
Geoproximity routing routes traffic based on the geographic proximity of users to resources, with optional bias to shift traffic toward or away from specific locations. This enables sophisticated traffic engineering when you want to prefer certain regions while still considering proximity.
Geoproximity requires using Route 53 Traffic Flow, a visual editor for complex routing policies. You define resource coordinates (AWS region or latitude/longitude) and an optional bias value (-99 to +99) that expands or contracts the geographic area routed to each resource.
A bias of +25 expands the geographic catchment area for a resource, routing users who are farther away to it. A bias of -25 contracts the area, routing only nearby users. Biases enable gradual traffic shifting by region: start with negative bias on a new region, then increase toward positive to progressively route more users as confidence grows.
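# Note: Geoproximity is configured here through a Traffic Flow policy document.
# A traffic policy creates no records until a policy instance applies it to a zone.
resource "aws_route53_traffic_policy" "geoproximity" {
  name = "geoproximity-policy"
  document = jsonencode({
    AWSPolicyFormatVersion = "2015-10-01"
    RecordType             = "A"
    StartRule              = "geoproximity_rule"
    Endpoints = {
      us_east = {
        Type  = "elastic-load-balancer"
        Value = aws_lb.us_east.dns_name
      }
      eu_west = {
        Type  = "elastic-load-balancer"
        Value = aws_lb.eu_west.dns_name
      }
    }
    Rules = {
      geoproximity_rule = {
        RuleType = "geoproximity"
        GeoproximityLocations = [
          {
            EndpointReference = "us_east"
            Region            = "aws:route53:us-east-1"
            Bias              = 10
          },
          {
            EndpointReference = "eu_west"
            Region            = "aws:route53:eu-west-1"
            Bias              = -10
          }
        ]
      }
    }
  })
}
# Create the api.example.com records in the public zone from the policy
resource "aws_route53_traffic_policy_instance" "api" {
  name                   = "api.example.com"
  traffic_policy_id      = aws_route53_traffic_policy.geoproximity.id
  traffic_policy_version = aws_route53_traffic_policy.geoproximity.version
  hosted_zone_id         = aws_route53_zone.public.zone_id
  ttl                    = 60
}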
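AWS documents the effect of bias as a scaling of distance, biased distance = actual distance * (1 - bias / 100), with each user routed to the resource that has the smallest biased distance. The hypothetical GeoproximityRouting model below applies that formula to distances supplied directly (computing great-circle distances from coordinates is left out), which is enough to see how a positive bias expands a resource's catchment area.

```java
import java.util.Comparator;
import java.util.List;

// Models how bias reshapes geoproximity catchment areas (illustrative sketch).
final class GeoproximityRouting {
    // distanceKm: hypothetical precomputed distance from the user to the resource
    record Endpoint(String name, double distanceKm, int bias) {
        Endpoint {
            if (bias < -99 || bias > 99) {
                throw new IllegalArgumentException("bias must be between -99 and 99");
            }
        }

        // Positive bias shrinks the effective distance (larger catchment area);
        // negative bias stretches it (smaller catchment area).
        double biasedDistance() {
            return distanceKm * (1 - bias / 100.0);
        }
    }

    static String select(List<Endpoint> endpoints) {
        return endpoints.stream()
                .min(Comparator.comparingDouble(Endpoint::biasedDistance))
                .map(Endpoint::name)
                .orElseThrow();
    }
}
```

A user 6,000 km from us_east and 5,600 km from eu_west is routed to eu_west with no bias, but to us_east once us_east has a bias of +25 (biased distance 4,500 km).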
Geoproximity offers finer control than geolocation routing (which enforces hard geographic boundaries) but more structure than latency routing (which purely optimizes for performance). Use geoproximity when you want proximity-based routing with traffic engineering capabilities, such as preferring certain regions for cost or capacity reasons while still routing based on geography.
Multivalue Answer Routing
Multivalue answer routing returns up to eight randomly selected healthy records in response to queries. Unlike simple routing, multivalue routing performs health checks and excludes unhealthy records from responses.
resource "aws_route53_record" "multivalue_1" {
  zone_id                          = aws_route53_zone.public.id
  name                             = "api.example.com"
  type                             = "A"
  ttl                              = 60
  multivalue_answer_routing_policy = true
  set_identifier                   = "server-1"
  records                          = [aws_instance.server_1.public_ip]
  health_check_id                  = aws_route53_health_check.server_1.id
}
resource "aws_route53_record" "multivalue_2" {
  zone_id                          = aws_route53_zone.public.id
  name                             = "api.example.com"
  type                             = "A"
  ttl                              = 60
  multivalue_answer_routing_policy = true
  set_identifier                   = "server-2"
  records                          = [aws_instance.server_2.public_ip]
  health_check_id                  = aws_route53_health_check.server_2.id
}
resource "aws_route53_record" "multivalue_3" {
  zone_id                          = aws_route53_zone.public.id
  name                             = "api.example.com"
  type                             = "A"
  ttl                              = 60
  multivalue_answer_routing_policy = true
  set_identifier                   = "server-3"
  records                          = [aws_instance.server_3.public_ip]
  health_check_id                  = aws_route53_health_check.server_3.id
}
When queried, Route 53 evaluates all records, filters out unhealthy ones, then randomly selects up to eight to return. The client chooses one from the returned set. This provides basic load distribution with health checking but less control than weighted routing or advanced policies.
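The filtering and sampling steps can be sketched as follows. MultivalueRouting is hypothetical, takes a java.util.Random so results are reproducible, and returns an empty list when nothing is healthy; Route 53's own behavior in that edge case is not modeled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Models multivalue answers: drop unhealthy records, shuffle, return at most eight.
final class MultivalueRouting {
    static final int MAX_ANSWERS = 8;

    record ValueRecord(String ip, boolean healthy) {}

    static List<String> answer(List<ValueRecord> records, Random random) {
        List<String> healthy = new ArrayList<>();
        for (ValueRecord r : records) {
            if (r.healthy()) {
                healthy.add(r.ip()); // unhealthy records never appear in the response
            }
        }
        Collections.shuffle(healthy, random); // random selection and order per response
        return new ArrayList<>(healthy.subList(0, Math.min(MAX_ANSWERS, healthy.size())));
    }
}
```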
Multivalue routing suits scenarios where you have multiple equivalent endpoints and want health-aware responses without the complexity of weighted or geographic routing. For production applications requiring sophisticated traffic management, use weighted, latency, or failover routing instead.
Health Checks and Monitoring
Health checks enable Route 53 to make routing decisions based on endpoint health. Route 53 health checkers distributed globally send requests to your endpoints at regular intervals, evaluating whether endpoints meet your health criteria.
Endpoint Health Checks
Endpoint health checks send requests directly to endpoints via HTTP, HTTPS, or TCP. You configure the protocol, IP address or domain name, port, path (for HTTP/HTTPS), and intervals.
# HTTP health check
resource "aws_route53_health_check" "api" {
  type              = "HTTP"
  resource_path     = "/health"
  fqdn              = "api.example.com"
  port              = 80
  request_interval  = 30
  failure_threshold = 3
  tags = {
    Name = "api-health-check"
  }
}
# HTTPS health check with string matching (search_string requires a *_STR_MATCH type)
resource "aws_route53_health_check" "api_secure" {
  type              = "HTTPS_STR_MATCH"
  resource_path     = "/health"
  fqdn              = "api.example.com"
  port              = 443
  request_interval  = 30
  failure_threshold = 3
  search_string     = "\"status\":\"healthy\""
  measure_latency   = true
  tags = {
    Name = "api-https-health-check"
  }
}
# TCP health check (Route 53 health checkers can only reach publicly routable endpoints)
resource "aws_route53_health_check" "database" {
  type              = "TCP"
  ip_address        = aws_instance.database.public_ip
  port              = 5432
  request_interval  = 30
  failure_threshold = 3
  tags = {
    Name = "database-tcp-health-check"
  }
}
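The request_interval can be 30 seconds (standard) or 10 seconds (fast, at additional cost). Health checkers in multiple locations around the world each send requests at this interval, so your endpoint receives several requests per interval. Route 53 considers the endpoint healthy when more than 18% of the health checkers report it as healthy.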
The failure_threshold determines how many consecutive checks must fail before marking the endpoint unhealthy (range: 1-10). A threshold of 3 with a 30-second interval means the endpoint must fail checks for 90 seconds before being marked unhealthy. Balance sensitivity (low threshold) against avoiding false positives from transient network issues (higher threshold).
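The timing arithmetic can be written down as a helper. FailoverTiming is hypothetical and approximate: it ignores the time Route 53 needs to propagate a status change and assumes resolvers honor the record TTL, so treat its output as an estimate rather than a guarantee.

```java
// Rough timing model for health-check-driven failover (estimate only).
final class FailoverTiming {
    // Seconds of consecutive failures before Route 53 marks the endpoint unhealthy.
    static int detectionSeconds(int requestIntervalSeconds, int failureThreshold) {
        if (requestIntervalSeconds != 10 && requestIntervalSeconds != 30) {
            throw new IllegalArgumentException("request_interval must be 10 or 30");
        }
        if (failureThreshold < 1 || failureThreshold > 10) {
            throw new IllegalArgumentException("failure_threshold must be between 1 and 10");
        }
        return requestIntervalSeconds * failureThreshold;
    }

    // Adds the TTL for a client that cached the old answer just before the status changed.
    static int approxClientFailoverSeconds(int requestIntervalSeconds, int failureThreshold,
                                           int ttlSeconds) {
        return detectionSeconds(requestIntervalSeconds, failureThreshold) + ttlSeconds;
    }
}
```

With a 30-second interval, a threshold of 3, and a 60-second TTL, a client can keep using a failed endpoint for roughly 150 seconds.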
For HTTP and HTTPS checks, Route 53 treats a 2xx or 3xx status code returned within two seconds of connecting as healthy. The optional search_string parameter, which requires the HTTP_STR_MATCH or HTTPS_STR_MATCH type, additionally requires the string to appear in the first 5,120 bytes of the response body. Use it to detect cases where a server returns 200 OK but the application behind it is degraded.
Enable measure_latency to track endpoint response times in CloudWatch. This provides visibility into performance trends and helps identify degrading endpoints before they fail health checks completely.
Calculated Health Checks
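Calculated health checks combine the results of other health checks. You specify how many child checks must be healthy, which expresses AND (all children), OR (at least one child), and N-of-M rules such as "healthy if at least 2 of 3 backend servers are healthy"; NOT logic is available by inverting a check (invert_healthcheck in Terraform).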
resource "aws_route53_health_check" "calculated" {
  type                   = "CALCULATED"
  child_health_threshold = 2
  child_healthchecks = [
    aws_route53_health_check.backend_1.id,
    aws_route53_health_check.backend_2.id,
    aws_route53_health_check.backend_3.id,
  ]
  tags = {
    Name = "calculated-backend-health"
  }
}
The child_health_threshold specifies how many child health checks must pass for the calculated check to be healthy. With three children and a threshold of 2, the calculated check is healthy if any two children are healthy. This provides resilience against single endpoint failures while detecting widespread failures.
Calculated health checks enable hierarchical health models. For example, create endpoint health checks for each server in a region, a calculated health check per region, then a top-level calculated check across regions. Route this through failover routing to automatically fail over entire regions when insufficient endpoints are healthy.
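The threshold rule and its nesting can be modeled with a recursive structure. CalculatedHealthCheck is a hypothetical sketch: a calculated check is healthy when at least threshold of its children are healthy, so a threshold equal to the number of children behaves like AND and a threshold of 1 like OR. It does not model inversion or how often Route 53 re-evaluates status.

```java
import java.util.List;

// Models CALCULATED health checks and their nesting (illustrative sketch).
final class CalculatedHealthCheck {
    interface Check {
        boolean healthy();
    }

    // Leaf: stands in for the result of an endpoint (HTTP/HTTPS/TCP) health check
    record Endpoint(boolean healthy) implements Check {}

    // Healthy when at least `threshold` children are healthy
    record Calculated(List<? extends Check> children, int threshold) implements Check {
        @Override
        public boolean healthy() {
            long healthyChildren = children.stream().filter(Check::healthy).count();
            return healthyChildren >= threshold;
        }
    }
}
```

A region with 2 of 3 healthy endpoints passes a 2-of-3 check, and a global check with threshold 1 over the regions stays healthy while any region passes.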
CloudWatch Alarm Health Checks
CloudWatch alarm health checks monitor CloudWatch alarms, marking the health check unhealthy when the alarm is in ALARM state. This integrates custom application metrics into Route 53 routing decisions.
# CloudWatch alarm for a high count of target 5XX responses on an ALB
resource "aws_cloudwatch_metric_alarm" "high_error_rate" {
  alarm_name          = "api-high-error-rate"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Sum"
  threshold           = 10
  # Periods without 5XX responses may have no datapoints; treat gaps as not breaching
  treat_missing_data = "notBreaching"
  alarm_description  = "More than 10 target 5XX responses per minute for 2 consecutive minutes"
  dimensions = {
    LoadBalancer = aws_lb.main.arn_suffix
  }
}
# Health check based on CloudWatch alarm
resource "aws_route53_health_check" "alarm_based" {
  type                            = "CLOUDWATCH_METRIC"
  cloudwatch_alarm_name           = aws_cloudwatch_metric_alarm.high_error_rate.alarm_name
  cloudwatch_alarm_region         = "us-east-1" # region where the alarm is defined
  insufficient_data_health_status = "Healthy"
  tags = {
    Name = "alarm-based-health-check"
  }
}
The insufficient_data_health_status parameter sets the health status while the alarm is in the INSUFFICIENT_DATA state (for example, right after it is created or during gaps in the metric). "Healthy" avoids failing over just because data is missing; "Unhealthy" is the conservative choice that fails over when the signal disappears; "LastKnownStatus" keeps the previous status.
CloudWatch alarm health checks enable sophisticated application-aware routing. Monitor business metrics (transaction success rate, API latency percentiles, database connection pool saturation) and automatically route traffic away from degraded instances. This goes beyond simple endpoint availability to consider application health.
For comprehensive CloudWatch integration patterns, see our observability guide and our Spring Boot observability guide.
Health Check Best Practices
Design health check endpoints to validate the entire request path, not just server availability. A health endpoint should verify database connectivity, external API availability, and critical dependencies. However, keep health checks lightweight to avoid performance impact from constant polling.
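// Spring Boot health check endpoint (ExternalService is a placeholder for the client
// that wraps your critical dependency)
import java.sql.Connection;
import javax.sql.DataSource;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
record HealthStatus(String status, String detail) {}
interface ExternalService {
    boolean isHealthy();
}
@RestController
public class HealthCheckController {
    private final DataSource dataSource;
    private final ExternalService externalService;
    public HealthCheckController(DataSource dataSource, ExternalService externalService) {
        this.dataSource = dataSource;
        this.externalService = externalService;
    }
    @GetMapping("/health")
    public ResponseEntity<HealthStatus> healthCheck() {
        try {
            // Verify database connectivity with a cheap validation call (1-second timeout)
            try (Connection connection = dataSource.getConnection()) {
                if (!connection.isValid(1)) {
                    return unhealthy("database connection invalid");
                }
            }
            // Verify critical external services
            if (!externalService.isHealthy()) {
                return unhealthy("external service unavailable");
            }
            // Serialized as JSON, this body contains "status":"healthy" for string matching
            return ResponseEntity.ok(new HealthStatus("healthy", "all systems operational"));
        } catch (Exception e) {
            // Avoid echoing exception details on an unauthenticated endpoint
            return unhealthy("dependency check failed");
        }
    }
    private static ResponseEntity<HealthStatus> unhealthy(String detail) {
        return ResponseEntity.status(503).body(new HealthStatus("unhealthy", detail));
    }
}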
Set failure thresholds to balance sensitivity and stability. A threshold of 1 triggers failover on single transient failures; a threshold of 5-6 may leave unhealthy endpoints in rotation too long. Start with 3 for 30-second intervals or 6 for 10-second intervals.
Use separate health check endpoints for different purposes: /health for Route 53 DNS failover (checks critical functionality), /ready for Kubernetes readiness probes (checks startup completion), /live for liveness probes (checks basic process health). See our Spring Boot health checks and Kubernetes patterns for details.
Monitor health check metrics in CloudWatch. Track the HealthCheckStatus metric (1 = healthy, 0 = unhealthy), which Route 53 publishes in the us-east-1 Region, and create alarms there when health checks fail so you can investigate before DNS failover occurs.
Custom Domain Names and SSL/TLS
Custom domain names provide branded, memorable URLs for your applications rather than AWS-generated endpoints. Route 53 integrates seamlessly with AWS Certificate Manager (ACM) for SSL/TLS certificate provisioning and renewal.
Certificate Management
Request certificates through ACM, using DNS validation with Route 53 for automated certificate approval and renewal. Public ACM certificates are free when used with integrated AWS services.
# Request ACM certificate with DNS validation
resource "aws_acm_certificate" "main" {
  domain_name               = "example.com"
  subject_alternative_names = ["*.example.com"]
  validation_method         = "DNS"
  lifecycle {
    create_before_destroy = true
  }
  tags = {
    Environment = "production"
  }
}
# Automatically create Route 53 validation records
# (example.com and *.example.com share one validation record; allow_overwrite tolerates the duplicate)
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.main.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }
  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = aws_route53_zone.public.id
}
# Wait for certificate validation
resource "aws_acm_certificate_validation" "main" {
  certificate_arn         = aws_acm_certificate.main.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}
ACM automatically renews certificates before they expire. With DNS validation, renewal is fully automated: as long as the validation CNAME records remain in the hosted zone and the certificate is associated with an AWS resource, ACM re-validates domain control and issues the renewed certificate without manual intervention.
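Use wildcard certificates (*.example.com) to cover subdomains with a single certificate. A wildcard covers exactly one label: *.example.com matches api.example.com but not v1.api.example.com or the apex example.com, which is why the example above also lists example.com. Wildcards simplify certificate management when you have many subdomains, but they widen the blast radius: one compromised certificate exposes every subdomain it covers.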
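The single-label rule is easy to get wrong when planning certificate coverage. The sketch below models it with a hypothetical helper (not an ACM API) that compares names case-insensitively and ignores a trailing dot; it is a simplification of the certificate name-matching rules, intended only to check which hostnames a planned certificate would cover.

```java
import java.util.Locale;

/** Illustrative check of whether a certificate name covers a hostname (single-label wildcard rule). */
final class WildcardMatch {
    private WildcardMatch() {}

    static boolean covers(String certName, String hostname) {
        String cert = normalize(certName);
        String host = normalize(hostname);
        if (!cert.startsWith("*.")) {
            return cert.equals(host); // exact name, e.g. "example.com"
        }
        String suffix = cert.substring(1); // "*.example.com" -> ".example.com"
        if (!host.endsWith(suffix)) {
            return false;
        }
        String label = host.substring(0, host.length() - suffix.length());
        // The wildcard must stand in for exactly one non-empty label
        return !label.isEmpty() && !label.contains(".");
    }

    // Lowercase and drop a trailing dot so "API.Example.com." compares equal to "api.example.com"
    private static String normalize(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".") ? lower.substring(0, lower.length() - 1) : lower;
    }
}
```

For example, `covers("*.example.com", "api.example.com")` is true, while `covers("*.example.com", "example.com")` is false, so the apex needs its own name on the certificate.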
For CloudFront distributions, request certificates in us-east-1 regardless of where your primary resources reside. CloudFront requires certificates in this specific region. For regional services (ALB, API Gateway regional endpoints), request certificates in the same region as the service.
Custom Domains for API Gateway
API Gateway supports custom domain names to replace default invoke URLs with branded domains.
# Custom domain for API Gateway (regional certificate must be in the same region as the API)
resource "aws_api_gateway_domain_name" "api" {
  domain_name = "api.example.com"
  # Reference the validation resource so the domain is created only after the certificate is issued
  regional_certificate_arn = aws_acm_certificate_validation.main.certificate_arn
  endpoint_configuration {
    types = ["REGIONAL"]
  }
}
# Map custom domain to API stage
resource "aws_api_gateway_base_path_mapping" "api" {
  api_id      = aws_api_gateway_rest_api.main.id
  stage_name  = aws_api_gateway_stage.prod.stage_name
  domain_name = aws_api_gateway_domain_name.api.domain_name
}
# Route 53 alias record pointing to custom domain
resource "aws_route53_record" "api" {
  zone_id = aws_route53_zone.public.id
  name    = aws_api_gateway_domain_name.api.domain_name
  type    = "A"
  alias {
    name                   = aws_api_gateway_domain_name.api.regional_domain_name
    zone_id                = aws_api_gateway_domain_name.api.regional_zone_id
    evaluate_target_health = false
  }
}
For edge-optimized APIs, use an edge-optimized custom domain, which API Gateway fronts with a CloudFront distribution; its certificate must therefore be in us-east-1. For regional endpoints co-located with your primary infrastructure, use regional custom domains for lower latency and simpler network paths. See our API Gateway guide for details.
DNSSEC
DNSSEC (DNS Security Extensions) adds cryptographic signatures to DNS responses, letting validating resolvers verify that responses are authentic and haven't been tampered with in transit. For clients behind validating resolvers, this prevents DNS spoofing and cache poisoning attacks.
Route 53 supports DNSSEC for domain registration and for hosted zones (signing your DNS responses). Enable DNSSEC when your security requirements mandate authenticated DNS responses or when regulatory compliance demands it.
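# KSK backed by an asymmetric KMS key; the key must be in us-east-1 and its key
# policy must allow dnssec-route53.amazonaws.com (var.dnssec_kms_key_policy is assumed)
resource "aws_kms_key" "dnssec" {
  customer_master_key_spec = "ECC_NIST_P256"
  key_usage                = "SIGN_VERIFY"
  policy                   = var.dnssec_kms_key_policy
}
resource "aws_route53_key_signing_key" "main" {
  hosted_zone_id             = aws_route53_zone.public.id
  key_management_service_arn = aws_kms_key.dnssec.arn
  name                       = "main-ksk"
}
# Enable DNSSEC signing for the hosted zone once the KSK exists
resource "aws_route53_hosted_zone_dnssec" "main" {
  hosted_zone_id = aws_route53_key_signing_key.main.hosted_zone_id
}
# DS record to add at the domain registrar (parent zone)
output "dnssec_ds_record" { value = aws_route53_key_signing_key.main.ds_record }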
After enabling DNSSEC signing in Route 53, add the DS (Delegation Signer) record that Route 53 generates from the KSK to the parent zone through your domain registrar. This establishes the chain of trust from the parent zone to your Route 53 hosted zone.
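DNSSEC adds complexity and new failure modes: if the signing configuration or the DS record is wrong, validating resolvers return SERVFAIL and your domain becomes unresolvable for every client that uses them. Thoroughly test DNSSEC in non-production environments before enabling it in production, and alarm on Route 53's DNSSEC CloudWatch metrics (such as DNSSECInternalFailure and DNSSECKeySigningKeysNeedingAction) to detect configuration issues.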
Most applications don't require DNSSEC. The additional complexity and operational overhead are justified mainly for high-security environments, financial services, or government systems. For most use cases, HTTPS with certificate validation already prevents a spoofed DNS answer from impersonating your service, without DNSSEC's operational cost.
Cross-Account DNS Patterns
Multi-account AWS architectures often centralize DNS management in a networking or shared services account while applications run in separate accounts. Route 53 supports several cross-account patterns.
Shared Hosted Zones
Associate a private hosted zone in one account with VPCs in other accounts. This enables centralized DNS management with distributed infrastructure.
# In DNS account: create the private hosted zone
resource "aws_route53_zone" "shared" {
  name = "internal.example.com"
  vpc {
    vpc_id = var.dns_account_vpc_id
  }
  # VPC associations made from other accounts would otherwise show up as drift
  lifecycle {
    ignore_changes = [vpc]
  }
}
# In DNS account: authorize the application account's VPC to associate
resource "aws_route53_vpc_association_authorization" "app_account" {
  vpc_id  = var.app_account_vpc_id
  zone_id = aws_route53_zone.shared.id
}
# In application account (separate provider credentials): associate the VPC with the shared zone
resource "aws_route53_zone_association" "app_vpc" {
  vpc_id  = var.app_account_vpc_id
  zone_id = var.shared_zone_id
}
Authorization must be created in the zone account first, then the association in the VPC account. This prevents unauthorized accounts from associating VPCs with zones they shouldn't access.
Cross-Account Record Management
Use IAM roles to allow application accounts to create records in centralized hosted zones. This maintains centralized zone management while enabling teams to self-service DNS records.
# In DNS account: IAM role for cross-account record management
resource "aws_iam_role" "dns_manager" {
  name = "cross-account-dns-manager"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        AWS = "arn:aws:iam::${var.app_account_id}:root"
      }
      Action = "sts:AssumeRole"
      Condition = {
        StringEquals = {
          "sts:ExternalId" = var.external_id
        }
      }
    }]
  })
}
resource "aws_iam_role_policy" "dns_manager" {
  role = aws_iam_role.dns_manager.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "route53:ChangeResourceRecordSets",
          "route53:GetHostedZone",
          "route53:ListResourceRecordSets"
        ]
        Resource = "arn:aws:route53:::hostedzone/${aws_route53_zone.public.id}"
      },
      {
        # Needed by Terraform and the AWS CLI to poll a change until it is INSYNC
        Effect   = "Allow"
        Action   = "route53:GetChange"
        Resource = "arn:aws:route53:::change/*"
      }
    ]
  })
}
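Application accounts assume this role to manage records within the shared zone. As written, the policy lets them change any record in the zone; to restrict which record names each account can modify and prevent cross-team conflicts, add conditions on Route 53's fine-grained IAM condition keys (for example, route53:ChangeResourceRecordSetsNormalizedRecordNames) or issue a separate, scoped role per team.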
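Whichever enforcement you choose (IAM conditions, per-team roles, or review in a deployment pipeline), the underlying rule is an ownership check on normalized record names. The sketch below is a hypothetical pipeline-side validation, not an AWS API; the class name and the team-to-suffix mapping are assumptions for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical ownership check: may a team change a given record name in the shared zone? */
final class RecordOwnership {
    private final Map<String, List<String>> suffixesByTeam;

    RecordOwnership(Map<String, List<String>> suffixesByTeam) {
        this.suffixesByTeam = Map.copyOf(suffixesByTeam);
    }

    boolean mayModify(String team, String recordName) {
        String name = normalize(recordName);
        for (String suffix : suffixesByTeam.getOrDefault(team, List.of())) {
            String owned = normalize(suffix);
            // A team owns its suffix itself and any name beneath it (label boundary enforced by ".")
            if (name.equals(owned) || name.endsWith("." + owned)) {
                return true;
            }
        }
        return false;
    }

    // Lowercase and strip a trailing dot so "API.Payments.example.com." matches "api.payments.example.com"
    private static String normalize(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".") ? lower.substring(0, lower.length() - 1) : lower;
    }
}
```

Checking on a label boundary matters: a plain `endsWith("payments.example.com")` would wrongly let the owner of payments.example.com modify notpayments.example.com.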
For comprehensive multi-account architecture patterns, see our cell-based architecture and networking guides.
DNS Query Logging and Monitoring
Route 53 query logging captures detailed information about DNS queries Route 53 receives. Enable query logging to analyze traffic patterns, troubleshoot DNS issues, detect anomalies, or meet compliance requirements.
# CloudWatch Log Group for query logs
# (for public hosted zones, the log group must be in us-east-1)
resource "aws_cloudwatch_log_group" "dns_queries" {
  name              = "/aws/route53/${aws_route53_zone.public.name}"
  retention_in_days = 30
  tags = {
    Environment = "production"
  }
}
# Resource policy allowing Route 53 to write to CloudWatch Logs
data "aws_iam_policy_document" "route53_query_logging" {
  statement {
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]
    resources = ["${aws_cloudwatch_log_group.dns_queries.arn}:*"]
    principals {
      type        = "Service"
      identifiers = ["route53.amazonaws.com"]
    }
  }
}
resource "aws_cloudwatch_log_resource_policy" "route53_query_logging" {
  policy_name     = "route53-query-logging"
  policy_document = data.aws_iam_policy_document.route53_query_logging.json
}
# Enable query logging
resource "aws_route53_query_log" "main" {
  depends_on               = [aws_cloudwatch_log_resource_policy.route53_query_logging]
  cloudwatch_log_group_arn = aws_cloudwatch_log_group.dns_queries.arn
  zone_id                  = aws_route53_zone.public.id
}
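Query logs include the timestamp, query name and type, response code, protocol, edge location, resolver IP, and EDNS client subnet. Only queries that reach Route 53 are logged; answers served from a resolver's cache never appear, so log volume reflects your TTLs as well as user traffic. Analyze logs to identify unexpected query patterns, diagnose resolution issues, or detect potential DNS-based attacks.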
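Route 53 writes each query log entry as one space-delimited line with ten fields: log format version, query timestamp, hosted zone ID, query name, query type, response code, protocol, edge location, resolver IP, and EDNS client subnet ("-" when absent). The sketch below parses lines in that format and counts responses by code, for example when post-processing exported logs outside CloudWatch; the class and record names are illustrative.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative parser for Route 53 public DNS query log lines. */
final class QueryLogParser {
    record QueryLogEntry(String version, Instant timestamp, String hostedZoneId, String queryName,
                         String queryType, String responseCode, String protocol,
                         String edgeLocation, String resolverIp, String ednsClientSubnet) {}

    private QueryLogParser() {}

    static QueryLogEntry parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 10) {
            throw new IllegalArgumentException("expected 10 fields, got " + f.length);
        }
        return new QueryLogEntry(f[0], Instant.parse(f[1]), f[2], f[3], f[4], f[5], f[6], f[7], f[8], f[9]);
    }

    /** Counts entries per response code (NOERROR, NXDOMAIN, SERVFAIL, ...), sorted by code. */
    static Map<String, Long> countByResponseCode(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            counts.merge(parse(line).responseCode(), 1L, Long::sum);
        }
        return counts;
    }
}
```

A rising share of NXDOMAIN or SERVFAIL responses is often the first visible symptom of a deleted record or a broken delegation.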
Query logging adds costs: CloudWatch Logs ingestion and storage charges apply based on log volume. High-traffic domains generate substantial log data. Use log retention policies to balance visibility needs with costs, retaining recent logs (7-30 days) and archiving or deleting older data.
Create CloudWatch Logs Insights queries to analyze patterns:
# Logs Insights discovers Route 53 query log fields automatically under
# camelCase names such as queryName, responseCode, edgeLocation, and resolverIp
# Most queried domains
fields @timestamp, queryName
| stats count() as query_count by queryName
| sort query_count desc
| limit 20
# Query errors and failures
fields @timestamp, queryName, responseCode
| filter responseCode != "NOERROR"
| stats count() as error_count by responseCode, queryName
# Geographic distribution
fields @timestamp, resolverIp, edgeLocation
| stats count() as query_count by edgeLocation
| sort query_count desc
Integrate query logs with security monitoring to detect DNS tunneling, DGA (domain generation algorithm) patterns, or reconnaissance activities. See our security monitoring guide for security integration patterns.
Cost Optimization
Route 53 pricing has three components: hosted zone costs ($0.50/month per zone for the first 25 zones), query costs (per million queries, for the first billion each month: $0.40 for standard queries, $0.60 for latency-based routing, and $0.70 for geolocation and geoproximity routing), and health check costs (from $0.50/month per basic health check on an AWS endpoint).
Hosted Zone Consolidation
Consolidate domains into fewer hosted zones where possible. Rather than creating separate zones for api.example.com, app.example.com, and cdn.example.com, manage all as records within the example.com zone.
# Instead of three separate zones:
# - api.example.com ($0.50/month)
# - app.example.com ($0.50/month)
# - cdn.example.com ($0.50/month)
# Use one zone with subdomains:
resource "aws_route53_zone" "consolidated" {
  name = "example.com" # $0.50/month total
}
resource "aws_route53_record" "api" {
  zone_id = aws_route53_zone.consolidated.id
  name    = "api.example.com"
  # ...
}
resource "aws_route53_record" "app" {
  zone_id = aws_route53_zone.consolidated.id
  name    = "app.example.com"
  # ...
}
This reduces costs from $1.50/month to $0.50/month for the example above. Across dozens of subdomains, the savings become meaningful.
Query Cost Optimization
Queries to alias records that point to AWS resources (ALB, CloudFront, API Gateway, S3 website endpoints) are free: Route 53 doesn't charge for them. Use alias records instead of CNAME records or plain A records that point to AWS resources.
# Free: Alias record to ALB
resource "aws_route53_record" "free_alias" {
  zone_id = aws_route53_zone.public.id
  name    = "app.example.com"
  type    = "A"
  alias {
    name                   = aws_lb.main.dns_name
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true
  }
  # No query charges for alias records
}
# Paid: A record to IP address
resource "aws_route53_record" "paid_a_record" {
  zone_id = aws_route53_zone.public.id
  name    = "legacy.example.com"
  type    = "A"
  ttl     = 300
  records = ["203.0.113.1"]
  # $0.40 per million queries
}
Higher TTLs reduce query volume by encouraging longer caching. A TTL of 3600 (1 hour) generates fewer queries than a TTL of 60 (1 minute), reducing costs. Balance this against the need for rapid DNS changes during deployments or incidents.
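To make the TTL trade-off concrete, the sketch below uses a deliberately simple model: a fixed number of resolvers each keep the name in demand all month and re-query Route 53 once per TTL expiry. The resolver count is hypothetical, the $0.40 rate is the standard query price cited above, and tiered pricing above one billion queries is ignored.

```java
/** Back-of-the-envelope model of how TTL drives Route 53 query volume and cost (illustrative). */
final class TtlCostModel {
    static final long SECONDS_PER_30_DAY_MONTH = 30L * 24 * 60 * 60; // 2,592,000
    static final double STANDARD_PRICE_PER_MILLION = 0.40;          // standard query rate

    private TtlCostModel() {}

    /** Assumes each active resolver re-queries once per TTL expiry, all month long. */
    static long monthlyQueries(long activeResolvers, long ttlSeconds) {
        if (ttlSeconds <= 0) {
            throw new IllegalArgumentException("TTL must be positive");
        }
        return activeResolvers * (SECONDS_PER_30_DAY_MONTH / ttlSeconds);
    }

    static double monthlyCostUsd(long activeResolvers, long ttlSeconds) {
        return monthlyQueries(activeResolvers, ttlSeconds) / 1_000_000.0 * STANDARD_PRICE_PER_MILLION;
    }

    public static void main(String[] args) {
        long resolvers = 10_000; // hypothetical number of resolvers with steady demand
        for (long ttl : new long[] {60, 300, 3600}) {
            System.out.printf("TTL %5d s: %,d queries, $%.2f%n",
                    ttl, monthlyQueries(resolvers, ttl), monthlyCostUsd(resolvers, ttl));
        }
    }
}
```

Under this model, raising the TTL from 60 to 3,600 seconds cuts a steady 10,000-resolver workload from 432 million to 7.2 million queries a month (about $172.80 versus $2.88). Real traffic is burstier and some resolvers clamp TTLs, so treat the output as an order-of-magnitude guide.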
Simple, weighted, and failover routing queries are billed at the standard rate ($0.40 per million queries). Latency, geolocation, and geoproximity routing queries cost more. Use simple or weighted routing when geographic or latency optimization isn't required.
Health Check Optimization
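A basic health check on an AWS endpoint costs $0.50/month; optional features such as the fast 10-second interval, HTTPS, or string matching each add to that price, and checks on non-AWS endpoints cost more. Calculated health checks combine several child checks into one status for routing logic, but they do not reduce cost: every child check is still billed.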
# Expensive: individual Route 53 health checks on 6 endpoints
# 6 endpoints × $0.50 = $3.00/month
# Calculated health check: combined logic for routing, but an extra check on top
# 6 endpoint checks + 1 calculated check = 7 × $0.50 = $3.50/month
# Cheapest: rely on the ALB's own target health checks (no extra charge)
# via evaluate_target_health on the alias record, with no Route 53 health check
# 0 Route 53 health checks = $0.00/month for this record
resource "aws_route53_record" "optimized" {
  zone_id = aws_route53_zone.public.id
  name    = "api.example.com"
  type    = "A"
  alias {
    name                   = aws_lb.main.dns_name
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true # Uses ALB target health
  }
  # No separate Route 53 health check needed
}
For endpoints behind load balancers, enable evaluate_target_health = true on alias records to leverage the load balancer's existing health checks rather than creating separate Route 53 health checks. This eliminates health check costs for those endpoints.
Use CloudWatch alarm health checks to monitor aggregated metrics rather than individual endpoints. One alarm covering multiple instances costs the same as an alarm for a single instance, reducing total health check count.
Anti-Patterns and Common Mistakes
Using CNAME at zone apex: CNAMEs cannot exist at the zone apex (example.com). Use alias records instead, which work at the apex and provide better performance and cost characteristics.
Not creating default geolocation records: When using geolocation routing without a default record, users from unmatched locations receive no DNS response. Always create a default geolocation record (country = "*").
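Route 53 answers a geolocation query with the most specific matching record: subdivision (such as a US state), then country, then continent, then the default ("*") record. The sketch below models that precedence with a hypothetical in-memory table to show why a missing default leaves some users without an answer; it is not how Route 53 is implemented, and the prefixed keys are an illustrative choice (they keep continent code "NA" distinct from country code "NA").

```java
import java.util.Map;
import java.util.Optional;

/** Illustrative model of geolocation record selection with a default fallback. */
final class GeoRouting {
    /** A caller's location; subdivision may be null where no subdivision applies. */
    record Location(String continent, String country, String subdivision) {}

    // Keys: "subdivision:US-WA", "country:US", "continent:EU", or "*" for the default record
    private final Map<String, String> recordsByKey;

    GeoRouting(Map<String, String> recordsByKey) {
        this.recordsByKey = Map.copyOf(recordsByKey);
    }

    /** Returns the most specific matching answer, or empty when nothing (not even "*") matches. */
    Optional<String> resolve(Location loc) {
        String[] keysMostSpecificFirst = {
            "subdivision:" + loc.country() + "-" + loc.subdivision(), // never matches when subdivision is null
            "country:" + loc.country(),
            "continent:" + loc.continent(),
            "*"
        };
        for (String key : keysMostSpecificFirst) {
            String answer = recordsByKey.get(key);
            if (answer != null) {
                return Optional.of(answer);
            }
        }
        return Optional.empty(); // no match and no default record: the user gets no answer
    }
}
```

With records only for the US and Europe, a resolver in Japan resolves to nothing; adding a "*" record gives it a fallback answer.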
Overly low TTLs: Setting TTL to very low values (5-10 seconds) dramatically increases query volume and costs while providing minimal benefit. DNS caching occurs at multiple layers; some resolvers ignore TTLs below certain thresholds. Use 60-300 seconds for most use cases.
Ignoring health check failures: Route 53 health checks fail for a reason. Don't ignore failing health checks or disable them to "fix" routing issues. Investigate and resolve the underlying endpoint health problems. See our observability practices for monitoring patterns.
Public S3 buckets instead of CloudFront + OAC: Serving content directly from S3 using public bucket policies bypasses CloudFront's caching, DDoS protection, and performance benefits. Use CloudFront distributions with Origin Access Control (OAC, which AWS recommends over the legacy Origin Access Identity) and Route 53 alias records to CloudFront. See our file storage guide for details.
Not using alias records for AWS resources: Pointing A/AAAA records to AWS resource IP addresses requires manual updates when infrastructure changes. Alias records automatically track resource IPs and incur no query charges.
Hardcoding AWS resource endpoints in applications: Applications that hardcode xxx.cloudfront.net or xxx.elb.amazonaws.com lose flexibility and branding. Use custom domains with Route 53 for all user-facing endpoints.
Not monitoring DNS query patterns: Without query logging or CloudWatch metrics, you lack visibility into DNS behavior during incidents. Enable query logging for critical domains and create CloudWatch dashboards for DNS metrics.
Neglecting DNSSEC complexity: Enabling DNSSEC without understanding the operational implications can cause outages when keys need rotation or configuration changes. Start with non-critical domains to gain operational experience before enabling DNSSEC for production systems.
Manual DNS changes during incidents: Making ad-hoc DNS changes via console during incidents introduces errors and lacks audit trails. Use Infrastructure as Code (Terraform) for all DNS changes, even during incidents, to maintain change history and enable rapid rollback. See our infrastructure as code guide for patterns.
Integration with Other AWS Services
Route 53 integrates deeply with AWS services to provide seamless DNS management:
- CloudFront: Use alias records to point custom domains to CloudFront distributions for CDN-backed content delivery. See our CloudFront guide.
- Application Load Balancer: Alias records to ALBs automatically track the load balancer's changing IP addresses as it scales. Enable evaluate_target_health to leverage ALB health checks. See our networking guide.
- API Gateway: Custom domain names replace default invoke URLs with branded domains. Route 53 alias records point to API Gateway regional or edge-optimized endpoints. See our API Gateway guide.
- S3 Static Websites: Alias records point to S3 website endpoints. Prefer CloudFront + S3 over direct S3 website hosting for better performance and security. See our file storage guide.
- EKS and Kubernetes: Use Route 53 with the ExternalDNS controller for automatic DNS record creation from Kubernetes Ingress and Service resources. See our EKS guide and Kubernetes guide.
- Certificate Manager: Automated DNS validation enables hands-free certificate issuance and renewal. Public ACM certificates are free when used with integrated AWS services.
- CloudWatch: Query logs, health check metrics, and alarm-based health checks integrate Route 53 with observability infrastructure. See our observability guide.
Further Reading
AWS Documentation:
- Route 53 Developer Guide - Comprehensive official documentation
- Routing Policy Reference - Detailed routing policy comparison
- Health Check Documentation - Health check configuration and best practices
- DNSSEC in Route 53 - DNSSEC setup and management
DNS Standards:
- RFC 1035 - Domain Names - Implementation and Specification (foundational DNS standard)
- RFC 4034 - DNSSEC Resource Records
- RFC 9156 - DNS Query Name Minimisation to Improve Privacy (obsoletes RFC 7816)
Related Documentation:
- AWS Networking - VPC design, load balancers, private DNS
- CloudFront CDN - Content delivery network integration
- API Gateway - Custom domains for APIs
- EKS DNS - Kubernetes DNS integration
- Cell-Based Architecture - Multi-region routing strategies
- Infrastructure as Code - Managing DNS with Terraform
- Observability - DNS monitoring and logging