AWS VPC and Networking

AWS Virtual Private Cloud (VPC) provides isolated network environments for your cloud resources. VPC design fundamentally impacts application security, availability, performance, and cost. This guide covers VPC architecture patterns, routing, security controls, load balancing, and private connectivity for production systems.

Understanding VPC networking is essential for building secure, scalable applications. Poor network design creates security vulnerabilities, availability failures, and operational complexity.

Production Readiness

VPC design decisions are difficult to change after deployment. Plan your CIDR ranges, subnet strategy, and routing architecture carefully before creating resources.


VPC Fundamentals

What is a VPC?

A Virtual Private Cloud (VPC) is a logically isolated network within AWS. Each VPC has:

  • CIDR block (IP address range) defining available IP addresses
  • Subnets subdividing the VPC CIDR into smaller network segments
  • Route tables controlling traffic routing between subnets and external networks
  • Internet Gateway (IGW) enabling internet connectivity
  • Security Groups and Network ACLs providing firewall controls

VPCs are region-scoped - resources in different regions require separate VPCs (or VPC peering/Transit Gateway for connectivity). Within a region, subnets are availability zone-specific, enabling multi-AZ architectures for high availability.

CIDR Planning

CIDR (Classless Inter-Domain Routing) notation specifies IP address ranges using the format 10.0.0.0/16, where /16 indicates the network prefix length (number of fixed bits in the address).

Common CIDR blocks for VPCs:

  • /16 (e.g., 10.0.0.0/16) - 65,536 IP addresses - suitable for large production VPCs
  • /20 (e.g., 10.0.0.0/20) - 4,096 IP addresses - suitable for medium environments
  • /24 (e.g., 10.0.0.0/24) - 256 IP addresses - suitable for small dev/test environments

AWS reserves 5 IP addresses per subnet (first 4 and last 1), so a /24 subnet actually provides 251 usable IPs.
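The reservation arithmetic can be sketched as a quick calculation (class and method names are illustrative):

```java
// Sketch: usable IPs in an AWS subnet for a given IPv4 prefix length.
// AWS reserves 5 addresses per subnet (network address, VPC router,
// DNS, one reserved for future use, and broadcast).
public class SubnetMath {
    static final int AWS_RESERVED = 5;

    // Total addresses in a subnet with the given prefix length.
    static long totalIps(int prefixLength) {
        return 1L << (32 - prefixLength);
    }

    // Addresses actually assignable to resources.
    static long usableIps(int prefixLength) {
        return totalIps(prefixLength) - AWS_RESERVED;
    }

    public static void main(String[] args) {
        System.out.println("/16 usable: " + usableIps(16)); // 65531
        System.out.println("/24 usable: " + usableIps(24)); // 251
    }
}
```

The 5-address overhead is negligible for /16 subnets but meaningful for very small subnets (a /28 provides only 11 usable IPs).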

CIDR Planning Best Practices

Plan for growth. Start with a /16 VPC even if you only need a few hundred IPs initially. You cannot expand VPC CIDR blocks beyond the original range (though you can add secondary CIDR blocks, which adds complexity).

Avoid overlapping CIDR blocks. If you plan to connect VPCs via peering or Transit Gateway, ensure VPC CIDR blocks don't overlap. Overlapping ranges prevent network connectivity.

Use private IP ranges. Always use RFC 1918 private address space:

  • 10.0.0.0/8 (10.0.0.0 - 10.255.255.255)
  • 172.16.0.0/12 (172.16.0.0 - 172.31.255.255)
  • 192.168.0.0/16 (192.168.0.0 - 192.168.255.255)

Standardize across environments. Use consistent CIDR patterns across dev/staging/production. Example:

  • Dev: 10.0.0.0/16
  • Staging: 10.1.0.0/16
  • Production: 10.2.0.0/16

This predictability simplifies troubleshooting and infrastructure-as-code templates.

Reserve space for future VPCs. If you use 10.0.0.0/16 for production today, reserve 10.1.0.0/16, 10.2.0.0/16, etc. for future expansion or additional VPCs.

A typical multi-tier VPC places three subnet types across two availability zones. Public subnets host resources requiring internet access (load balancers, bastion hosts), private subnets host application servers, and data subnets host databases. This pattern provides fault tolerance - if one AZ fails, resources in the other AZ continue operating.


Subnet Design Patterns

Subnets partition your VPC CIDR into smaller network segments. Each subnet exists in exactly one availability zone and has its own route table and network ACL.

Public Subnets

Public subnets have a route to an Internet Gateway, enabling resources to communicate directly with the internet. Resources in public subnets can have public IP addresses.

Use cases:

  • Application Load Balancers (ALB) and Network Load Balancers (NLB)
  • Bastion hosts for SSH/RDP access
  • NAT Gateways (enabling private subnet internet access)
  • Public-facing web servers (though private subnets with load balancers are preferred)

Route table for public subnet:

Destination    Target
10.0.0.0/16    local (VPC routing)
0.0.0.0/0      igw-xxxxx (Internet Gateway)

The 0.0.0.0/0 route directs all non-VPC traffic to the internet gateway.

Private Subnets

Private subnets do not have direct internet routes. Resources in private subnets can initiate outbound internet connections via a NAT Gateway (placed in a public subnet), but they cannot receive inbound connections from the internet.

Use cases:

  • Application servers (EC2, ECS, EKS nodes)
  • Microservices
  • Internal services without internet exposure requirements

Route table for private subnet:

Destination    Target
10.0.0.0/16    local
0.0.0.0/0      nat-xxxxx (NAT Gateway in public subnet)

Private subnets provide defense-in-depth - even if application vulnerabilities exist, attackers cannot reach servers without first compromising a public-facing component (like a load balancer).

Data/Isolated Subnets

Data subnets (sometimes called isolated subnets) have no internet routes - neither direct (IGW) nor indirect (NAT Gateway). Resources in data subnets can only communicate within the VPC or via VPC endpoints.

Use cases:

  • Databases (RDS, Aurora, Redshift)
  • ElastiCache clusters
  • Highly sensitive data processing

Route table for data subnet:

Destination    Target
10.0.0.0/16    local

Data subnets minimize attack surface by eliminating internet connectivity entirely. Databases in data subnets cannot be compromised via internet-facing vulnerabilities. Application servers in private subnets access databases via VPC-local routing.

Multi-Tier Architecture Pattern

Production VPCs typically use a multi-tier architecture with three subnet types across multiple availability zones.

This architecture isolates traffic at each tier. Internet traffic reaches only the load balancer, application servers are shielded behind the ALB, and databases are isolated from the internet entirely. Each tier has distinct security group rules controlling permitted traffic. See microservices architecture for service-to-service communication patterns within this structure.


Routing and Gateways

VPC routing controls how traffic flows between subnets, to the internet, and to other networks.

Route Tables

Every subnet has an associated route table with rules determining where network traffic is directed. Route tables contain:

  • Destination - CIDR block or prefix list
  • Target - where matching traffic is sent (local, IGW, NAT Gateway, VPC endpoint, etc.)

Routes are evaluated most-specific-first - a /24 route takes precedence over a /16 route for overlapping destinations.
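Most-specific-first selection can be sketched as a longest-prefix-match lookup (route values and names are illustrative):

```java
import java.util.*;

// Sketch: most-specific-first route evaluation, as a VPC route table does.
public class RouteTable {
    record Route(String cidr, String target) {}

    // Convert dotted-quad IPv4 to a 32-bit value.
    static long ipToLong(String ip) {
        long v = 0;
        for (String part : ip.split("\\.")) v = (v << 8) | Integer.parseInt(part);
        return v;
    }

    // Does the CIDR block contain this IP?
    static boolean matches(String cidr, String ip) {
        String[] p = cidr.split("/");
        int prefix = Integer.parseInt(p[1]);
        if (prefix == 0) return true; // 0.0.0.0/0 matches everything
        long mask = -1L << (32 - prefix) & 0xFFFFFFFFL;
        return (ipToLong(p[0]) & mask) == (ipToLong(ip) & mask);
    }

    // Pick the matching route with the longest prefix (most specific).
    static String lookup(List<Route> routes, String ip) {
        return routes.stream()
            .filter(r -> matches(r.cidr(), ip))
            .max(Comparator.comparingInt(r -> Integer.parseInt(r.cidr().split("/")[1])))
            .map(Route::target)
            .orElse("blackhole");
    }

    public static void main(String[] args) {
        List<Route> table = List.of(
            new Route("10.0.0.0/16", "local"),
            new Route("0.0.0.0/0", "igw-xxxxx"));
        System.out.println(lookup(table, "10.0.1.5"));      // local (the /16 beats /0)
        System.out.println(lookup(table, "93.184.216.34")); // igw-xxxxx
    }
}
```

This is why the local route always wins for VPC-internal traffic even when a 0.0.0.0/0 route exists.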

Local routes exist implicitly in every route table for the VPC CIDR block. Local routes cannot be modified or deleted - they ensure resources within the VPC can communicate.

Custom route tables are created for specific routing requirements (e.g., different routes for public vs. private subnets). Each subnet associates with exactly one route table.

Internet Gateway (IGW)

An Internet Gateway is a horizontally scaled, highly available VPC component enabling communication between the VPC and the internet. IGWs provide two functions:

  1. Routing: Targets for route table entries directing internet-bound traffic
  2. NAT: Performs network address translation for instances with public IP addresses

IGWs attach to VPCs (not subnets). A VPC can have at most one IGW. IGWs are fully managed by AWS - no scaling, patching, or availability concerns exist.

For a resource to communicate with the internet via IGW, three conditions must be met:

  1. Subnet route table has 0.0.0.0/0 → IGW route
  2. Resource has a public IP address or Elastic IP
  3. Security groups and NACLs permit traffic

NAT Gateway

A NAT Gateway enables resources in private subnets to initiate outbound internet connections while preventing inbound connections from the internet. Common use cases include downloading software updates, accessing external APIs, and sending outbound notifications.

NAT Gateways are placed in public subnets and referenced in private subnet route tables.

The NAT Gateway translates the private source IP (10.0.11.5) to its public IP for outbound traffic. Return traffic flows back through the NAT Gateway, which translates the destination back to the private IP.
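A minimal sketch of the translation bookkeeping, under the simplifying assumption of one mapped port per outbound flow (real NAT Gateways track full connection 5-tuples and expire idle mappings; all names here are illustrative):

```java
import java.util.*;

// Sketch of source NAT: outbound flows get a public IP:port mapping,
// return traffic is translated back, and unsolicited inbound traffic
// has no mapping and is dropped.
public class SourceNat {
    private final String publicIp;
    private final Map<Integer, String> portMap = new HashMap<>(); // public port -> "privateIp:port"
    private int nextPort = 1024;

    SourceNat(String publicIp) { this.publicIp = publicIp; }

    // Outbound: rewrite private source to the gateway's public IP and a fresh port.
    String translateOutbound(String privateIp, int privatePort) {
        int publicPort = nextPort++;
        portMap.put(publicPort, privateIp + ":" + privatePort);
        return publicIp + ":" + publicPort;
    }

    // Return traffic: look up the original private destination.
    Optional<String> translateInbound(int publicPort) {
        return Optional.ofNullable(portMap.get(publicPort));
    }

    public static void main(String[] args) {
        SourceNat nat = new SourceNat("54.0.0.10");
        System.out.println(nat.translateOutbound("10.0.11.5", 49152)); // 54.0.0.10:1024
        System.out.println(nat.translateInbound(1024)); // original private ip:port
        System.out.println(nat.translateInbound(9999)); // empty -> unsolicited, dropped
    }
}
```

The absence of a mapping for unsolicited traffic is precisely what makes private subnets unreachable from the internet.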

NAT Gateway Considerations

High availability: NAT Gateways are AZ-scoped. For fault-tolerant architectures, deploy one NAT Gateway per AZ and configure private subnet route tables in each AZ to use the local NAT Gateway. Cross-AZ NAT traffic incurs data transfer costs and creates a single point of failure.

Cost: NAT Gateways have hourly charges (about $0.045/hour in us-east-1) plus data processing charges (about $0.045/GB); rates vary by region. For cost optimization, consider:

  • VPC endpoints for AWS service access (eliminate NAT Gateway traffic for S3, DynamoDB, etc.)
  • Centralizing NAT Gateways for non-production environments (accepting reduced availability)
  • EC2-based NAT instances for low-traffic environments (requires manual management)

Scaling: NAT Gateways scale automatically up to 100 Gbps. No configuration required.

Security: NAT Gateways do not support security groups (they honor subnet NACLs only). Apply security controls at source (private subnet instances' security groups).

Transit Gateway

Transit Gateway is a regional network hub connecting VPCs, on-premises networks, and remote offices. Transit Gateway replaces complex VPC peering meshes with a hub-and-spoke model.

Transit Gateway simplifies multi-VPC architectures by centralizing routing. Without Transit Gateway, connecting 5 VPCs requires 10 peering connections (N*(N-1)/2). With Transit Gateway, 5 VPCs require 5 attachments.
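The connection-count comparison is simple combinatorics; a quick sketch (names illustrative):

```java
// Sketch: full-mesh peering connections vs Transit Gateway attachments.
public class ConnectionCount {
    // Full mesh: every VPC pair needs its own peering connection.
    static int peeringMesh(int vpcs) { return vpcs * (vpcs - 1) / 2; }

    // Hub-and-spoke: one attachment per VPC.
    static int tgwAttachments(int vpcs) { return vpcs; }

    public static void main(String[] args) {
        for (int n : new int[]{3, 5, 10, 20}) {
            System.out.printf("%d VPCs: %d peering connections vs %d TGW attachments%n",
                n, peeringMesh(n), tgwAttachments(n));
        }
    }
}
```

The gap grows quadratically: 20 VPCs would need 190 peering connections but only 20 Transit Gateway attachments.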

Use Transit Gateway when:

  • Connecting more than 3 VPCs
  • Integrating with on-premises networks via VPN or Direct Connect
  • Implementing centralized egress/ingress patterns
  • Requiring advanced routing controls between VPCs

For most single-VPC applications, Transit Gateway is unnecessary complexity. See multi-VPC architectures for organization-wide networking patterns.


Security Groups

Security Groups are stateful, instance-level firewalls controlling inbound and outbound traffic. Security groups are the primary network security mechanism in AWS.

Security Group Characteristics

Stateful: If an inbound rule allows traffic, the response is automatically allowed regardless of outbound rules. This differs from Network ACLs (stateless). If an application server security group allows inbound HTTP from a load balancer, the HTTP response automatically flows back to the load balancer without requiring an explicit outbound rule.

Allow-only: Security groups support only ALLOW rules. All traffic is implicitly denied unless explicitly allowed. To block specific traffic, omit the corresponding allow rule or use Network ACLs (which support DENY rules).

Instance-level: Security groups attach to elastic network interfaces (ENIs), not subnets. Each EC2 instance, RDS database, and Lambda function (in VPC) has associated security groups.

Evaluated together: Instances can have multiple security groups (up to 5 per network interface by default; the quota is adjustable). Rules from all groups are aggregated - traffic is permitted if any attached security group allows it.
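The aggregation logic can be sketched as an any-match over all attached groups (a simplification: matching here compares the source string literally rather than resolving CIDR blocks or security-group references; all names are illustrative):

```java
import java.util.*;

// Sketch: security group evaluation. All attached groups are aggregated;
// traffic is allowed if ANY rule in ANY group matches (allow-only, no deny rules).
public class SecurityGroupEval {
    record Rule(String protocol, int fromPort, int toPort, String source) {}
    record Group(String id, List<Rule> inbound) {}

    static boolean allowsInbound(List<Group> attached, String protocol, int port, String source) {
        return attached.stream()
            .flatMap(g -> g.inbound().stream())
            .anyMatch(r -> r.protocol().equals(protocol)
                        && port >= r.fromPort() && port <= r.toPort()
                        && r.source().equals(source));
    }

    public static void main(String[] args) {
        Group web  = new Group("sg-web",  List.of(new Rule("tcp", 80, 80, "sg-alb123")));
        Group mgmt = new Group("sg-mgmt", List.of(new Rule("tcp", 22, 22, "10.0.0.0/16")));
        List<Group> attached = List.of(web, mgmt);
        System.out.println(allowsInbound(attached, "tcp", 80, "sg-alb123"));  // true: sg-web allows
        System.out.println(allowsInbound(attached, "tcp", 443, "sg-alb123")); // false: implicit deny
    }
}
```

Because there are no deny rules, attaching an extra group can only widen access, never narrow it.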

Security Group Rules

Each security group rule specifies:

  • Type: Protocol (TCP, UDP, ICMP, or ALL)
  • Port range: Single port or range (e.g., 443 or 8080-8090)
  • Source (inbound) or Destination (outbound): CIDR block, security group ID, or prefix list

Example security group for application server:

Direction  Type        Port  Source/Destination  Purpose
Inbound    HTTP        80    sg-alb123           Traffic from ALB
Inbound    HTTPS       443   sg-alb123           Traffic from ALB
Inbound    Custom TCP  8080  sg-app456           Service-to-service traffic
Outbound   HTTPS       443   0.0.0.0/0           External API calls
Outbound   PostgreSQL  5432  sg-db789            Database access

Source/destination using security group IDs is a powerful pattern. Instead of hardcoding CIDR blocks, reference other security groups. Example: "Allow inbound on port 5432 from sg-app456" permits traffic from any resource attached to sg-app456, regardless of its IP address. This pattern adapts automatically when instances scale or IPs change.

Security Group Design Patterns

One security group per tier: Create separate security groups for each application tier (ALB, app servers, databases). This enables defense-in-depth - database security groups allow traffic only from application security groups, not from the entire VPC.

This security group chain ensures traffic flows only through permitted paths. The database security group accepts connections only from application servers, preventing lateral movement if an ALB or internet-facing component is compromised.

Deny-by-default: Security groups start with no inbound rules (all traffic denied) and an allow-all outbound rule. Remove the default outbound rule and add explicit outbound rules for each required destination. Explicit outbound rules provide visibility into application dependencies and prevent data exfiltration via compromised instances.

Use descriptive names and descriptions: Name security groups by purpose (e.g., prod-app-servers-sg, prod-db-sg). Add descriptions to rules explaining why they exist ("Allow health checks from ALB"). This documentation aids troubleshooting and security audits.

Audit regularly: Review security group rules quarterly to remove obsolete rules. Tools like AWS Config detect changes and flag overly permissive rules (e.g., 0.0.0.0/0 on SSH port 22).

For application-level security controls, see input validation and Spring Boot security.


Network ACLs (NACLs)

Network ACLs are stateless, subnet-level firewalls providing an additional security layer beyond security groups.

NACLs vs Security Groups

Feature     Security Groups                       Network ACLs
Level       Instance (ENI)                        Subnet
State       Stateful (return traffic automatic)   Stateless (explicit allow for both directions)
Rules       Allow only                            Allow and Deny
Evaluation  All rules evaluated                   Rules evaluated in order (lowest number first)
Default     Deny all inbound, allow all outbound  Allow all inbound and outbound

When to Use NACLs

Security groups provide sufficient protection for most use cases. Use NACLs for:

Blocking specific IP addresses: Security groups cannot deny traffic. Use NACL deny rules to block known malicious IPs at the subnet level before traffic reaches instances.

Subnet-level defense: Apply blanket allow/deny rules to all resources in a subnet without modifying individual security groups.

Regulatory compliance: Some compliance frameworks require network-level controls in addition to instance-level controls.

Additional defense layer: Defense-in-depth strategies use both security groups (instance-level) and NACLs (subnet-level) to create multiple security barriers.

NACL Rule Structure

NACL rules include:

  • Rule number: 1-32766 (lower numbers evaluated first)
  • Type: Protocol (TCP, UDP, ICMP, or ALL)
  • Port range: Destination port for inbound, source port for outbound
  • Source (inbound) or Destination (outbound): CIDR block
  • Action: ALLOW or DENY

NACLs evaluate rules in ascending rule number order. Once a rule matches, evaluation stops - subsequent rules are ignored. Every NACL ends with an implicit catch-all deny (displayed as rule * in the console) that cannot be modified or removed.

Example NACL for public subnet:

Rule #  Type        Port         Source/Dest  Action  Purpose
100     HTTP        80           0.0.0.0/0    ALLOW   Allow inbound HTTP
110     HTTPS       443          0.0.0.0/0    ALLOW   Allow inbound HTTPS
120     Custom TCP  32768-65535  0.0.0.0/0    ALLOW   Allow ephemeral return traffic
*       ALL         ALL          0.0.0.0/0    DENY    Default deny (implicit)

Ephemeral ports (typically 32768-65535 for Linux, 49152-65535 for Windows) must be allowed for return traffic because NACLs are stateless. When an instance makes an outbound connection, the response uses an ephemeral port chosen by the operating system. Without allowing ephemeral ports inbound, return traffic is blocked.
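First-match-wins evaluation, including the ephemeral-port allow and the implicit final deny, can be sketched as an ordered scan (rule values are illustrative):

```java
import java.util.*;

// Sketch: stateless NACL evaluation - rules checked in ascending number
// order, first match wins, implicit deny (*) if nothing matches.
public class NaclEval {
    record Rule(int number, int fromPort, int toPort, String action) {}

    static String evaluate(List<Rule> rules, int port) {
        return rules.stream()
            .sorted(Comparator.comparingInt(Rule::number))
            .filter(r -> port >= r.fromPort() && port <= r.toPort())
            .findFirst()                   // first match wins; later rules ignored
            .map(Rule::action)
            .orElse("DENY");               // implicit catch-all deny
    }

    public static void main(String[] args) {
        List<Rule> inbound = List.of(
            new Rule(100, 80, 80, "ALLOW"),
            new Rule(110, 443, 443, "ALLOW"),
            new Rule(120, 32768, 65535, "ALLOW")); // ephemeral ports for return traffic
        System.out.println(evaluate(inbound, 443));   // ALLOW
        System.out.println(evaluate(inbound, 40000)); // ALLOW - ephemeral return traffic
        System.out.println(evaluate(inbound, 22));    // DENY - falls through to implicit deny
    }
}
```

Remove the ephemeral-port rule and every outbound connection's return traffic would hit the implicit deny, which is a classic NACL misconfiguration.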

NACL Best Practices

Use security groups as primary control: Security groups are easier to manage (stateful, no ephemeral port concerns) and more granular (instance-level). Reserve NACLs for specific use cases like IP blocking.

Leave default NACLs permissive: The default NACL allows all traffic. Create custom NACLs for subnets requiring restrictions rather than modifying the default.

Number rules with gaps: Use rule numbers 100, 110, 120, etc. (not 1, 2, 3). Gaps allow inserting new rules between existing rules without renumbering.

Mirror inbound/outbound rules: For stateless protocols like UDP, ensure inbound allow rules have corresponding outbound rules.

Document rules: Add descriptions explaining why each rule exists. NACLs are harder to audit than security groups due to rule number ordering.


Load Balancing

AWS provides three load balancer types: Application Load Balancer (ALB), Network Load Balancer (NLB), and Gateway Load Balancer (GWLB). ALB and NLB are relevant for most application architectures.

Application Load Balancer (ALB)

Application Load Balancer operates at OSI Layer 7 (HTTP/HTTPS) and routes traffic based on request content (path, headers, query parameters).

Key features:

  • Path-based routing: Route /api/* to API servers, /static/* to CDN, /admin/* to admin servers
  • Host-based routing: Route api.example.com to API servers, www.example.com to web servers
  • HTTP/2 and WebSocket support
  • SSL/TLS termination: Decrypt HTTPS at the load balancer, send HTTP to backend servers
  • Sticky sessions: Route requests from the same client to the same backend server
  • Authentication integration: OIDC and SAML authentication at load balancer level
  • AWS WAF integration: Web Application Firewall protection

ALB use cases:

  • Microservices routing (path-based routing to different services)
  • Multi-tenant applications (host-based routing)
  • Modern web applications (HTTP/2, WebSocket)

ALB distributes traffic across target groups based on listener rules. Each target group contains a set of backend servers (EC2 instances, ECS tasks, Lambda functions, or IP addresses). Health checks determine which targets receive traffic.

ALB Health Checks

ALBs continuously send health check requests to backend targets. Unhealthy targets are removed from rotation automatically.

Health check configuration:

  • Protocol: HTTP or HTTPS
  • Path: URL path to check (e.g., /health)
  • Interval: Seconds between checks (5-300, default 30)
  • Timeout: Seconds to wait for response (2-120, default 5)
  • Healthy threshold: Consecutive successes required to mark healthy (2-10, default 5)
  • Unhealthy threshold: Consecutive failures required to mark unhealthy (2-10, default 2)
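The interplay of interval and threshold determines failure detection speed. As a rough sketch (names illustrative), the worst-case time before a failed target stops receiving traffic is approximately interval × unhealthy threshold:

```java
// Sketch: approximate worst-case detection time before the load balancer
// stops routing to a failed target.
public class HealthCheckTiming {
    static int secondsToUnhealthy(int intervalSeconds, int unhealthyThreshold) {
        return intervalSeconds * unhealthyThreshold;
    }

    public static void main(String[] args) {
        // Defaults: 30s interval, 2 consecutive failures.
        System.out.println(secondsToUnhealthy(30, 2) + "s with defaults");
        // Aggressive settings detect failures faster at the cost of more check traffic.
        System.out.println(secondsToUnhealthy(5, 2) + "s with 5s interval");
    }
}
```

Tighter settings risk false positives if the health endpoint is occasionally slow, so keep the timeout comfortably above the endpoint's normal response time.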

Application health check endpoint example (Spring Boot):

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HealthCheckController {

    private final DatabaseHealthIndicator dbHealth;
    private final CacheHealthIndicator cacheHealth;

    public HealthCheckController(DatabaseHealthIndicator dbHealth,
                                 CacheHealthIndicator cacheHealth) {
        this.dbHealth = dbHealth;
        this.cacheHealth = cacheHealth;
    }

    @GetMapping("/health")
    public ResponseEntity<HealthStatus> health() {
        // Aggregate dependency checks; any failure marks the instance unhealthy
        boolean healthy = dbHealth.isHealthy() && cacheHealth.isHealthy();

        if (healthy) {
            return ResponseEntity.ok(new HealthStatus("UP"));
        } else {
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                .body(new HealthStatus("DOWN"));
        }
    }
}

The health endpoint returns 200 OK when healthy, 503 Service Unavailable when unhealthy. ALBs use HTTP status codes to determine health - 2xx/3xx indicate healthy, 4xx/5xx indicate unhealthy. See Spring Boot health checks for production health check patterns.

Network Load Balancer (NLB)

Network Load Balancer operates at OSI Layer 4 (TCP/UDP/TLS) and routes traffic based on IP protocol data, providing ultra-low latency and high throughput.

Key features:

  • Static IP per AZ: Each NLB has a static IP address (or Elastic IP) per availability zone
  • Extreme performance: Handles millions of requests per second with microsecond latency
  • Preserve source IP: Client IP addresses are preserved (not translated to load balancer IP)
  • TCP/UDP/TLS protocols: Not limited to HTTP/HTTPS
  • TLS termination: Decrypt TLS at load balancer
  • PrivateLink support: Expose services via VPC endpoints

NLB use cases:

  • TCP/UDP applications (databases, game servers, IoT protocols)
  • Extreme performance requirements (financial trading, real-time systems)
  • Static IP requirements (whitelisting, DNS A records)
  • Preserving client IP addresses (logging, security controls)

Unlike ALB, NLB preserves the client source IP address. Backend servers see the actual client IP rather than the load balancer IP. This enables IP-based access controls, geographic restrictions, and accurate client logging.

Choosing Between ALB and NLB

Factor           ALB                                  NLB
Protocol         HTTP/HTTPS only                      TCP/UDP/TLS
Latency          Low (milliseconds)                   Ultra-low (microseconds)
Request routing  Path, host, headers, query strings   IP/port only
Source IP        Not preserved (use X-Forwarded-For)  Preserved
Static IP        Dynamic IPs (use DNS name)           Static IP per AZ
WebSocket        Yes                                  Yes (TCP)
Cost             Higher per LCU                       Lower per NLCU

Use ALB for HTTP/HTTPS applications requiring content-based routing, host-based routing, or WAF integration. ALB is the default choice for web applications and microservices.

Use NLB for non-HTTP protocols (TCP/UDP), extreme performance requirements, static IP requirements, or when preserving client IP is essential.

For load balancing Kubernetes services on EKS, see AWS Load Balancer Controller and Kubernetes service types.


VPC Endpoints

VPC Endpoints enable private connectivity to AWS services without traversing the internet or NAT Gateway. VPC endpoints reduce costs (no NAT Gateway data processing charges), improve security (traffic never leaves AWS network), and improve performance (lower latency).

AWS provides two VPC endpoint types: Gateway Endpoints and Interface Endpoints.

Gateway Endpoints

Gateway Endpoints are route table targets for S3 and DynamoDB. Gateway endpoints are free and highly scalable.

Creating a gateway endpoint adds a route to subnet route tables directing S3/DynamoDB traffic to the endpoint:

Destination                Target
10.0.0.0/16                local
0.0.0.0/0                  nat-xxxxx
pl-xxxxx (S3 prefix list)  vpce-xxxxx (Gateway Endpoint)

When an EC2 instance in a private subnet accesses S3, traffic routes via the gateway endpoint rather than the NAT Gateway. This eliminates NAT Gateway data charges for S3 traffic and improves performance.

Gateway endpoint policies control which S3 buckets or DynamoDB tables are accessible via the endpoint:

{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-application-bucket/*"
    }
  ]
}

This policy allows access only to my-application-bucket, blocking access to other S3 buckets via the endpoint. Combine endpoint policies with IAM policies for defense-in-depth - IAM policies control who can call S3 APIs, endpoint policies control which buckets are reachable via the endpoint.

See S3 access patterns for application integration with VPC endpoints.

Interface Endpoints

Interface Endpoints create elastic network interfaces (ENIs) in subnets, providing private IP addresses for AWS services. Interface endpoints support most AWS services (API Gateway, CloudWatch, Secrets Manager, ECR, ECS, Lambda, etc.) and third-party SaaS services.

Unlike gateway endpoints (route table targets), interface endpoints are DNS targets. When enabled, AWS service DNS names (e.g., secretsmanager.us-east-1.amazonaws.com) resolve to private IPs in your VPC rather than public IPs.

Interface endpoints charge hourly per AZ ($0.01/hour per AZ) plus data processing charges ($0.01/GB). Unlike gateway endpoints, interface endpoints are not free, but they support many more services.

Interface Endpoint Use Cases

Fully private workloads: Resources in data subnets (no internet routes) can access AWS services via interface endpoints without NAT Gateways.

Reduced NAT Gateway costs: Interface endpoints eliminate NAT Gateway data processing charges for AWS service traffic (CloudWatch Logs, ECR image pulls, Secrets Manager, etc.).

Security compliance: Regulatory requirements may prohibit AWS API traffic from traversing the internet. Interface endpoints ensure traffic remains on AWS private network.

Improved reliability: Eliminates dependency on NAT Gateway and internet gateway for AWS service access.

VPC Endpoint Best Practices

Create gateway endpoints for S3 and DynamoDB in all VPCs. Gateway endpoints are free and reduce NAT Gateway costs significantly for these high-traffic services.

Use interface endpoints for frequently accessed services: If your application heavily uses CloudWatch, Secrets Manager, or ECR, interface endpoints pay for themselves via reduced NAT Gateway charges. Calculate break-even based on data transfer volume.

Enable private DNS for interface endpoints: Private DNS automatically updates AWS service DNS names to resolve to endpoint IPs. Applications require no code changes - they continue using standard service hostnames.

Deploy interface endpoints in multiple AZs: For high availability, create interface endpoint ENIs in each availability zone where you have resources.

Combine with security groups: Interface endpoints have security groups controlling which resources can access them. Restrict access to specific application security groups rather than allowing 0.0.0.0/0.

For cost analysis, see AWS cost optimization. For Kubernetes-specific endpoint patterns, see EKS networking.


VPC Peering

VPC Peering creates private connectivity between two VPCs, enabling resources to communicate using private IP addresses as if they were in the same network.

VPC peering characteristics:

  • Non-transitive: If VPC A peers with VPC B, and VPC B peers with VPC C, VPC A and VPC C cannot communicate. Each VPC pair requires its own peering connection.
  • No single point of failure: Peering connections are not physical devices - they're logical connections in AWS infrastructure with built-in redundancy.
  • Cross-region support: VPCs in different regions can peer (inter-region peering).
  • Cross-account support: VPCs in different AWS accounts can peer.

After creating a peering connection, update route tables in both VPCs to direct traffic to the peering connection. Security groups and NACLs must allow cross-VPC traffic.

VPC Peering vs Transit Gateway

Factor              VPC Peering          Transit Gateway
Connections         N*(N-1)/2 (mesh)     N (hub-and-spoke)
Transitive routing  No                   Yes
Complexity          High for many VPCs   Low
Cost                Data transfer only   Hourly + data transfer
Cross-region        Yes                  Yes

Use VPC peering for simple scenarios (2-3 VPCs). Use Transit Gateway for complex multi-VPC architectures (4+ VPCs) or when transitive routing is required. See multi-account networking for organization-wide strategies.


DNS and Route 53 Integration

VPC DNS resolution is provided by the Amazon-provided DNS server (the Route 53 Resolver) at the VPC CIDR base + 2 address (e.g., 10.0.0.2 for a 10.0.0.0/16 VPC). This DNS server resolves:

  • Public DNS names to public IP addresses
  • Private DNS names (e.g., ip-10-0-1-23.ec2.internal) to private IP addresses
  • Route 53 private hosted zone records
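The base + 2 rule can be sketched as a small address calculation (names illustrative):

```java
// Sketch: deriving the Amazon-provided DNS resolver address,
// which sits at the VPC CIDR base address plus 2.
public class VpcDns {
    static String resolverAddress(String cidr) {
        String[] octets = cidr.split("/")[0].split("\\.");
        long base = 0;
        for (String o : octets) base = (base << 8) | Integer.parseInt(o);
        long dns = base + 2; // base + 2 is reserved for DNS in every VPC
        return String.format("%d.%d.%d.%d",
            (dns >> 24) & 0xFF, (dns >> 16) & 0xFF, (dns >> 8) & 0xFF, dns & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(resolverAddress("10.0.0.0/16"));   // 10.0.0.2
        System.out.println(resolverAddress("172.31.0.0/16")); // 172.31.0.2
    }
}
```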

Private Hosted Zones

Route 53 Private Hosted Zones provide custom DNS names for resources in your VPC. Create private hosted zones for:

  • Human-readable service names (api.internal, db.internal)
  • Service discovery for microservices
  • Environment-specific DNS (dev.internal, prod.internal)

Private hosted zones are VPC-specific - records resolve only within associated VPCs. This enables using the same DNS names across environments (dev, staging, production) without conflicts.

Example: Create private hosted zone internal associated with VPC. Add A record api.internal pointing to load balancer private IP. Applications in the VPC resolve api.internal to the load balancer, enabling DNS-based service discovery.

For advanced DNS patterns and multi-region routing, see Route 53 documentation. For service discovery in Kubernetes, see EKS DNS.


Network Monitoring and Troubleshooting

VPC Flow Logs

VPC Flow Logs capture IP traffic metadata (source, destination, ports, protocol, bytes, action) for network interfaces. Flow logs do not capture packet contents - only metadata.

Flow log destinations:

  • CloudWatch Logs (query with CloudWatch Logs Insights)
  • S3 (query with Athena)
  • Kinesis Data Firehose (stream to third-party tools)

Enable flow logs at VPC, subnet, or network interface level. VPC-level flow logs capture all traffic in the VPC.

Example flow log record:

2 123456789012 eni-abc123 203.0.113.10 10.0.1.25 443 38754 6 10 5000 1620000000 1620000060 ACCEPT OK

Fields: version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status
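Given that field order, a record can be parsed positionally; a minimal sketch (field selection is illustrative, and only version 2 default-format records are assumed):

```java
// Sketch: parsing a version 2 VPC Flow Log record into named fields.
public class FlowLogRecord {
    final String srcAddr, dstAddr, action;
    final int dstPort;
    final long bytes;

    FlowLogRecord(String line) {
        String[] f = line.trim().split("\\s+");
        // Field order: version account-id interface-id srcaddr dstaddr srcport
        //              dstport protocol packets bytes start end action log-status
        srcAddr = f[3];
        dstAddr = f[4];
        dstPort = Integer.parseInt(f[6]);
        bytes   = Long.parseLong(f[9]);
        action  = f[12];
    }

    public static void main(String[] args) {
        FlowLogRecord r = new FlowLogRecord(
            "2 123456789012 eni-abc123 203.0.113.10 10.0.1.25 443 38754 6 10 5000 1620000000 1620000060 ACCEPT OK");
        System.out.println(r.srcAddr + " -> " + r.dstAddr + ":" + r.dstPort
            + " " + r.bytes + "B " + r.action);
    }
}
```

In practice CloudWatch Logs Insights or Athena parse these fields for you; positional parsing like this is mainly useful in custom pipelines consuming raw S3-delivered logs.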

Use flow logs for:

  • Troubleshooting connectivity: Identify rejected connections (REJECT action)
  • Security analysis: Detect unusual traffic patterns
  • Compliance: Audit network access
  • Cost optimization: Identify high-traffic resources

Query flow logs with CloudWatch Logs Insights:

fields @timestamp, srcAddr, dstAddr, dstPort, protocol, bytes
| filter action = "REJECT"
| stats count(*) as rejectCount by dstPort, dstAddr
| sort rejectCount desc
| limit 20

This query identifies the top rejected connection destinations, useful for diagnosing security group misconfigurations.

VPC Reachability Analyzer

Reachability Analyzer is a network diagnostics tool that analyzes network paths between sources and destinations, identifying connectivity issues without generating traffic.

Reachability Analyzer checks:

  • Route table entries
  • Security group rules
  • Network ACL rules
  • Internet gateway and NAT gateway configurations

Use cases:

  • Debugging connectivity failures before deploying applications
  • Validating network changes (new security group rules, route table updates)
  • Security audits (verifying isolation between tiers)

Reachability Analyzer provides detailed hop-by-hop analysis showing exactly where and why traffic is blocked. This is more efficient than trial-and-error testing.

For observability and monitoring strategies, see observability overview and AWS observability.


Common Networking Anti-Patterns

Avoid these networking mistakes that create security, cost, or availability issues:

Overly complex CIDR schemes: Using random CIDR blocks or multiple secondary CIDR ranges complicates troubleshooting and IP planning. Standardize on simple, predictable CIDR schemes.

Public subnets for application servers: Placing application servers in public subnets increases attack surface. Use private subnets with load balancers in public subnets.

Single availability zone: Deploying all resources in one AZ eliminates fault tolerance. Always deploy across at least two AZs.

Overly permissive security groups: Security groups allowing 0.0.0.0/0 on all ports eliminate the security group's value. Apply least privilege - allow only required sources and ports.

Ignoring VPC endpoints: Routing AWS service traffic through NAT Gateways incurs unnecessary cost and latency. Use gateway endpoints (free) and interface endpoints (cost-effective for high-traffic services).

Cross-AZ NAT Gateway usage: Private subnets routing to NAT Gateways in different AZs incur cross-AZ data transfer charges and create availability dependencies. Deploy one NAT Gateway per AZ with local routing.

Not monitoring flow logs: Flow logs provide critical visibility into network behavior. Enable flow logs and review rejected connections regularly.

Hard-coded IP addresses: Applications using hard-coded IP addresses break when resources change. Use DNS names (Route 53, ELB DNS names) or service discovery patterns.

No network segmentation: Flat networks where all resources share security groups eliminate defense-in-depth. Segment networks by tier (public, private, data) with distinct security controls.

Large monolithic security groups: Security groups with dozens of rules are difficult to audit and maintain. Create focused security groups with clear purposes.


Further Reading

AWS networking is deep and broad. Continue learning with these resources:

For production deployment patterns, see EKS networking, RDS subnet groups, and multi-region architectures.

For security hardening, see security overview, input validation, and IAM network conditions.