Kubernetes Best Practices

Overview

Kubernetes orchestrates containerized applications across clusters of machines, providing automated deployment, scaling, and management. A well-architected Kubernetes configuration ensures applications run reliably, scale efficiently, and recover automatically from failures. This guide covers deployment strategies, resource management, health probes, configuration handling, service mesh patterns, ingress routing, observability, and security hardening.

Kubernetes abstracts infrastructure complexity, allowing you to declare your desired state (3 replicas of payment service, each with 512MB memory) and letting Kubernetes maintain that state. If a pod crashes, Kubernetes restarts it. If a node fails, Kubernetes reschedules pods to healthy nodes. If traffic increases, Kubernetes scales pods horizontally.
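
That desired state is written down declaratively. A minimal sketch of the example above (image name and registry are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3                  # desired state: 3 replicas
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:2.1.0
          resources:
            requests:
              memory: 512Mi    # desired state: 512 MiB per replica
```

Apply it with kubectl apply and the Deployment controller continuously reconciles reality toward these three replicas.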

The control plane (API server, scheduler, controller manager, etcd) manages the cluster state. The API server is the central point for all operations - kubectl commands, CI/CD deployments, and internal component communication all go through it. The scheduler assigns Pods to nodes based on resource availability and constraints. The controller manager runs control loops that watch cluster state and make changes to move current state toward desired state (the ReplicaSet controller, for example, ensures the correct number of Pod replicas exist). etcd stores all cluster state persistently; if etcd data is lost, the cluster loses knowledge of all its resources.

Worker nodes run the actual application Pods. The kubelet on each node watches for Pods assigned to its node, starts containers via the container runtime (containerd, CRI-O), and reports Pod status back to the API server. Kube-proxy maintains network rules enabling Pod-to-Pod communication and Service load balancing.

The platform's declarative approach uses YAML manifests to define resources (Deployments, Services, ConfigMaps, Ingresses). Keeping these manifests in version control alongside application code enables GitOps workflows, where infrastructure changes follow the same review and deployment process as code changes. For comprehensive Docker container preparation, see Docker Best Practices.


Core Principles

  1. Declarative Configuration: Define desired state, let Kubernetes converge
  2. Resource Limits: Set requests and limits for CPU/memory
  3. Health Probes: Use liveness, readiness, and startup probes
  4. Rolling Updates: Zero-downtime deployments with gradual rollout
  5. Configuration Separation: Externalize config via ConfigMaps and Secrets
  6. Observability: Expose metrics, logs, and traces
  7. Security: Pod Security Standards, Network Policies, RBAC
  8. High Availability: Multi-replica deployments across availability zones

Deployment Strategies

Kubernetes Deployment resources manage ReplicaSets, which in turn manage Pods. Deployment strategies control how updates roll out to running Pods - balancing between deployment speed, resource consumption, and risk mitigation.

Understanding Deployment Strategies

The deployment strategy determines how Kubernetes replaces old Pods with new ones during updates. The wrong strategy can cause downtime (Recreate), consume excessive resources (duplicate all Pods in blue-green), or require complex configuration (canary with manual verification).

Rolling updates provide the optimal balance for most use cases: gradual replacement of old Pods with new ones, maintaining application availability throughout the update. Kubernetes replaces a configurable number of Pods at a time (controlled by maxUnavailable and maxSurge), verifying each new Pod is ready before proceeding. This strategy works well for stateless services where running multiple versions simultaneously is acceptable.

Blue-green deployments maintain two complete environments - blue (current) and green (new). Traffic routes to blue while green is tested. Once validated, traffic switches to green instantly. This enables immediate rollback (switch back to blue) but requires double the resources.

Canary deployments gradually shift traffic from the old version to the new version, monitoring metrics at each step. Start with 10% traffic to the new version, monitor for issues, increase to 50%, then 100%. This detects problems early with minimal user impact but requires robust monitoring and metrics.

Recreate strategy terminates all old Pods before creating new ones, causing downtime. Only use this for stateful applications that cannot tolerate multiple versions running simultaneously (database schema migrations, singleton services).

The maxUnavailable parameter defines how many Pods can be unavailable during the update (0 means all Pods stay available). The maxSurge parameter defines how many extra Pods Kubernetes creates temporarily (1 means one extra Pod beyond desired replicas). Together, these control rollout speed and resource consumption.

For more on container health checks referenced in these strategies, see Docker Best Practices.

Rolling Update (Default)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: banking
  labels:
    app: payment-service
    version: v2.1.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # 1 pod can be unavailable during update
      maxSurge: 1         # 1 extra pod during rollout
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
        version: v2.1.0
    spec:
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:2.1.0
          ports:
            - containerPort: 8080
              protocol: TCP
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3

How it works: Kubernetes gradually replaces Pods. With maxUnavailable: 1, at least 2 of 3 Pods remain available during the update (at most one is being replaced). With maxSurge: 1, Kubernetes creates a 4th Pod before terminating an old one, ensuring 3 Pods always handle traffic. Once the new Pod passes readiness checks, Kubernetes terminates an old Pod and repeats until all Pods are updated.

Blue-Green Deployment

# Blue deployment (current production)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service-blue
  namespace: banking
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
      version: blue
  template:
    metadata:
      labels:
        app: payment-service
        version: blue
    spec:
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:2.0.0
---
# Green deployment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service-green
  namespace: banking
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
      version: green
  template:
    metadata:
      labels:
        app: payment-service
        version: green
    spec:
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:2.1.0
---
# Service initially points to blue
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: banking
spec:
  selector:
    app: payment-service
    version: blue   # Switch to 'green' to cutover
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP

How it works: Both blue (old) and green (new) deployments run simultaneously. The Service routes traffic to blue Pods initially. After verifying green Pods work correctly, update the Service selector from version: blue to version: green, instantly switching all traffic. If issues arise, revert by changing the selector back to blue. This strategy requires double the resources (6 total Pods instead of 3) but enables instant rollback.
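
The cutover itself can be expressed as a small patch file. A sketch (the filename is illustrative), applied with something like `kubectl patch service payment-service -n banking --patch-file cutover.yaml`:

```yaml
# cutover.yaml - switch the Service selector from blue to green
spec:
  selector:
    app: payment-service
    version: green
```

Reverting is the same patch with version: blue, which is why rollback is near-instant.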

Canary Deployment

# Stable deployment (90% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service-stable
  namespace: banking
spec:
  replicas: 9
  selector:
    matchLabels:
      app: payment-service
      track: stable
  template:
    metadata:
      labels:
        app: payment-service
        track: stable
        version: v2.0.0
    spec:
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:2.0.0
---
# Canary deployment (10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service-canary
  namespace: banking
spec:
  replicas: 1
  selector:
    matchLabels:
      app: payment-service
      track: canary
  template:
    metadata:
      labels:
        app: payment-service
        track: canary
        version: v2.1.0
    spec:
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:2.1.0
---
# Service routes to both
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: banking
spec:
  selector:
    app: payment-service   # Matches both stable and canary
  ports:
    - port: 80
      targetPort: 8080

How it works: The Service selector matches both stable and canary Pods. With 9 stable Pods and 1 canary Pod, approximately 10% of traffic routes to the canary (Kubernetes load balances across all matching Pods). Monitor canary metrics (error rate, latency) and gradually increase canary replicas while decreasing stable replicas until fully rolled out. This strategy detects issues with minimal user impact.

For more sophisticated canary routing (percentage-based, header-based), use a service mesh like Istio, covered in Service Mesh Patterns.

Recreate Strategy

apiVersion: apps/v1
kind: Deployment
metadata:
  name: database-migration-job
  namespace: banking
spec:
  replicas: 1
  strategy:
    type: Recreate   # Terminate all old Pods before creating new ones
  selector:
    matchLabels:
      app: migration-runner
  template:
    metadata:
      labels:
        app: migration-runner
    spec:
      containers:
        - name: migrator
          image: registry.example.com/migration-runner:2.1.0

How it works: Kubernetes terminates all old Pods simultaneously, waits for termination to complete, then creates new Pods. This causes downtime (no Pods run during the transition) but guarantees only one version runs at a time. Use this strategy for stateful applications that cannot run multiple versions concurrently (e.g., database migration tools).


Resource Management

Kubernetes schedules Pods onto nodes based on resource requests. Resource limits constrain how much CPU and memory a Pod can consume, preventing resource exhaustion. Proper resource configuration ensures efficient cluster utilization and application stability.

Understanding Requests vs Limits

Requests specify the minimum resources guaranteed to a Pod. Kubernetes uses requests for scheduling - a Pod with cpu: 500m only schedules to nodes with at least 500 millicores available. The Pod receives this amount of CPU during contention; if other Pods are idle, it can use more.

Limits specify the maximum resources a Pod can consume. If a Pod exceeds its memory limit, Kubernetes terminates it (OOMKilled). If it exceeds its CPU limit, Kubernetes throttles it (slows down execution) but doesn't terminate it.

The ratio between limits and requests determines resource overcommitment. If every Pod sets requests: 500m and limits: 1000m, a node can be scheduled to capacity based on requests while the sum of limits exceeds its physical resources. This overcommitment improves utilization (most Pods don't use their maximum constantly) but risks contention if many Pods hit their limits simultaneously.
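
To keep overcommitment bounded without trusting every manifest, a namespace-level LimitRange can supply defaults and caps for containers that omit them. A sketch with illustrative values:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: banking
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a container sets no requests
        cpu: 250m
        memory: 256Mi
      default:          # applied when a container sets no limits
        cpu: 500m
        memory: 512Mi
      max:              # hard per-container ceiling
        cpu: 2000m
        memory: 2Gi
```

Containers that declare their own values keep them (up to max); only unspecified fields receive the defaults.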

For Spring Boot applications, resource requirements vary based on traffic patterns and JVM heap size. A typical Spring Boot service needs 512MB-1GB memory for heap plus off-heap memory for threads, metaspace, and native libraries. See Spring Boot General for JVM tuning in containers.

Resource Requests and Limits

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:2.1.0
          resources:
            requests:
              cpu: 500m       # 0.5 CPU cores
              memory: 512Mi   # 512 MiB
            limits:
              cpu: 1000m      # 1 CPU core
              memory: 1Gi     # 1 GiB
          env:
            - name: JAVA_OPTS
              value: "-Xmx768m -Xms512m -XX:MaxMetaspaceSize=256m"

Resource units: CPU is measured in millicores (1000m = 1 core). Memory uses binary units (Mi, Gi; 1Mi = 1024 KiB) or decimal units (M, G; 1M = 1000 kB). Use binary units (Mi, Gi) for consistency.

Setting appropriate values: Start with conservative estimates based on profiling. Monitor actual usage in production and adjust. Under-provisioning causes OOMKilled Pods and slow performance. Over-provisioning wastes cluster resources and increases costs.
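
At the namespace level, a ResourceQuota caps aggregate consumption so one team's over-provisioning cannot starve the rest of the cluster. A sketch with illustrative values:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: banking-quota
  namespace: banking
spec:
  hard:
    requests.cpu: "10"       # sum of all Pod CPU requests in the namespace
    requests.memory: 20Gi
    limits.cpu: "20"         # sum of all Pod CPU limits
    limits.memory: 40Gi
    pods: "50"               # maximum Pod count
```

Once a quota exists for requests/limits, every Pod in the namespace must declare them (or inherit defaults from a LimitRange), or admission is rejected.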

Quality of Service Classes

# Guaranteed QoS: requests = limits
apiVersion: v1
kind: Pod
metadata:
  name: payment-service-guaranteed
spec:
  containers:
    - name: payment-service
      image: registry.example.com/payment-service:2.1.0
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
        limits:
          cpu: 500m       # Same as request
          memory: 512Mi   # Same as request
---
# Burstable QoS: limits > requests
apiVersion: v1
kind: Pod
metadata:
  name: payment-service-burstable
spec:
  containers:
    - name: payment-service
      image: registry.example.com/payment-service:2.1.0
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: 1000m      # Higher than request
          memory: 1Gi     # Higher than request
---
# BestEffort QoS: no requests or limits
apiVersion: v1
kind: Pod
metadata:
  name: payment-service-besteffort
spec:
  containers:
    - name: payment-service
      image: registry.example.com/payment-service:2.1.0
      resources: {}       # No requests or limits

QoS implications: When nodes run out of resources, Kubernetes evicts Pods to reclaim resources. Eviction priority: BestEffort Pods first, then Burstable Pods exceeding requests, then Guaranteed Pods. Critical services should use Guaranteed QoS to minimize eviction risk.

Horizontal Pod Autoscaling (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service-hpa
  namespace: banking
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Target 70% CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # Target 80% memory
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 50                     # Max 50% of Pods removed per period
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60    # Wait 1 min before scaling up
      policies:
        - type: Percent
          value: 100                    # Max double Pods per period
          periodSeconds: 60

How autoscaling works: The HPA controller queries metrics every 15 seconds. When average CPU utilization exceeds 70%, it calculates the number of replicas needed to bring utilization to 70% and scales up. The behavior section prevents flapping (rapid scaling up and down) by adding stabilization windows and limiting scaling velocity.

Custom metrics: Beyond CPU and memory, HPA supports custom metrics from application metrics servers (Prometheus). The http_requests_per_second metric scales based on application load rather than resource consumption, better reflecting actual demand.

Vertical Pod Autoscaling (VPA)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-service-vpa
  namespace: banking
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  updatePolicy:
    updateMode: Auto   # Auto, Initial, "Off", Recreate
  resourcePolicy:
    containerPolicies:
      - containerName: payment-service
        minAllowed:
          cpu: 250m
          memory: 256Mi
        maxAllowed:
          cpu: 2000m
          memory: 2Gi
        controlledResources:
          - cpu
          - memory

How VPA works: VPA monitors resource usage and recommends (or automatically applies) resource request adjustments. If a Pod consistently uses 800MB memory but requests only 512MB, VPA increases the request to match actual usage. This prevents OOMKilled Pods while avoiding over-provisioning.

VPA limitations: VPA cannot update running Pods' resources (except in Kubernetes 1.27+ with InPlacePodVerticalScaling feature gate). Instead, it recreates Pods with new resource values, causing brief unavailability. For most use cases, HPA (horizontal scaling) is preferred over VPA.

Note: Do not use HPA and VPA together on the same resource (CPU or memory) as they can conflict. Use HPA for CPU-based scaling and VPA for memory, or use only one autoscaler.
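
A common compromise is running VPA purely as an advisor: updateMode: "Off" computes recommendations (readable via kubectl describe vpa) without evicting or mutating Pods, leaving HPA free to scale horizontally. A sketch:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-service-vpa-advisor   # illustrative name
  namespace: banking
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  updatePolicy:
    updateMode: "Off"   # recommend only; quoted so YAML doesn't parse it as a boolean
```

Operators review the recommendations periodically and fold them back into the Deployment's requests by hand or via CI.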

Cluster Autoscaling

Cluster autoscaling adds/removes nodes based on Pod scheduling needs. When Pods cannot schedule due to insufficient node resources, the cluster autoscaler provisions new nodes. When nodes are underutilized, it drains and removes them.

Configuration is cloud-provider-specific (AWS Cluster Autoscaler, GKE Node Auto-Provisioning, Azure Cluster Autoscaler). The autoscaler respects Pod Disruption Budgets (PDBs) during node drains and avoids removing nodes hosting Pods with local storage or strict affinity rules.

# Pod Disruption Budget to protect during node drains
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
  namespace: banking
spec:
  minAvailable: 2   # Always keep at least 2 Pods running
  selector:
    matchLabels:
      app: payment-service

Why PDBs matter: Without PDBs, cluster autoscaler might drain nodes aggressively, terminating all replicas of a service simultaneously during scale-down. PDBs ensure at least minAvailable Pods stay running during voluntary disruptions (node drains, evictions), maintaining application availability.
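
The budget can also be expressed as maxUnavailable, including as a percentage, which tracks replica counts automatically as the Deployment scales:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
  namespace: banking
spec:
  maxUnavailable: 25%   # at most a quarter of Pods down during voluntary disruptions
  selector:
    matchLabels:
      app: payment-service
```

With 3 replicas this permits zero voluntary disruptions at a time once rounding is applied; with 8 replicas it permits two, so the same manifest stays sensible as HPA scales the service.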


Health Checks

Kubernetes uses health probes to determine Pod health and readiness to serve traffic. Properly configured probes enable automatic recovery from failures and prevent broken Pods from receiving traffic.

Understanding Probe Types

Liveness probes determine if a Pod is alive. If liveness checks fail repeatedly, Kubernetes restarts the Pod. Use liveness probes to detect deadlocks, infinite loops, or other unrecoverable states where the application is running but non-functional. Be conservative with liveness probes - overly aggressive checks cause restart loops.

Readiness probes determine if a Pod can serve traffic. If readiness checks fail, Kubernetes removes the Pod from Service endpoints (stops routing traffic to it) but doesn't restart the Pod. Use readiness probes to handle temporary issues like slow startup, database connection loss, or dependency unavailability.

Startup probes protect slow-starting containers. While the startup probe is running, Kubernetes doesn't execute liveness or readiness probes. This prevents premature restarts of applications with long initialization times (Spring Boot applications loading large contexts, JVM warming up).

For Spring Boot applications, the Actuator provides separate endpoints for liveness (/actuator/health/liveness) and readiness (/actuator/health/readiness). The readiness endpoint checks database connectivity, cache availability, and dependent services. See Spring Boot Observability for detailed Actuator configuration.

Liveness Probe

apiVersion: v1
kind: Pod
metadata:
  name: payment-service
spec:
  containers:
    - name: payment-service
      image: registry.example.com/payment-service:2.1.0
      livenessProbe:
        httpGet:
          path: /actuator/health/liveness
          port: 8080
          scheme: HTTP
        initialDelaySeconds: 60   # Wait 60s after startup
        periodSeconds: 30         # Check every 30s
        timeoutSeconds: 5         # Timeout after 5s
        failureThreshold: 3       # Restart after 3 failures
        successThreshold: 1       # Mark healthy after 1 success

Configuration rationale: The initialDelaySeconds: 60 allows the Spring Boot application to complete startup before liveness checks begin. The periodSeconds: 30 balances responsiveness (detecting failures within 90 seconds: 30s × 3 failures) with overhead (health check load). The failureThreshold: 3 prevents flapping from transient failures.

Liveness probe pitfalls: Never use dependency checks (database connectivity) in liveness probes. If the database is temporarily unavailable, liveness probe failures restart all Pods, exacerbating the problem. Liveness probes should only detect unrecoverable application states, not transient dependency issues. Use readiness probes for dependency checks.

Readiness Probe

apiVersion: v1
kind: Pod
metadata:
  name: payment-service
spec:
  containers:
    - name: payment-service
      image: registry.example.com/payment-service:2.1.0
      readinessProbe:
        httpGet:
          path: /actuator/health/readiness
          port: 8080
          scheme: HTTP
        initialDelaySeconds: 30   # Start checking after 30s
        periodSeconds: 10         # Check every 10s
        timeoutSeconds: 5         # Timeout after 5s
        failureThreshold: 3       # Mark unready after 3 failures
        successThreshold: 1       # Mark ready after 1 success

Configuration rationale: The periodSeconds: 10 makes readiness checks more frequent than liveness checks (10s vs 30s), enabling faster traffic routing decisions. When a Pod becomes unready (database connection lost), Kubernetes removes it from Service endpoints within 30 seconds (10s × 3 failures), preventing broken Pods from receiving traffic.

Readiness probe content: The /actuator/health/readiness endpoint should check all dependencies required to serve traffic: database connectivity, cache availability, external API reachability. If any dependency is unavailable, return 503 Service Unavailable, marking the Pod unready.
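
In Spring Boot, which health indicators feed the readiness group is configurable via Actuator health groups. A sketch (the db and redis entries assume those auto-configured indicators exist in the application):

```properties
# Expose /actuator/health/liveness and /actuator/health/readiness
management.endpoint.health.probes.enabled=true
# Liveness reflects only internal application state
management.endpoint.health.group.liveness.include=livenessState
# Readiness additionally checks dependencies needed to serve traffic
management.endpoint.health.group.readiness.include=readinessState,db,redis
```

This keeps dependency checks out of the liveness group, matching the probe guidance above.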

Startup Probe

apiVersion: v1
kind: Pod
metadata:
  name: payment-service
spec:
  containers:
    - name: payment-service
      image: registry.example.com/payment-service:2.1.0
      startupProbe:
        httpGet:
          path: /actuator/health/liveness
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 30   # Allow 300s (10s × 30) for startup
      livenessProbe:
        httpGet:
          path: /actuator/health/liveness
          port: 8080
        periodSeconds: 30
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /actuator/health/readiness
          port: 8080
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3

How startup probes work: Kubernetes runs the startup probe first. Liveness and readiness probes don't start until the startup probe succeeds. With failureThreshold: 30 and periodSeconds: 10, the application has up to 300 seconds (5 minutes) to start. Once the startup probe succeeds, liveness and readiness probes take over.

When to use startup probes: Use startup probes for applications with variable or long startup times. Without startup probes, you must set high initialDelaySeconds on liveness probes to accommodate worst-case startup time, delaying failure detection for already-running Pods. Startup probes allow aggressive liveness probe settings while still supporting slow startup.

TCP and Exec Probes

# TCP probe (database, cache)
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:16-alpine
      livenessProbe:
        tcpSocket:
          port: 5432
        periodSeconds: 30
        failureThreshold: 3
---
# Exec probe (custom script)
apiVersion: v1
kind: Pod
metadata:
  name: payment-service
spec:
  containers:
    - name: payment-service
      image: registry.example.com/payment-service:2.1.0
      livenessProbe:
        exec:
          command:
            - /bin/sh
            - -c
            - "wget -q -O- http://localhost:8080/actuator/health/liveness | grep UP"
        periodSeconds: 30
        failureThreshold: 3

TCP probes check if a port is accepting connections. Use TCP probes for services without HTTP endpoints (databases, caches, message queues). TCP probes only verify the port is open, not that the service is functional, so they're less thorough than HTTP probes.

Exec probes run commands inside the container and check the exit code (0 = success, non-zero = failure). Use exec probes for complex health checks not exposed via HTTP. However, exec probes have higher overhead (spawning processes) than HTTP or TCP probes.
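
For gRPC services, Kubernetes also offers native gRPC probes (beta and enabled by default since 1.24, stable in 1.27), avoiding exec wrappers like grpc_health_probe. A sketch with an illustrative service and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ledger-service          # hypothetical gRPC service
spec:
  containers:
    - name: ledger-service
      image: registry.example.com/ledger-service:1.0.0   # hypothetical image
      livenessProbe:
        grpc:
          port: 9090            # must implement the grpc.health.v1.Health service
        periodSeconds: 30
        failureThreshold: 3
```

The kubelet calls the standard gRPC health-checking protocol directly, so the container needs no HTTP endpoint or shell tooling.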


Configuration Management

Kubernetes separates configuration from application code using ConfigMaps (non-sensitive data) and Secrets (sensitive data). This separation enables the same container image to run in multiple environments (dev, staging, production) with different configurations.

Understanding ConfigMaps and Secrets

ConfigMaps store configuration as key-value pairs or files. Applications consume ConfigMaps as environment variables, command-line arguments, or mounted files. Changing a ConfigMap requires restarting Pods to pick up new values (unless the application reloads configuration dynamically).

Secrets store sensitive data (passwords, API keys, certificates) base64-encoded. While base64 is not encryption, Secrets enable integration with encryption-at-rest (etcd encryption) and external secret management systems (Sealed Secrets, External Secrets Operator, Vault). Never store unencrypted secrets in Git repositories.

For comprehensive secrets handling strategies including rotation and cloud provider integration, see Secrets Management.

ConfigMap from Literal Values

apiVersion: v1
kind: ConfigMap
metadata:
  name: payment-service-config
  namespace: banking
data:
  DATABASE_HOST: postgres.banking.svc.cluster.local
  DATABASE_PORT: "5432"
  DATABASE_NAME: payments
  REDIS_HOST: redis.banking.svc.cluster.local
  REDIS_PORT: "6379"
  LOG_LEVEL: INFO
  FEATURE_FLAG_NEW_PAYMENT_FLOW: "true"
---
# Consuming as environment variables
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:2.1.0
          envFrom:
            - configMapRef:
                name: payment-service-config

How it works: The envFrom field injects all ConfigMap key-value pairs as environment variables. The payment service reads DATABASE_HOST, DATABASE_PORT, etc., from the environment. This approach works well for simple configuration but doesn't support dynamic reloading - changing the ConfigMap requires Pod restarts.
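
When only a few keys are needed, or variable names must be remapped, individual keys can be referenced instead of importing the whole ConfigMap. A sketch of the container's env section:

```yaml
env:
  - name: SPRING_DATASOURCE_HOST     # remapped variable name (illustrative)
    valueFrom:
      configMapKeyRef:
        name: payment-service-config
        key: DATABASE_HOST
  - name: LOG_LEVEL
    valueFrom:
      configMapKeyRef:
        name: payment-service-config
        key: LOG_LEVEL
        optional: true               # Pod still starts if the key is absent
```

This keeps the container's environment explicit, at the cost of more verbose manifests than envFrom.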

ConfigMap from File

apiVersion: v1
kind: ConfigMap
metadata:
  name: payment-service-config
  namespace: banking
data:
  application.properties: |
    server.port=8080
    spring.datasource.url=jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_NAME}
    spring.datasource.username=${DATABASE_USER}
    spring.datasource.password=${DATABASE_PASSWORD}
    spring.data.redis.host=${REDIS_HOST}
    spring.data.redis.port=${REDIS_PORT}
    logging.level.root=${LOG_LEVEL}
    payment.new-flow.enabled=${FEATURE_FLAG_NEW_PAYMENT_FLOW}
---
# Mounting as file
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:2.1.0
          volumeMounts:
            - name: config
              mountPath: /app/config
              readOnly: true
          command: ["java"]
          args:
            - "-jar"
            - "app.jar"
            - "--spring.config.location=file:/app/config/application.properties"
      volumes:
        - name: config
          configMap:
            name: payment-service-config

How it works: The ConfigMap contains the entire application.properties file. The volume mount makes it available at /app/config/application.properties. Spring Boot loads the configuration file at startup. This approach supports complex configuration files (YAML, JSON, XML) and dynamic reloading if the application watches for file changes.

Secrets Management

apiVersion: v1
kind: Secret
metadata:
  name: payment-service-secrets
  namespace: banking
type: Opaque
stringData:   # Use stringData for plain text (Kubernetes base64 encodes it)
  DATABASE_PASSWORD: "p@ssw0rd123"
  JWT_SECRET: "super-secret-jwt-key-change-in-production"
  API_KEY: "sk_live_abc123xyz789"
---
# Consuming secrets
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:2.1.0
          env:
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: payment-service-secrets
                  key: DATABASE_PASSWORD
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: payment-service-secrets
                  key: JWT_SECRET
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: payment-service-secrets
                  key: API_KEY

Security consideration: Never commit Secrets to Git with literal values. Use Sealed Secrets (encrypted Secrets safe for Git) or External Secrets Operator (fetches from Vault, AWS Secrets Manager) in production. The example above shows the Secret structure but should be created via CI/CD pipeline or external secret management.

External Secrets Operator

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payment-service-secrets
  namespace: banking
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: payment-service-secrets   # Creates this Secret
    creationPolicy: Owner
  data:
    - secretKey: DATABASE_PASSWORD
      remoteRef:
        key: banking/payment-service/database
        property: password
    - secretKey: JWT_SECRET
      remoteRef:
        key: banking/payment-service/jwt
        property: secret
---
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
  namespace: banking
spec:
  provider:
    vault:
      server: "https://vault.example.com"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "payment-service"
          serviceAccountRef:
            name: payment-service

How it works: The External Secrets Operator syncs secrets from external systems (Vault, AWS Secrets Manager, Azure Key Vault) into Kubernetes Secrets. The refreshInterval: 1h keeps Secrets updated automatically. The application consumes the created Secret normally, unaware of the external source. This pattern centralizes secret management and enables secret rotation without modifying Kubernetes configurations.

Sealed Secrets

# Original secret (don't commit to Git)
apiVersion: v1
kind: Secret
metadata:
  name: payment-service-secrets
  namespace: banking
stringData:
  DATABASE_PASSWORD: "p@ssw0rd123"
---
# Sealed secret (safe to commit to Git), produced with:
#   kubeseal --format yaml < secret.yaml > sealed-secret.yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: payment-service-secrets
  namespace: banking
spec:
  encryptedData:
    DATABASE_PASSWORD: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq...
  template:
    metadata:
      name: payment-service-secrets
      namespace: banking
    type: Opaque

How it works: The Sealed Secrets controller runs in the cluster with a private key. You encrypt Secrets using the public key (kubeseal), creating SealedSecrets safe for Git. When you apply the SealedSecret, the controller decrypts it and creates a regular Secret. Only the cluster's controller can decrypt, so encrypted SealedSecrets are safe in version control.


Service Mesh Patterns

Service meshes provide advanced traffic management, security, and observability for microservices without changing application code. The mesh injects a sidecar proxy (Envoy) into each Pod, intercepting all network traffic and applying policies.

Understanding Service Mesh Benefits

Service meshes address challenges in microservice architectures: mutual TLS (mTLS) between services requires certificate management; retries and circuit breakers require implementing resilience patterns; observability requires instrumenting every service; canary deployments require complex routing logic.

A service mesh handles these concerns at the infrastructure layer. The sidecar proxies establish mTLS automatically (no application changes), retry failed requests based on configured policies, emit metrics for every request (latency, error rate, traffic volume), and route traffic based on rules (percentage-based canaries, header-based routing).

The tradeoff is complexity - service meshes add operational overhead (managing the mesh control plane) and latency (every request passes through two proxies: client sidecar → server sidecar). For architectures with few microservices (< 5), a service mesh's complexity may outweigh its benefits. For larger microservice architectures (10+), the mesh simplifies cross-cutting concerns.
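
With Istio, opting a namespace into the mesh is a single label; the injection webhook then adds the Envoy sidecar to every new Pod created in it:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: banking
  labels:
    istio-injection: enabled   # sidecar injected into new Pods in this namespace
```

Existing Pods must be restarted (for example via a rolling restart of their Deployments) to pick up the sidecar.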

Istio Traffic Management

# Virtual Service for canary deployment
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service
  namespace: banking
spec:
  hosts:
    - payment-service
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: payment-service
            subset: v2-1-0
          weight: 100
    - route:
        - destination:
            host: payment-service
            subset: v2-0-0
          weight: 90
        - destination:
            host: payment-service
            subset: v2-1-0
          weight: 10
---
# Destination Rule defining subsets
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-service
  namespace: banking
spec:
  host: payment-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
  subsets:
    - name: v2-0-0
      labels:
        version: v2.0.0
    - name: v2-1-0
      labels:
        version: v2.1.0

How it works: The VirtualService defines routing rules. Requests with the x-canary: true header route 100% to v2.1.0 (for testing). Other requests split 90/10 between v2.0.0 and v2.1.0 (canary rollout). The DestinationRule configures connection pooling (max 100 concurrent connections) and outlier detection (eject unhealthy Pods after 5 consecutive errors for 30 seconds).

Progressive canary: Start with 10% traffic to v2.1.0, monitor error rates and latency for 30 minutes, increase to 50%, monitor again, then 100%. If issues arise at any stage, revert the VirtualService to route 100% to v2.0.0.
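Each stage of the rollout is just an edit to the VirtualService weights. A rollback, for instance, reduces to a single route sending all traffic to the stable subset (a sketch reusing the subsets defined above):

# Rollback: route 100% of traffic back to the stable v2.0.0 subset
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service
  namespace: banking
spec:
  hosts:
    - payment-service
  http:
    - route:
        - destination:
            host: payment-service
            subset: v2-0-0
          weight: 100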

Mutual TLS (mTLS)

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: banking
spec:
  mtls:
    mode: STRICT  # STRICT, PERMISSIVE, DISABLE
---
# Authorization Policy
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payment-service-authz
  namespace: banking
spec:
  selector:
    matchLabels:
      app: payment-service
  action: ALLOW
  rules:
    - from:
        - source:
            namespaces: ["banking", "api-gateway"]
            principals: ["cluster.local/ns/banking/sa/account-service"]
      to:
        - operation:
            methods: ["POST"]
            paths: ["/api/v1/payments"]

How it works: PeerAuthentication with mode: STRICT requires mTLS for all service-to-service communication in the banking namespace. Services without valid certificates cannot communicate. The AuthorizationPolicy restricts access: only the account-service (identified by its ServiceAccount principal) from the banking or api-gateway namespace can POST to /api/v1/payments. This provides zero-trust security - even inside the cluster, services must authenticate and are authorized based on identity.
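When enabling mTLS on an existing cluster, a common approach is to start in PERMISSIVE mode (both plaintext and mTLS accepted) and switch to STRICT once every workload carries a sidecar. A workload-scoped override during migration might look like this (a sketch; the workload name is hypothetical):

# Temporarily allow plaintext for a legacy workload during mTLS migration
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-batch-permissive
  namespace: banking
spec:
  selector:
    matchLabels:
      app: legacy-batch
  mtls:
    mode: PERMISSIVE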

Traffic Splitting and A/B Testing

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service
spec:
  hosts:
    - payment-service
  http:
    - match:
        - headers:
            user-segment:
              exact: "premium"
      route:
        - destination:
            host: payment-service
            subset: v2-1-0-premium
          headers:
            response:
              add:
                x-variant: "premium"
    - match:
        - headers:
            user-segment:
              exact: "standard"
      route:
        - destination:
            host: payment-service
            subset: v2-1-0-standard
          headers:
            response:
              add:
                x-variant: "standard"
    - route:
        - destination:
            host: payment-service
            subset: v2-0-0

How it works: Requests with user-segment: premium route to the v2-1-0-premium subset (perhaps with different rate limits or features). Requests with user-segment: standard route to v2-1-0-standard. All other requests route to v2.0.0. The headers.response.add configuration adds an x-variant header to responses, enabling client-side analytics to correlate user experience with variant.

A/B testing: The application (or API gateway) sets the user-segment header based on user properties (subscription tier, geographic region, random assignment). The service mesh routes traffic accordingly. Application metrics (conversion rate, revenue per user) are tagged with variant, enabling statistical analysis of variant performance.


Ingress Controllers and Routing

Ingress controllers expose HTTP(S) routes from outside the cluster to Services inside the cluster. Ingress provides load balancing, TLS termination, and name-based virtual hosting.

Understanding Ingress

Kubernetes Services expose applications inside the cluster, but external clients cannot reach them (except via NodePort or LoadBalancer Services, which have limitations). Ingress provides a Layer 7 (HTTP) entry point, routing requests based on hostnames and paths to backend Services.

An Ingress controller (NGINX, Traefik, HAProxy, cloud-provider controllers) watches Ingress resources and configures itself to route traffic accordingly. The controller runs as a Deployment in the cluster, typically exposed via a LoadBalancer Service (cloud) or NodePort (on-premises).

Ingress enables multiple services to share a single load balancer IP (reducing cloud costs) and provides centralized TLS termination (certificates managed in one place rather than per-service).

NGINX Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: banking-api-ingress
  namespace: banking
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/limit-connections: "100"  # max concurrent connections per client IP
    nginx.ingress.kubernetes.io/limit-rps: "10"           # max requests per second per client IP
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls-cert
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /payments(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: payment-service
                port:
                  number: 80
          - path: /accounts(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: account-service
                port:
                  number: 80

How it works: Requests to https://api.example.com/payments/... route to the payment-service. The rewrite-target: /$2 annotation rewrites /payments/123 to /123 before forwarding to the backend (the payment-service sees /123, not /payments/123), using the second capture group from the path regex. The ssl-redirect: "true" annotation redirects HTTP to HTTPS. The rate-limiting annotations throttle each client IP: limit-rps allows at most 10 requests per second, and the 100 value caps concurrent connections.

TLS termination: The tls section references api-tls-cert Secret containing the TLS certificate and private key. Cert-manager (indicated by the cert-manager.io/cluster-issuer annotation) automatically obtains and renews certificates from Let's Encrypt, storing them in the Secret. The Ingress controller terminates TLS and forwards unencrypted traffic to backend Services over the cluster network (secure because cluster networking is isolated).
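The letsencrypt-prod issuer referenced by the annotation is defined separately. A typical cert-manager ClusterIssuer using the ACME HTTP-01 challenge looks roughly like this (a sketch; the email address and Secret name are placeholders):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform-team@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key  # stores the ACME account private key
    solvers:
      - http01:
          ingress:
            class: nginx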

Path-Based Routing

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: banking
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1/payments
            pathType: Prefix
            backend:
              service:
                name: payment-service-v1
                port:
                  number: 80
          - path: /v2/payments
            pathType: Prefix
            backend:
              service:
                name: payment-service-v2
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-gateway
                port:
                  number: 80

How it works: Requests to /v1/payments/... route to payment-service-v1. Requests to /v2/payments/... route to payment-service-v2. All other requests (/) route to api-gateway. Path matching is prefix-based: /v1/payments matches /v1/payments/123/status. Most controllers match the longest prefix first, so the catch-all / only receives requests no specific path matches; listing specific paths before the catch-all still makes the intent explicit.

API versioning: This pattern supports running multiple API versions simultaneously, enabling gradual client migration from v1 to v2. Clients specify the version in the path (/v1/payments vs /v2/payments), and Kubernetes routes to the appropriate backend.

Host-Based Routing

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: acme-bank-ingress
  namespace: banking
  annotations:
    # The Ingress spec cannot set request headers; ingress-nginx does it via a snippet
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header X-Tenant-ID "acme";
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - acme-bank.example.com
      secretName: wildcard-tls-cert
  rules:
    - host: acme-bank.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: banking-app
                port:
                  number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: globex-bank-ingress
  namespace: banking
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header X-Tenant-ID "globex";
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - globex-bank.example.com
      secretName: wildcard-tls-cert
  rules:
    - host: globex-bank.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: banking-app
                port:
                  number: 80

How it works: Requests to acme-bank.example.com route to banking-app with an X-Tenant-ID: acme request header added at the ingress layer. Requests to globex-bank.example.com route to the same banking-app, but with X-Tenant-ID: globex. The application uses the tenant ID to query the correct tenant's data. This multi-tenancy pattern serves multiple tenants from a single application deployment, identified by hostname. Note that the standard Ingress API cannot modify request headers, so header injection relies on controller-specific configuration (with ingress-nginx, the configuration-snippet annotation).


Monitoring and Logging

Kubernetes generates vast amounts of operational data: container logs, resource metrics, cluster events, API server audit logs. Effective monitoring and logging enable troubleshooting, performance optimization, and capacity planning.

Understanding Kubernetes Observability

Observability in Kubernetes operates at multiple levels: container logs (stdout/stderr from applications), metrics (CPU, memory, request rate, latency), events (Pod scheduled, container restarted, volume mounted), and traces (distributed request flow through microservices).

The metrics-server provides basic resource metrics for HPA and kubectl top. Prometheus scrapes application and system metrics for detailed monitoring and alerting. The ELK stack (Elasticsearch, Logstash, Kibana) or Loki aggregates logs from all containers. Jaeger or Zipkin collects distributed traces showing request flow.

For comprehensive observability strategies including structured logging and distributed tracing, see Observability Overview.

Prometheus Metrics

apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: banking
  labels:
    app: payment-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/actuator/prometheus"
spec:
  selector:
    app: payment-service
  ports:
    - name: http  # named port referenced by the ServiceMonitor below
      port: 80
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payment-service
  namespace: banking
  labels:
    app: payment-service
spec:
  selector:
    matchLabels:
      app: payment-service
  endpoints:
    - port: http
      path: /actuator/prometheus
      interval: 30s
      scrapeTimeout: 10s

How it works: The prometheus.io/* annotations tell Prometheus to scrape metrics from the payment-service at /actuator/prometheus on port 8080 every 30 seconds. The ServiceMonitor (Prometheus Operator CRD) creates the Prometheus scrape configuration automatically. Spring Boot Actuator exposes JVM metrics (heap usage, GC), application metrics (request count, error rate), and custom business metrics (payments processed).

Key metrics to monitor: Request rate (http_server_requests_seconds_count), error rate (http_server_requests_seconds_count{status="5xx"}), request duration (http_server_requests_seconds_sum / http_server_requests_seconds_count), JVM heap usage (jvm_memory_used_bytes{area="heap"}), GC pauses (jvm_gc_pause_seconds). These metrics enable SLO tracking and alert configuration.
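With the Prometheus Operator, these metrics feed alert rules. A sketch of a PrometheusRule that fires when the 5xx error rate exceeds 1% (the threshold, durations, and labels are illustrative):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payment-service-alerts
  namespace: banking
spec:
  groups:
    - name: payment-service
      rules:
        - alert: PaymentServiceHighErrorRate
          # Fraction of requests returning 5xx over the last 5 minutes
          expr: |
            sum(rate(http_server_requests_seconds_count{service="payment-service", status=~"5.."}[5m]))
              / sum(rate(http_server_requests_seconds_count{service="payment-service"}[5m])) > 0.01
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "payment-service 5xx error rate above 1% for 5 minutes"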

Logging with Fluentd and Loki

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type json
        time_key time
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <filter kubernetes.**>
      @type kubernetes_metadata
      @id filter_kube_metadata
    </filter>

    <match kubernetes.var.log.containers.**banking**.log>
      @type loki
      url http://loki:3100
      extra_labels {"env":"production"}
      <label>
        namespace $.kubernetes.namespace_name
        pod $.kubernetes.pod_name
        container $.kubernetes.container_name
      </label>
    </match>
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-loki
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: config
              mountPath: /fluentd/etc
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: config
          configMap:
            name: fluentd-config

How it works: Fluentd runs as a DaemonSet (one Pod per node), tailing container logs from /var/log/containers/*.log. The kubernetes_metadata filter enriches logs with Kubernetes metadata (namespace, pod name, labels). Logs from the banking namespace are forwarded to Loki with labels (namespace, pod, container) for querying. Loki stores logs efficiently and provides a query language (LogQL) similar to Prometheus.
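The serviceAccountName: fluentd referenced by the DaemonSet needs RBAC permissions so the kubernetes_metadata filter can look up Pod metadata from the API server. A minimal sketch:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: logging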

Structured logging: Applications should output JSON-formatted logs to stdout. JSON logs are machine-parseable, enabling filtering, aggregation, and correlation. For example: {"timestamp":"2025-01-08T10:30:00Z","level":"ERROR","message":"Payment failed","paymentId":"123","error":"Insufficient funds"}. Loki can query {namespace="banking"} | json | paymentId="123" to find all logs for payment 123.

Distributed Tracing with Jaeger

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  template:
    spec:
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:2.1.0
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://jaeger-collector:4317"
            - name: OTEL_SERVICE_NAME
              value: "payment-service"
            - name: OTEL_TRACES_SAMPLER
              value: "parentbased_traceidratio"
            - name: OTEL_TRACES_SAMPLER_ARG
              value: "0.1"  # Sample 10% of traces

How it works: The application uses OpenTelemetry to instrument HTTP requests, database queries, and external API calls, creating spans (units of work). Each request gets a trace ID, propagated across service calls via HTTP headers. Spans are exported to the Jaeger collector at jaeger-collector:4317. The sampler configuration (10%) reduces overhead - only 10% of traces are collected, which is sufficient for identifying performance bottlenecks.

Analyzing traces: Jaeger UI shows the complete request flow: API gateway → payment-service → database query → account-service → payment-service → response. Each span includes duration, enabling identification of slow operations (e.g., a database query taking 500ms). Traces correlate logs via trace IDs - logs can include traceId fields, allowing you to view all logs for a specific request.


Security

Kubernetes security operates at multiple layers: cluster access control (RBAC), network isolation (Network Policies), Pod permissions (Pod Security Standards), secret management, and container image security.

Understanding Kubernetes Security Model

By default, Kubernetes is permissive: Pods can communicate with all other Pods, containers run as root, and ServiceAccounts have minimal permissions. Production environments must harden these defaults.

Principle of least privilege guides Kubernetes security: Pods should run with minimal required permissions, ServiceAccounts should have minimal RBAC permissions, and network policies should deny traffic by default and explicitly allow necessary communication.

For comprehensive security coverage including authentication, authorization, and encryption, see Security Overview.

Pod Security Standards

apiVersion: v1
kind: Namespace
metadata:
  name: banking
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Pod adhering to restricted policy
apiVersion: v1
kind: Pod
metadata:
  name: payment-service
  namespace: banking
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    fsGroup: 1001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: payment-service
      image: registry.example.com/payment-service:2.1.0
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}

How it works: The namespace labels enforce the restricted Pod Security Standard, the most restrictive level. Pods in this namespace must run as non-root (runAsNonRoot: true), cannot escalate privileges (allowPrivilegeEscalation: false), must drop all capabilities (capabilities.drop: ALL), and use the default seccomp profile. The readOnlyRootFilesystem: true prevents the container from modifying its filesystem (except mounted volumes like the emptyDir for /tmp).

Security benefits: If an attacker exploits the application and gains code execution, they run as a non-root user (UID 1001) with no capabilities and cannot modify the filesystem. This dramatically limits the attack surface. For writable storage requirements, mount emptyDir or PersistentVolumeClaims at specific paths.

Network Policies

# Default deny all ingress and egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: banking
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow payment-service to receive from API gateway
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-ingress
  namespace: banking
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: api-gateway
          podSelector:
            matchLabels:
              app: api-gateway
        - podSelector:
            matchLabels:
              app: account-service
      ports:
        - protocol: TCP
          port: 8080
---
# Allow payment-service to reach database and external APIs
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-egress
  namespace: banking
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    - to:
        - namespaceSelector:
            matchLabels:
              name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - protocol: TCP
          port: 443

How it works: The default-deny-all policy blocks all ingress and egress traffic in the banking namespace (zero-trust baseline). Subsequent policies explicitly allow required traffic: payment-service accepts connections from api-gateway and account-service on port 8080; payment-service connects to postgres (5432), redis (6379), DNS (53), and external HTTPS (443) but not to private IP ranges (preventing access to metadata services or other internal services).

Network segmentation: Network policies create microsegmentation - each service only communicates with necessary dependencies. If an attacker compromises the payment-service, they cannot reach the database admin interface or other unrelated services. This "defense in depth" limits lateral movement.

RBAC (Role-Based Access Control)

# ServiceAccount for payment-service
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-service
  namespace: banking
---
# Role granting minimal permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: payment-service-role
  namespace: banking
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    # resourceNames cannot restrict "list", so grant only "get"
    verbs: ["get"]
    resourceNames: ["payment-service-config"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
    resourceNames: ["payment-service-secrets"]
---
# RoleBinding attaching Role to ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payment-service-rolebinding
  namespace: banking
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: payment-service-role
subjects:
  - kind: ServiceAccount
    name: payment-service
    namespace: banking
---
# Deployment using ServiceAccount
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: banking
spec:
  template:
    spec:
      serviceAccountName: payment-service
      automountServiceAccountToken: true

How it works: The payment-service Pod uses the payment-service ServiceAccount. This ServiceAccount has permission to read only the payment-service-config ConfigMap and payment-service-secrets Secret, nothing else. If the application is compromised, the attacker cannot list all Secrets, modify Deployments, or access other resources - the ServiceAccount's limited permissions constrain them.

Default ServiceAccount: By default, Pods use the default ServiceAccount in their namespace. Grant it no RBAC permissions and set automountServiceAccountToken: false on it so its token is never mounted into Pods. Create dedicated ServiceAccounts for each application with minimal required permissions.
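Hardening the default ServiceAccount is a one-line change per namespace (sketch):

# Prevent Pods using the default ServiceAccount from receiving an API token
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: banking
automountServiceAccountToken: false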

Admission Controllers

# OPA Gatekeeper constraint template
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("Missing required labels: %v", [missing])
        }
---
# Constraint requiring labels
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-app-and-version-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
    namespaces: ["banking"]
  parameters:
    labels: ["app", "version"]

How it works: OPA Gatekeeper is a validating admission controller that runs policies (written in Rego) on all API requests. This policy requires all Deployments in the banking namespace to have app and version labels. Attempts to create Deployments without these labels are rejected. Admission controllers enforce organizational policies (required labels, image registries, resource limits, security contexts) automatically.

Common policies: Enforce container image sources (only from approved registries), require resource limits on all containers, disallow privileged containers, require specific labels for cost attribution, enforce naming conventions.
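Restricting images to approved registries follows the same pattern. Assuming the K8sAllowedRepos template from the Gatekeeper policy library is installed, a constraint might look like this (the registry prefix is illustrative):

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: approved-registries-only
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["banking"]
  parameters:
    repos:
      - "registry.example.com/"  # only images with this prefix are admitted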


Further Reading

Internal Documentation

External Resources


Summary

Key Takeaways

  1. Deployment strategies - Rolling updates for zero-downtime, blue-green for instant rollback, canary for gradual rollout
  2. Resource management - Set requests for scheduling, limits for safety; use HPA for scaling
  3. Health probes - Liveness detects crashes, readiness controls traffic, startup protects slow initialization
  4. Configuration separation - ConfigMaps for config, Secrets for sensitive data, external secret management for production
  5. Service mesh - Istio/Linkerd for mTLS, traffic splitting, observability without code changes
  6. Ingress - Centralized HTTP routing, TLS termination, path/host-based routing
  7. Observability - Prometheus for metrics, Loki/ELK for logs, Jaeger for traces
  8. Security - Pod Security Standards, Network Policies, RBAC, admission controllers
  9. High availability - Multi-replica deployments, Pod Disruption Budgets, anti-affinity rules
  10. GitOps - Version control Kubernetes manifests, automate deployments via CI/CD

Next Steps: Review Docker Best Practices for container image optimization, CI/CD Pipelines for Kubernetes deployment automation, and Microservices Architecture for service design patterns.