API Patterns

Resilience and Reliability

Error handling, circuit breakers, and retry logic are essential for building reliable APIs. Design for failure from the start - network issues, service outages, and unexpected errors will happen. Good patterns ensure your API degrades gracefully and recovers automatically.

Overview

This guide covers general API design patterns that apply across REST, GraphQL, and other API styles:

  • Error handling and response formats
  • Circuit breaker pattern for resilience
  • Retry strategies with exponential backoff
  • Async processing patterns
  • Performance optimization
  • Security patterns

Note: For REST-specific patterns (pagination, filtering, idempotency), see REST Patterns. For versioning strategies, see REST Versioning.


Error Handling

Consistent error responses enable clients to handle errors programmatically. Well-designed error messages provide enough information for debugging without exposing internal implementation details or security vulnerabilities.

Error Response Design Principles

  1. Consistency: All errors follow the same structure across all endpoints
  2. Actionability: Error messages explain what went wrong and how to fix it
  3. Security: Don't expose stack traces, internal paths, database details, or sensitive data
  4. Traceability: Include request IDs or correlation IDs for debugging
  5. Machine-readable: Use error codes or types for programmatic handling
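
Principle 5 can be made concrete with a small catalogue of machine-readable codes that clients can switch on. The names and status mappings below are illustrative assumptions, not a standard:

```java
// Illustrative catalogue of machine-readable error codes.
// The names and status mappings are assumptions for this sketch.
public enum ApiErrorCode {
    RESOURCE_NOT_FOUND(404),
    VALIDATION_FAILED(400),
    INSUFFICIENT_BALANCE(422),
    INTERNAL_ERROR(500);

    private final int httpStatus;

    ApiErrorCode(int httpStatus) {
        this.httpStatus = httpStatus;
    }

    public int httpStatus() {
        return httpStatus;
    }
}
```

Clients can then branch on the code rather than parsing human-readable messages, which may change between releases.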

Standard Error Response Format

A standard error response provides clients with enough context to understand and potentially resolve the issue. The structure below is a pragmatic custom format; RFC 7807 Problem Details, the IETF standard for HTTP API error responses, defines a comparable structure (type, title, status, detail, instance) if you prefer a standards-based media type.

import java.time.Instant;
import java.util.Map;

import org.springframework.http.HttpStatus;

public record ErrorResponse(
    Instant timestamp,                     // When the error occurred
    int status,                            // HTTP status code
    String error,                          // Short error name (e.g., "Not Found")
    String message,                        // Human-readable description
    String path,                           // Request path that caused the error
    String traceId,                        // Correlation ID for distributed tracing
    Map<String, String> validationErrors   // Field-level validation errors (optional)
) {
    public static ErrorResponse from(HttpStatus status, String message, String path, String traceId) {
        return new ErrorResponse(
            Instant.now(),
            status.value(),
            status.getReasonPhrase(),
            message,
            path,
            traceId,
            null
        );
    }
}

Example response:

{
  "timestamp": "2025-01-15T10:30:00Z",
  "status": 404,
  "error": "Not Found",
  "message": "Payment with ID PAY-123 not found",
  "path": "/api/v1/payments/PAY-123",
  "traceId": "abc123def456",
  "validationErrors": null
}

Global Exception Handler

Spring's @RestControllerAdvice provides centralized exception handling across all controllers. This eliminates duplicate error handling code and ensures consistent error responses.

import java.time.Instant;
import java.util.Map;
import java.util.stream.Collectors;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.validation.FieldError;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;
import org.springframework.web.context.request.WebRequest;

@RestControllerAdvice
public class GlobalExceptionHandler {

    private static final Logger log = LoggerFactory.getLogger(GlobalExceptionHandler.class);

    @Autowired
    private TraceIdProvider traceIdProvider;

    // Domain-specific exceptions (404 Not Found)
    @ExceptionHandler(PaymentNotFoundException.class)
    public ResponseEntity<ErrorResponse> handlePaymentNotFound(
            PaymentNotFoundException ex,
            WebRequest request) {

        ErrorResponse error = ErrorResponse.from(
            HttpStatus.NOT_FOUND,
            ex.getMessage(),
            request.getDescription(false),
            traceIdProvider.getCurrentTraceId()
        );

        return ResponseEntity.status(HttpStatus.NOT_FOUND).body(error);
    }

    // Validation errors (400 Bad Request)
    @ExceptionHandler(MethodArgumentNotValidException.class)
    public ResponseEntity<ErrorResponse> handleValidationErrors(
            MethodArgumentNotValidException ex,
            WebRequest request) {

        // Extract field-level validation errors
        Map<String, String> validationErrors = ex.getBindingResult()
            .getFieldErrors()
            .stream()
            .collect(Collectors.toMap(
                FieldError::getField,
                error -> error.getDefaultMessage() != null ? error.getDefaultMessage() : "",
                (first, second) -> first   // keep the first message if a field has multiple errors
            ));

        ErrorResponse error = new ErrorResponse(
            Instant.now(),
            HttpStatus.BAD_REQUEST.value(),
            "Validation Failed",
            "Invalid request parameters",
            request.getDescription(false),
            traceIdProvider.getCurrentTraceId(),
            validationErrors
        );

        return ResponseEntity.badRequest().body(error);
    }

    // Business logic failures (422 Unprocessable Entity)
    @ExceptionHandler(InsufficientBalanceException.class)
    public ResponseEntity<ErrorResponse> handleInsufficientBalance(
            InsufficientBalanceException ex,
            WebRequest request) {

        ErrorResponse error = ErrorResponse.from(
            HttpStatus.UNPROCESSABLE_ENTITY,
            "Insufficient balance to complete payment",
            request.getDescription(false),
            traceIdProvider.getCurrentTraceId()
        );

        return ResponseEntity.status(HttpStatus.UNPROCESSABLE_ENTITY).body(error);
    }

    // Catch-all for unexpected errors (500 Internal Server Error)
    @ExceptionHandler(Exception.class)
    public ResponseEntity<ErrorResponse> handleGenericException(
            Exception ex,
            WebRequest request) {

        // Log full error details for debugging (includes stack trace)
        log.error("Unexpected error occurred", ex);

        // Return a generic message to the client (don't expose internal details)
        ErrorResponse error = ErrorResponse.from(
            HttpStatus.INTERNAL_SERVER_ERROR,
            "An unexpected error occurred. Please contact support with trace ID: "
                + traceIdProvider.getCurrentTraceId(),
            request.getDescription(false),
            traceIdProvider.getCurrentTraceId()
        );

        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(error);
    }
}

Validation Error Response

For validation failures, include field-level details:

{
  "timestamp": "2025-01-15T10:30:00Z",
  "status": 400,
  "error": "Validation Failed",
  "message": "Invalid request parameters",
  "path": "/api/v1/payments",
  "traceId": "abc123def456",
  "validationErrors": {
    "amount": "must be positive",
    "currency": "must not be blank",
    "fromAccount": "invalid account number format"
  }
}

Security Considerations in Error Handling

Never expose:

  • Stack traces in responses (log them server-side only)
  • Internal file paths or class names
  • Database query details or SQL errors
  • Configuration values or environment variables
  • Existence of resources for authorization failures (return 404, not 403, when confirming that a resource exists would leak information)

Do expose:

  • Validation errors with field names and constraints
  • Correlation/trace IDs for tracking requests across services
  • General error categories (Not Found, Validation Failed, etc.)
  • Actionable guidance on how to fix the error

Example of secure vs insecure error handling:

// BAD: Exposes internal details
{
  "error": "NullPointerException at PaymentService.java:142",
  "stackTrace": "java.lang.NullPointerException\n\tat com.example.PaymentService.process...",
  "query": "SELECT * FROM payments WHERE id = 'PAY-123'"
}

// GOOD: Generic message, log details internally
{
  "status": 500,
  "error": "Internal Server Error",
  "message": "An unexpected error occurred. Please contact support with trace ID: abc123",
  "traceId": "abc123",
  "timestamp": "2025-01-15T10:30:00Z"
}

For more on secure error handling, see Input Validation and Security Overview.


Circuit Breaker Pattern

The circuit breaker pattern prevents cascading failures when a downstream service is unavailable. Instead of repeatedly calling a failing service, the circuit breaker "opens" after a threshold of failures, failing fast and giving the downstream service time to recover.

How Circuit Breakers Work

Circuit States:

  1. Closed (Normal): Requests pass through. If failures exceed threshold, transition to Open.
  2. Open (Failing): Requests fail immediately without calling service. After timeout, transition to Half-Open.
  3. Half-Open (Testing): Allow limited requests to test if service recovered. If successful, transition to Closed. If failures continue, return to Open.
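
The state machine above can be sketched in a few lines of plain Java. Thresholds and timing are deliberately simplified; production code should use a library such as Resilience4j:

```java
import java.time.Duration;
import java.time.Instant;

// Minimal circuit breaker illustrating the Closed -> Open -> Half-Open cycle.
// Simplified sketch: consecutive-failure counting instead of a sliding window.
public class SimpleCircuitBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final Duration openDuration;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private Instant openedAt;

    public SimpleCircuitBreaker(int failureThreshold, Duration openDuration) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
    }

    // Returns true if a call may proceed; transitions Open -> Half-Open after the timeout.
    public synchronized boolean allowRequest(Instant now) {
        if (state == State.OPEN && Duration.between(openedAt, now).compareTo(openDuration) >= 0) {
            state = State.HALF_OPEN;   // timeout elapsed: allow a probe request
        }
        return state != State.OPEN;
    }

    public synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;          // recovery confirmed
    }

    public synchronized void recordFailure(Instant now) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;        // trip the breaker
            openedAt = now;
        }
    }

    public synchronized State state() {
        return state;
    }
}
```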

Implementation with Resilience4j

Resilience4j is the recommended library for circuit breakers in Spring Boot (successor to Hystrix).

import java.util.UUID;

import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    @Autowired
    private PaymentClient paymentClient;

    @CircuitBreaker(name = "paymentService", fallbackMethod = "paymentFallback")
    public Payment processPayment(PaymentRequest request) {
        // Call external payment service
        return paymentClient.createPayment(request);
    }

    // Fallback method called when the circuit is open
    private Payment paymentFallback(PaymentRequest request, Exception ex) {
        log.warn("Payment service unavailable, using fallback", ex);

        // Return a cached response, queue for later processing, or return an error
        return Payment.builder()
            .id("PENDING-" + UUID.randomUUID())
            .status(PaymentStatus.QUEUED)
            .message("Payment queued for processing when service recovers")
            .build();
    }
}

Configuration (application.yml):

resilience4j:
  circuitbreaker:
    instances:
      paymentService:
        registerHealthIndicator: true
        slidingWindowSize: 10                      # Track last 10 requests
        minimumNumberOfCalls: 5                    # Min calls before calculating failure rate
        failureRateThreshold: 50                   # Open circuit if 50% fail
        waitDurationInOpenState: 10s               # Wait 10s before trying half-open
        permittedNumberOfCallsInHalfOpenState: 3   # Allow 3 test calls in half-open
        automaticTransitionFromOpenToHalfOpenEnabled: true

Why circuit breakers matter: Without circuit breakers, a slow or failing downstream service can cause thread pool exhaustion, cascading failures, and system-wide outages. Circuit breakers prevent this by failing fast and allowing time for recovery.

See Spring Boot Resilience for comprehensive circuit breaker patterns.


Retry Pattern

Transient failures (network timeouts, temporary service unavailability) are common in distributed systems. Retry logic automatically retries failed requests, improving reliability.

Exponential Backoff

Always use exponential backoff to avoid overwhelming recovering services.

Why exponential backoff: Linear backoff (retry every 1 second) can overwhelm a recovering service with retry storms. Exponential backoff gives services time to recover.
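
As an illustration of the growth schedule, a small helper can compute the delays. The 1s base and 2x multiplier mirror the retry configuration used in this guide; the "full jitter" variant is a common addition (an assumption here, not part of the original configuration):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of exponential backoff delays with a cap and optional full jitter.
public class Backoff {
    // Deterministic delay: base * multiplier^(attempt - 1), capped at capMillis.
    public static long delayMillis(int attempt, long baseMillis, double multiplier, long capMillis) {
        double delay = baseMillis * Math.pow(multiplier, attempt - 1);
        return (long) Math.min(delay, capMillis);
    }

    // "Full jitter": pick uniformly in [0, delay] to spread out retry storms.
    public static long jitteredDelayMillis(int attempt, long baseMillis, double multiplier, long capMillis) {
        long max = delayMillis(attempt, baseMillis, multiplier, capMillis);
        return ThreadLocalRandom.current().nextLong(max + 1);
    }
}
```

With a 1s base and 2x multiplier, attempts wait 1s, 2s, 4s, and so on, until the cap is reached.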

Implementation with Resilience4j Retry

import io.github.resilience4j.retry.annotation.Retry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    @Autowired
    private PaymentClient paymentClient;

    @Retry(name = "paymentService", fallbackMethod = "paymentRetryFallback")
    public Payment processPayment(PaymentRequest request) {
        return paymentClient.createPayment(request);
    }

    private Payment paymentRetryFallback(PaymentRequest request, Exception ex) {
        log.error("Payment failed after retries", ex);
        throw new PaymentProcessingException("Payment service unavailable after retries", ex);
    }
}

Configuration (application.yml):

resilience4j:
  retry:
    instances:
      paymentService:
        maxAttempts: 3
        waitDuration: 1s
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2   # Waits 1s, then 2s between the 3 attempts
        retryExceptions:
          - java.net.SocketTimeoutException
          - org.springframework.web.client.ResourceAccessException
        ignoreExceptions:
          - com.example.PaymentValidationException   # Don't retry validation errors

When to Retry vs When Not to Retry

Retry for (transient failures):

  • Network timeouts
  • 503 Service Unavailable
  • 504 Gateway Timeout
  • Connection refused
  • Temporary database connection issues

Do NOT retry (permanent failures):

  • 400 Bad Request (client error, won't succeed on retry)
  • 401 Unauthorized (authentication failure)
  • 403 Forbidden (authorization failure)
  • 404 Not Found (resource doesn't exist)
  • 422 Unprocessable Entity (business logic failure)

# Only retry specific exception types
resilience4j:
  retry:
    instances:
      paymentService:
        retryExceptions:
          - java.net.SocketTimeoutException
          - org.springframework.web.client.HttpServerErrorException.ServiceUnavailable
        ignoreExceptions:
          - org.springframework.web.client.HttpClientErrorException   # 4xx errors
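
When you are not using annotation-driven retries, the same transient-vs-permanent split can be applied in plain code. The sketch below classifies HTTP status codes per the lists above; treating 429 (rate limiting) as retryable is an assumption beyond those lists:

```java
// Sketch: classify an HTTP status as retryable (transient) or not (permanent).
// 429 Too Many Requests is included on the assumption the server will accept
// the request later; 501 Not Implemented is excluded as a permanent failure.
public class RetryPolicy {
    public static boolean isRetryable(int httpStatus) {
        return switch (httpStatus) {
            case 429, 502, 503, 504 -> true;                    // transient: back off and retry
            default -> httpStatus >= 500 && httpStatus != 501;  // other 5xx, except 501
        };
    }
}
```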

Combining Circuit Breaker + Retry

For maximum resilience, combine circuit breaker and retry patterns:

import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    // Order matters: by default the Retry aspect wraps the CircuitBreaker aspect
    @Retry(name = "paymentService")
    @CircuitBreaker(name = "paymentService", fallbackMethod = "paymentFallback")
    public Payment processPayment(PaymentRequest request) {
        return paymentClient.createPayment(request);
    }

    private Payment paymentFallback(PaymentRequest request, Exception ex) {
        // Called when the circuit is open OR retries are exhausted
        log.error("Payment failed", ex);
        throw new PaymentProcessingException("Payment service unavailable", ex);
    }
}

Execution flow:

  1. The request passes through the Retry aspect, then the CircuitBreaker aspect
  2. The circuit breaker records each attempt as a success or failure
  3. If an attempt fails, Retry waits (with exponential backoff) and tries again, up to maxAttempts
  4. If the failure rate exceeds the threshold, the circuit opens and further attempts fail fast
  5. When the circuit is open or retries are exhausted, the fallback is called

Timeout Configuration

Always configure timeouts to prevent indefinite blocking:

# HTTP client timeouts (application.yml)
spring:
  cloud:
    openfeign:
      client:
        config:
          default:
            connectTimeout: 5000   # 5 seconds to establish connection
            readTimeout: 10000     # 10 seconds to read response

For RestTemplate, configure the timeouts on the request factory:

// RestTemplate timeout configuration
@Bean
public RestTemplate restTemplate() {
    HttpComponentsClientHttpRequestFactory factory = new HttpComponentsClientHttpRequestFactory();
    factory.setConnectTimeout(5000);   // 5 second connection timeout
    factory.setReadTimeout(10000);     // 10 second read timeout
    return new RestTemplate(factory);
}

Timeout recommendations:

  • Connection timeout: 3-5 seconds (time to establish TCP connection)
  • Read timeout: 10-30 seconds (time to receive full response)
  • Total timeout: Connection + Read + Retry time
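
These budgets compose: with retries enabled, the worst case is every attempt consuming its full connect and read timeouts, plus the backoff waits in between. A quick sketch of the arithmetic, using the 5s/10s timeouts and 1s base / 2x backoff from this guide:

```java
// Worst-case client-side latency for a call with retries: each attempt may
// spend the full connect + read timeout, plus the backoff waits between attempts.
public class TimeoutBudget {
    public static long worstCaseMillis(long connectMillis, long readMillis,
                                       int maxAttempts, long baseBackoffMillis, double multiplier) {
        long perAttempt = connectMillis + readMillis;
        long backoff = 0;
        // maxAttempts attempts means (maxAttempts - 1) waits between them
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            backoff += (long) (baseBackoffMillis * Math.pow(multiplier, attempt - 1));
        }
        return maxAttempts * perAttempt + backoff;
    }
}
```

With 3 attempts at 15s each plus 1s and 2s waits, a caller could wait up to 48 seconds, which is worth knowing when setting upstream timeouts.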

Async Processing Patterns

For long-running operations (>30 seconds), use asynchronous processing patterns. See REST Patterns - Async Operations for REST implementation details.

Pattern: Immediate Acceptance with Status Polling

Return 202 Accepted immediately with job tracking URL:

@PostMapping("/reports/generate")
public ResponseEntity<JobStatus> generateReport(@Valid @RequestBody ReportRequest request) {
    String jobId = reportService.scheduleReport(request);

    URI statusLocation = ServletUriComponentsBuilder
        .fromCurrentContextPath()
        .path("/api/v1/reports/jobs/{jobId}")
        .buildAndExpand(jobId)
        .toUri();

    JobStatus status = new JobStatus(
        jobId,
        JobState.PROCESSING,
        Instant.now(),
        null,
        null
    );

    return ResponseEntity.accepted()
        .location(statusLocation)
        .body(status);
}

@GetMapping("/reports/jobs/{jobId}")
public ResponseEntity<JobStatus> getJobStatus(@PathVariable String jobId) {
    JobStatus status = reportService.getJobStatus(jobId);
    return ResponseEntity.ok(status);
}

Why async processing: Long operations tie up HTTP connections and can timeout. Async processing returns immediately, allowing clients to poll or receive webhooks when complete.
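
The client side of this pattern can be sketched independently of HTTP: the status supplier below stands in for a GET to the Location URL returned with 202 Accepted, and the terminal state names are assumptions for this sketch:

```java
import java.util.function.Supplier;

// Client-side polling sketch: check a job-status endpoint until it reports
// a terminal state or the attempt budget runs out. The supplier stands in
// for an HTTP GET to the job-status URL.
public class JobPoller {
    // Returns the final state, or "TIMEOUT" if the job never reached one.
    public static String pollUntilDone(Supplier<String> statusSupplier, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            String state = statusSupplier.get();
            if (state.equals("COMPLETED") || state.equals("FAILED")) {
                return state;
            }
            // Real code would sleep here (ideally with backoff) before the next poll.
        }
        return "TIMEOUT";
    }
}
```

Webhooks or server-sent events avoid polling entirely when the client can receive callbacks.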


Performance Patterns

Caching

Leverage HTTP caching to reduce load and improve performance. For comprehensive caching strategies, see Caching.

@GetMapping("/{id}")
public ResponseEntity<AccountDto> getAccount(@PathVariable String id) {
    Account account = accountService.getAccount(id);
    return ResponseEntity.ok()
        .cacheControl(CacheControl.maxAge(10, TimeUnit.MINUTES).cachePublic())
        .eTag(generateETag(account))
        .body(AccountDto.from(account));
}

HTTP Cache-Control headers:

  • max-age=600: Cache for 10 minutes
  • no-cache: Revalidate with server before using cached response
  • no-store: Don't cache at all (for sensitive data)
  • private: Only client can cache (not intermediaries)
  • public: Any cache can store
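
Pairing max-age with an ETag lets clients revalidate cheaply: the server recomputes the tag and answers 304 Not Modified with an empty body when it matches the If-None-Match header. The hashing scheme below is one possible way to implement a generateETag helper like the one in the controller above (an assumption, not the actual helper):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Sketch of ETag generation and revalidation. Hashing the serialized
// representation is one common scheme; a version or updatedAt field works too.
public class ETags {
    public static String generateETag(String body) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                .digest(body.getBytes(StandardCharsets.UTF_8));
            // ETags are quoted strings; a hash prefix is enough to detect changes
            return "\"" + HexFormat.of().formatHex(hash, 0, 8) + "\"";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // True when the client's cached copy is current (server may reply 304).
    public static boolean notModified(String ifNoneMatch, String currentETag) {
        return currentETag.equals(ifNoneMatch);
    }
}
```

In Spring, ServletWebRequest.checkNotModified(etag) performs this comparison and sets the 304 response for you.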

Response Compression

Enable response compression for large payloads:

# application.yml
server:
  compression:
    enabled: true
    mime-types:
      - application/json
      - application/xml
    min-response-size: 1024   # Compress responses > 1KB

Bulk Operations

Support batch operations to reduce round trips:

@PostMapping("/batch")
public ResponseEntity<BatchResponse> createPaymentsBatch(
        @Valid @RequestBody List<PaymentRequest> requests) {

    List<Payment> payments = requests.stream()
        .map(paymentService::createPayment)
        .toList();

    return ResponseEntity.ok(new BatchResponse(payments));
}
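
Note that the stream above is all-or-nothing: one failing item aborts the whole batch. Returning a per-item result, as sketched below, lets clients see exactly which entries succeeded; the BatchItemResult type and its fields are illustrative assumptions, not part of the original API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch of per-item batch results so one bad item doesn't fail the whole batch.
public class BatchProcessor {
    public record BatchItemResult<T>(int index, T value, String error) {
        public boolean succeeded() {
            return error == null;
        }
    }

    public static <R, T> List<BatchItemResult<T>> processAll(List<R> requests, Function<R, T> handler) {
        List<BatchItemResult<T>> results = new ArrayList<>();
        for (int i = 0; i < requests.size(); i++) {
            try {
                results.add(new BatchItemResult<>(i, handler.apply(requests.get(i)), null));
            } catch (RuntimeException ex) {
                results.add(new BatchItemResult<>(i, null, ex.getMessage())); // record failure, keep going
            }
        }
        return results;
    }
}
```

A 207 Multi-Status or a 200 with per-item status fields are both common ways to surface these mixed outcomes.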



Summary

Key Takeaways:

  1. Error handling - Consistent RFC 7807 format, never expose internal details, include trace IDs
  2. Circuit breakers - Prevent cascading failures, fail fast when services are down
  3. Retries - Use exponential backoff, only retry transient failures (5xx, timeouts)
  4. Combine patterns - Retry + Circuit Breaker for maximum resilience
  5. Timeouts - Always configure connection and read timeouts (prevent indefinite blocking)
  6. Async processing - Use 202 Accepted for operations >30 seconds
  7. Caching - Leverage HTTP cache headers for static/semi-static data
  8. Compression - Enable for responses >1KB

Next Steps: Review REST Patterns for REST-specific patterns, then Spring Boot Resilience for implementation details.