Caching Strategies

Caching is a fundamental technique for improving application performance by storing frequently accessed data in faster storage layers. Effective caching reduces database load, decreases response times, and improves overall system scalability. However, caching introduces complexity around data consistency, invalidation, and memory management that must be carefully managed.

This guide covers multi-level caching strategies, invalidation patterns, distributed caching systems, and monitoring approaches to help you implement effective caching in your applications.

Core Caching Principles

Understanding fundamental caching concepts enables effective strategy implementation:

Cache Hit vs Cache Miss: A cache hit occurs when requested data exists in the cache and can be returned immediately. A cache miss occurs when data must be retrieved from the slower backing store (database, API, etc.). The cache hit ratio (hits / total requests) is a key performance metric - higher ratios indicate more effective caching.
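
As a concrete illustration, the metric reduces to a pair of counters (the class and method names here are hypothetical, not from any caching library):

```java
// Minimal hit-ratio tracker - an illustrative sketch, not a library API
public class CacheStats {
    private long hits;
    private long misses;

    public void recordHit() { hits++; }
    public void recordMiss() { misses++; }

    // hits / total requests; 0.0 when nothing has been recorded yet
    public double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        CacheStats stats = new CacheStats();
        for (int i = 0; i < 90; i++) stats.recordHit();
        for (int i = 0; i < 10; i++) stats.recordMiss();
        System.out.println(stats.hitRatio()); // 0.9
    }
}
```

Production caches expose the same numbers for you - for example, Caffeine's recordStats() makes them available via cache.stats().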

Cache Consistency: Caching creates multiple copies of data across different layers. Consistency determines how quickly changes propagate. Strong consistency ensures all reads return the most recent write, while eventual consistency allows temporary staleness in exchange for better performance. Most application caching uses eventual consistency with controlled staleness bounds.

Time-To-Live (TTL): TTL defines how long data remains valid in the cache before expiring. Shorter TTLs reduce staleness but increase cache misses. Longer TTLs improve hit rates but may serve outdated data. TTL selection depends on how frequently data changes and how tolerant your application is to stale data.

Memory Management: Caches have finite memory and must decide which entries to keep when full. Eviction policies determine which entries are removed when space is needed. The choice of eviction policy significantly impacts cache effectiveness for different access patterns.
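
For instance, a least-recently-used (LRU) policy - one of the most common - can be sketched in plain Java using LinkedHashMap's access-order mode (an illustration, not a production cache):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU eviction policy via LinkedHashMap's access-order mode
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: iteration order = recency
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once capacity is exceeded
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a", so "b" becomes least recently used
        cache.put("c", "3"); // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Real caches use more sophisticated policies (LFU, Window TinyLFU) because pure LRU can be defeated by one-off scans that flush out genuinely hot entries.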

Caching Levels

Modern applications typically employ multiple caching layers, each optimized for different access patterns and latency requirements. Understanding where to cache data requires analyzing request patterns, data volatility, and consistency requirements.

Client-Side Caching

Client-side caches store data directly in the user's browser or mobile application, providing the fastest possible access with zero network latency. This layer is ideal for static assets, user preferences, and data that doesn't change frequently.

Browser Cache: Browsers automatically cache HTTP responses based on cache control headers. This is particularly effective for static assets like JavaScript bundles, CSS files, and images. The browser checks cache validity before making network requests, reducing bandwidth and improving page load times.

Service Workers: Modern web applications can use service workers to implement sophisticated caching strategies, including offline support and background sync. Service workers intercept network requests and can serve cached responses even when the network is unavailable.

Local Storage: Application state, user preferences, and session data can be cached in browser local storage or IndexedDB for instant access. This is particularly useful for single-page applications that maintain client-side state.

Mobile App Cache: Native mobile applications can cache data in local databases (SQLite, Realm, Core Data) or in-memory caches. This enables offline functionality and reduces cellular data usage. See our mobile offline-first architecture guide for implementation details.

CDN Caching

Content Delivery Networks cache static and dynamic content at edge locations close to users, reducing latency and origin server load. CDNs are essential for serving static assets at scale and can also cache API responses for appropriate use cases.

CDN caching is most effective for:

  • Static assets (images, videos, scripts, stylesheets)
  • Publicly accessible content that doesn't vary by user
  • API responses that are identical for many users
  • Content that changes infrequently

CDN cache behavior is controlled by HTTP headers sent from your origin server. Understanding these headers is crucial for effective CDN caching (see HTTP Caching Headers below).

API Gateway Caching

API gateways can cache responses at the edge of your application infrastructure, reducing load on backend services. This is particularly effective for read-heavy APIs where many clients request identical data.

// Spring Cloud Gateway response cache example
// (uses the LocalResponseCache filter available in Spring Cloud Gateway 4.x)
@Configuration
public class GatewayCacheConfig {

    @Bean
    public RouteLocator cacheRoutes(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("cached_route", r -> r
                        .path("/api/products/**")
                        .filters(f -> f
                                // Cache responses for 5 minutes, up to 10 MB for this route
                                .localResponseCache(Duration.ofMinutes(5), "10MB")
                        )
                        .uri("lb://product-service")
                )
                .build();
    }
}

Gateway caching works well when responses are identical for many users and don't contain user-specific data. For personalized responses, consider application-level caching instead.

Application-Level Caching

Application caches store computed results, database query results, and expensive operations in memory for fast retrieval. This is the most flexible caching layer and where most business logic caching occurs.

Multi-Level Caching Architecture

The diagram below illustrates a three-tier caching strategy. Each tier represents a tradeoff between latency, capacity, and consistency. Requests check L1 first (fastest, smallest capacity), falling back to L2 and finally L3, the database (slowest, but the source of truth with effectively unlimited capacity).

How it works: When data is requested, the application checks L1 cache first. If found (cache hit), the data is returned immediately with sub-millisecond latency. On a miss, the application checks L2 distributed cache, which adds a few milliseconds for network round-trip but avoids database queries. If both caches miss, the application queries the database, which may take tens or hundreds of milliseconds. After fetching from the database, the application typically populates both cache levels for subsequent requests.

Why TTLs matter: Time-To-Live (TTL) values balance freshness with performance. Shorter TTLs ensure more current data but increase cache misses and database load. Longer TTLs improve performance but risk serving stale data. Choose TTLs based on how frequently your data changes and tolerance for staleness.

┌─────────────────────────────────────────────────────┐
│ Request                                             │
└───────────────┬─────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────┐
│ L1: In-Memory Cache (Caffeine)                      │
│ - Hot data (frequently accessed records)            │
│ - Latency: Sub-millisecond                          │
│ - TTL: 5-10 minutes                                 │
│ - Size: 10,000 entries (limited by heap)            │
└───────────────┬─────────────────────────────────────┘
                │ Cache Miss
                ▼
┌─────────────────────────────────────────────────────┐
│ L2: Distributed Cache (Redis)                       │
│ - Shared across all application instances           │
│ - Latency: 1-5 milliseconds                         │
│ - TTL: 1-24 hours                                   │
│ - Size: GB-scale (eviction policy configured)       │
└───────────────┬─────────────────────────────────────┘
                │ Cache Miss
                ▼
┌─────────────────────────────────────────────────────┐
│ L3: Database                                        │
│ - Source of truth                                   │
│ - Latency: 10-100+ milliseconds                     │
│ - Query cache enabled in database                   │
└─────────────────────────────────────────────────────┘
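
The fallback flow above can be sketched with plain maps standing in for each tier (illustrative only - the maps here are hypothetical stand-ins; a real L2 would be a Redis client):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustration of the L1 -> L2 -> database fallback; plain maps stand in
// for Caffeine, Redis, and the database
public class TieredLookup {
    final Map<String, String> l1 = new HashMap<>();
    final Map<String, String> l2 = new HashMap<>();
    final Map<String, String> database = new HashMap<>();

    public Optional<String> get(String key) {
        String value = l1.get(key);        // fastest: in-process
        if (value == null) {
            value = l2.get(key);           // network hop to shared cache
            if (value == null) {
                value = database.get(key); // slowest: source of truth
                if (value != null) {
                    l2.put(key, value);    // populate L2 for other instances
                }
            }
            if (value != null) {
                l1.put(key, value);        // populate L1 for this instance
            }
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        TieredLookup lookup = new TieredLookup();
        lookup.database.put("user:1", "Ada");
        System.out.println(lookup.get("user:1").get()); // L1 miss, L2 miss, DB hit
        System.out.println(lookup.l1.containsKey("user:1")); // now cached in L1
    }
}
```

A production version would also attach per-tier TTLs (shorter in L1 than L2, as in the diagram) rather than caching indefinitely.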

In-Memory Caching with Caffeine

Caffeine is a high-performance in-process cache for Java that provides near-optimal hit rates through a Window TinyLFU eviction policy. This algorithm tracks both recency and frequency of access, ensuring that frequently accessed items remain cached even if they weren't accessed recently.

Why Caffeine over ConcurrentHashMap: While you could implement caching with a ConcurrentHashMap, you'd need to manually implement size limits, eviction, expiration, statistics tracking, and thread-safe operations. Caffeine provides all of this with carefully optimized implementations that handle concurrent access patterns efficiently.

Performance characteristics: L1 cache hits are served from JVM heap memory without any serialization, network calls, or thread context switches. This typically means sub-millisecond latency - often microseconds. However, L1 caches consume heap memory, so size limits are essential to prevent OutOfMemoryError.

// Caffeine cache example with expiration and size limits
@Configuration
public class CacheConfig {

    @Bean
    public Cache<String, User> userCache() {
        return Caffeine.newBuilder()
                .expireAfterWrite(Duration.ofMinutes(10))
                .maximumSize(10_000)
                .recordStats() // Enable hit/miss metrics
                .build();
    }

    @Bean
    public Cache<String, Payment> paymentCache() {
        return Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(Duration.ofMinutes(10)) // Hard TTL
                .expireAfterAccess(Duration.ofMinutes(5)) // Sliding window - reset on access
                .recordStats()
                .build();
    }

    @Bean
    public LoadingCache<String, Customer> customerCache(CustomerRepository customerRepository) {
        // refreshAfterWrite requires a LoadingCache - it needs a loader to refresh with
        return Caffeine.newBuilder()
                .maximumSize(50_000)
                .expireAfterWrite(Duration.ofHours(1))
                .refreshAfterWrite(Duration.ofMinutes(30)) // Background refresh before expiry
                .recordStats()
                .build(key -> customerRepository.findById(key).orElse(null));
    }
}

// Using the cache in a service
@Service
public class UserService {

    private final Cache<String, User> userCache;
    private final UserRepository userRepository;

    public User findById(String userId) {
        return userCache.get(userId, key -> {
            // Cache miss - fetch from database
            return userRepository.findById(key)
                    .orElseThrow(() -> new UserNotFoundException(key));
        });
    }

    public void updateUser(User user) {
        userRepository.save(user);
        // Invalidate cache after update
        userCache.invalidate(user.getId());
    }
}

Spring Cache Abstraction with Caffeine

Spring's cache abstraction provides declarative caching through annotations, eliminating boilerplate cache management code. The framework intercepts method calls and checks the cache before executing the method body. This approach separates caching concerns from business logic, making code cleaner and more maintainable.

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager("accounts", "users", "payments");
        cacheManager.setCaffeine(Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(Duration.ofMinutes(10))
                .recordStats());
        return cacheManager;
    }
}

Spring Cache Annotations:

  • @Cacheable: Checks cache before method execution; caches result on cache miss
  • @CachePut: Always executes method and updates cache with result (useful for updates)
  • @CacheEvict: Removes entries from cache, maintaining consistency after deletions or updates
  • @Caching: Combines multiple cache operations

Cache keys: The key parameter uses Spring Expression Language (SpEL) to derive cache keys from method parameters. Proper key design is critical - keys must uniquely identify cached data to prevent collisions. For composite keys, consider key = "#userId + '-' + #accountId".

@Service
public class AccountService {

    private static final Logger log = LoggerFactory.getLogger(AccountService.class);

    private final AccountRepository accountRepository;

    // Cache read - result cached using accountId as key
    @Cacheable(value = "accounts", key = "#accountId")
    public Account getAccount(String accountId) {
        log.debug("Fetching account from database: {}", accountId);
        return accountRepository.findById(accountId)
                .orElseThrow(() -> new AccountNotFoundException(accountId));
    }

    // Cache update - replaces cached entry with new value
    @CachePut(value = "accounts", key = "#account.id")
    public Account updateAccount(Account account) {
        return accountRepository.save(account);
    }

    // Cache eviction - removes specific entry on deletion
    @CacheEvict(value = "accounts", key = "#accountId")
    public void deleteAccount(String accountId) {
        accountRepository.deleteById(accountId);
    }

    // Evict all entries - useful after bulk operations
    @CacheEvict(value = "accounts", allEntries = true)
    public void clearCache() {
        log.info("Clearing all account cache");
    }
}

Distributed Caching: For applications running multiple instances, distributed caches like Redis or Memcached provide shared cache storage accessible to all instances. This enables cache sharing and reduces redundant computation across the cluster.

Database Query Caching

Database systems implement their own caching layers for query results and execution plans. Understanding how database caching works helps you design queries that leverage these caches effectively.

Query Result Caching: Databases cache the results of recent queries. Identical queries can be served from cache without re-executing. However, this cache is invalidated when underlying tables change, making it most effective for read-heavy workloads with infrequent writes.

Prepared Statement Caching: Database drivers cache compiled query execution plans, avoiding repeated parsing and optimization. Using parameterized queries (not string concatenation) enables this caching and also prevents SQL injection.

Cache Invalidation Strategies

Cache invalidation - ensuring cached data stays synchronized with the source of truth - is one of the hardest problems in computer science. Different invalidation strategies offer different trade-offs between consistency, complexity, and performance.

Time-To-Live (TTL) Expiration

TTL-based invalidation sets an expiration time on cache entries. After the TTL expires, the entry is automatically removed and must be re-fetched. This is the simplest invalidation strategy but requires balancing freshness against cache hit rates.

// TypeScript example with node-cache
import NodeCache from 'node-cache';

const cache = new NodeCache({
  stdTTL: 600,     // Default TTL: 10 minutes
  checkperiod: 120 // Check for expired entries every 2 minutes
});

async function getProduct(productId: string): Promise<Product> {
  const cached = cache.get<Product>(productId);
  if (cached) {
    return cached;
  }

  // Cache miss - fetch from database
  const product = await productRepository.findById(productId);

  // Store with custom TTL based on product type
  const ttl = product.isPromotional ? 60 : 600; // Promotional products expire faster
  cache.set(productId, product, ttl);

  return product;
}

TTL selection requires understanding your data characteristics:

  • Frequently changing data: Short TTL (seconds to minutes) to minimize staleness
  • Static or slowly changing data: Long TTL (hours to days) to maximize hit rates
  • Real-time requirements: Very short TTL or skip caching entirely
  • Acceptable staleness: Longer TTL within acceptable staleness window
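
Underneath any of these choices, the expiry check itself reduces to comparing a stored timestamp against the entry's TTL. A minimal sketch (a hypothetical class, not a library API):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal TTL map: each entry carries its expiry time and is checked on read
public class TtlCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<K, Entry<V>> entries = new ConcurrentHashMap<>();

    public void put(K key, V value, long ttlMillis) {
        entries.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (System.currentTimeMillis() >= entry.expiresAtMillis) {
            entries.remove(key); // lazily evict the expired entry
            return null;
        }
        return entry.value;
    }
}
```

Real caches (node-cache above, Caffeine, Redis) add what this sketch omits: background sweeps so expired entries don't linger unread, and per-entry memory accounting.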

Cache-Aside (Lazy Loading)

Cache-aside is the most common caching pattern where the application code manages cache population. On a cache miss, the application fetches data from the database and writes it to the cache for future requests. This pattern gives you fine-grained control over what gets cached and when.

// Cache-aside pattern in Spring Boot
@Service
public class ProductService {

    private final ProductRepository repository;
    private final RedisTemplate<String, Product> redisTemplate;

    private static final String CACHE_KEY_PREFIX = "product:";
    private static final Duration CACHE_TTL = Duration.ofHours(1);

    public Product getProduct(String productId) {
        String cacheKey = CACHE_KEY_PREFIX + productId;

        // Try cache first
        Product cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) {
            return cached;
        }

        // Cache miss - fetch from database
        Product product = repository.findById(productId)
                .orElseThrow(() -> new ProductNotFoundException(productId));

        // Populate cache for future requests
        redisTemplate.opsForValue().set(cacheKey, product, CACHE_TTL);

        return product;
    }

    public void updateProduct(Product product) {
        repository.save(product);

        // Invalidate cache after update
        String cacheKey = CACHE_KEY_PREFIX + product.getId();
        redisTemplate.delete(cacheKey);
    }
}

Cache-aside is appropriate when:

  • You want explicit control over caching behavior
  • Different data types need different caching strategies
  • Only frequently accessed data should be cached (lazy loading naturally handles this)
  • Cache failures shouldn't cause application failures (cache is treated as enhancement, not requirement)

The main drawback is that every cache miss requires two round trips: one to check the cache and another to fetch from the database. For high-traffic scenarios, this can create thundering herd problems where many requests simultaneously discover a cache miss and all query the database.
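
One common mitigation is request coalescing (sometimes called single-flight): concurrent misses for the same key share a single in-flight load instead of each querying the database. A sketch using CompletableFuture (illustrative; Caffeine's get(key, mappingFunction) and AsyncCache provide similar per-key coalescing out of the box):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Coalesce concurrent loads: at most one loader runs per key at a time
public class SingleFlightLoader<K, V> {

    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public SingleFlightLoader(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        CompletableFuture<V> future = inFlight.computeIfAbsent(key,
                k -> CompletableFuture.supplyAsync(() -> loader.apply(k))
                        // Drop the entry once the load completes so a later
                        // miss triggers a fresh load
                        .whenComplete((value, err) -> inFlight.remove(k)));
        return future.join();
    }
}
```

Callers that miss while a load for the same key is in flight simply join the existing future, so a burst of misses produces one database query instead of one per request.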

Read-Through Caching

Read-through caching moves cache management into a caching library or layer that sits between your application and the database. The application always queries the cache, and the cache layer handles fetching from the database on misses.

// Read-through cache using Spring Cache abstraction
@Service
public class ProductService {

    private final ProductRepository repository;

    @Cacheable(value = "products", key = "#productId")
    public Product getProduct(String productId) {
        // Cache abstraction handles cache check and population
        // This method only executes on cache miss
        return repository.findById(productId)
                .orElseThrow(() -> new ProductNotFoundException(productId));
    }

    @CachePut(value = "products", key = "#product.id")
    public Product updateProduct(Product product) {
        return repository.save(product);
    }

    @CacheEvict(value = "products", key = "#productId")
    public void deleteProduct(String productId) {
        repository.deleteById(productId);
    }
}

Read-through caching simplifies application code by abstracting cache management. The cache layer becomes responsible for fetching data on misses, reducing duplication. However, this pattern requires a caching framework that supports read-through semantics (like Spring Cache, JCache, or Hibernate second-level cache).

Write-Through Caching

Write-through caching updates both the cache and the database synchronously on every write. This ensures the cache always contains current data but adds latency to write operations since both operations must complete.

@Service
public class AccountService {

    private final AccountRepository repository;
    private final RedisTemplate<String, Account> redisTemplate;

    public Account updateBalance(String accountId, BigDecimal newBalance) {
        Account account = repository.findById(accountId)
                .orElseThrow(() -> new AccountNotFoundException(accountId));

        account.setBalance(newBalance);

        // Write to database first (source of truth)
        Account savedAccount = repository.save(account);

        // Write to cache to keep it fresh
        String cacheKey = "account:" + accountId;
        redisTemplate.opsForValue().set(cacheKey, savedAccount, Duration.ofMinutes(30));

        return savedAccount;
    }
}

Write-through is appropriate when:

  • Read-heavy workloads benefit from always-warm cache
  • Strong consistency between cache and database is required
  • Write latency is acceptable (both database and cache writes must complete)

The primary advantage is cache consistency - the cache never contains stale data. The disadvantage is increased write latency since both storage systems must be updated synchronously. If cache writes fail, you must decide whether to fail the entire operation or accept temporary inconsistency.
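
One way to handle a failed cache write is to fall back to deleting the key, degrading that entry to cache-aside until the next read repopulates it. A sketch (the CacheClient interface here is hypothetical, standing in for a Redis client):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: on a failed write-through, evict the key instead of failing the
// whole operation, so the next read reloads from the database
public class WriteThroughStore {

    interface CacheClient {
        void set(String key, String value);
        void delete(String key);
    }

    private final Map<String, String> database = new HashMap<>(); // stands in for the real DB
    private final CacheClient cache;

    public WriteThroughStore(CacheClient cache) {
        this.cache = cache;
    }

    public void save(String key, String value) {
        database.put(key, value); // database write first: source of truth
        try {
            cache.set(key, value); // keep the cache fresh
        } catch (RuntimeException e) {
            try {
                // Evict so readers miss and reload rather than see stale data
                cache.delete(key);
            } catch (RuntimeException ignored) {
                // Cache unreachable: rely on the entry's TTL to bound staleness
            }
        }
    }
}
```

The database write still succeeds even when the cache is unavailable, which matches the view of the cache as an enhancement rather than a requirement.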

Write-Behind (Write-Back) Caching

Write-behind caching updates the cache immediately but queues database updates for asynchronous processing. This provides low write latency but risks data loss if the cache fails before database synchronization completes.

@Service
public class ViewCountService {

    private final RedisTemplate<String, Long> redisTemplate;
    private final ViewCountRepository repository;
    private final ExecutorService asyncExecutor;

    public void incrementViewCount(String articleId) {
        String cacheKey = "views:" + articleId;

        // Immediately update cache
        Long newCount = redisTemplate.opsForValue().increment(cacheKey);

        // Asynchronously persist to database every 100 views
        if (newCount != null && newCount % 100 == 0) {
            asyncExecutor.submit(() -> repository.updateViewCount(articleId, newCount));
        }
    }

    @Scheduled(fixedDelay = 300000) // Every 5 minutes
    public void flushPendingCounts() {
        // Batch persist all pending view counts
        // (KEYS is O(N) and blocks Redis; prefer SCAN in production)
        Set<String> keys = redisTemplate.keys("views:*");
        for (String key : keys) {
            Long count = redisTemplate.opsForValue().get(key);
            if (count != null) {
                String articleId = key.substring("views:".length());
                repository.updateViewCount(articleId, count);
            }
        }
    }
}

Write-behind caching is suitable for:

  • High-frequency writes where write latency is critical (analytics, counters, logs)
  • Data where recent accuracy is acceptable and eventual persistence is sufficient
  • Scenarios where you can batch multiple writes for efficiency

This pattern requires careful consideration of failure scenarios. If the cache crashes before asynchronously persisting data, you lose those updates. Implement proper queue persistence (Redis AOF, persistent message queues) and monitoring to minimize risk.

Event-Based Invalidation

Event-based invalidation uses pub/sub messaging or event streams to notify caches when underlying data changes. This provides more precise invalidation than TTL-based approaches and can maintain stronger consistency across distributed caches.

// Publisher: Invalidate cache when data changes
@Service
public class UserService {

    private final UserRepository repository;
    private final ApplicationEventPublisher eventPublisher;

    public User updateUser(User user) {
        User updated = repository.save(user);

        // Publish invalidation event
        eventPublisher.publishEvent(new UserUpdatedEvent(user.getId()));

        return updated;
    }
}

// Subscriber: Listen for invalidation events
@Component
public class CacheInvalidationListener {

    private final Cache<String, User> userCache;

    @EventListener
    public void handleUserUpdate(UserUpdatedEvent event) {
        // Invalidate local cache
        userCache.invalidate(event.getUserId());
    }
}

// Using Redis Pub/Sub for distributed invalidation
@Service
public class DistributedCacheInvalidation {

    private final RedisTemplate<String, String> redisTemplate;
    private final Cache<String, User> localCache;

    public void invalidateUserCache(String userId) {
        // Invalidate local cache
        localCache.invalidate(userId);

        // Publish to all instances via Redis pub/sub
        redisTemplate.convertAndSend("cache:invalidate:user", userId);
    }

    // Register this method for the "cache:invalidate:user" channel via a
    // RedisMessageListenerContainer and MessageListenerAdapter (Spring Data
    // Redis provides no listener annotation for pub/sub channels)
    public void onInvalidationMessage(String userId) {
        localCache.invalidate(userId);
    }
}

Event-based invalidation provides precise control over cache consistency but adds complexity. You must handle message delivery failures, duplicate messages, and ensure all cache layers subscribe to invalidation events. This pattern works well for critical data where TTL-based expiration is too coarse.

Distributed Caching

When applications scale horizontally across multiple instances, in-memory caching becomes insufficient since each instance maintains its own cache. Distributed caches provide shared storage accessible to all application instances, enabling cache sharing and coordination.

Redis

Redis is an in-memory data structure store commonly used as a distributed cache. It provides rich data types (strings, hashes, lists, sets, sorted sets), atomic operations, pub/sub messaging, and persistence options.

// Spring Boot Redis configuration
@Configuration
@EnableCaching
public class RedisCacheConfig {

    @Bean
    public RedisCacheConfiguration cacheConfiguration() {
        return RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(10))
                .serializeKeysWith(
                        RedisSerializationContext.SerializationPair
                                .fromSerializer(new StringRedisSerializer())
                )
                .serializeValuesWith(
                        RedisSerializationContext.SerializationPair
                                .fromSerializer(new GenericJackson2JsonRedisSerializer())
                )
                .disableCachingNullValues();
    }

    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(cacheConfiguration())
                .withInitialCacheConfigurations(Map.of(
                        "users", cacheConfiguration().entryTtl(Duration.ofHours(1)),
                        "sessions", cacheConfiguration().entryTtl(Duration.ofMinutes(30)),
                        "products", cacheConfiguration().entryTtl(Duration.ofDays(1))
                ))
                .build();
    }
}

Redis Advantages:

  • Rich data structures beyond simple key-value (lists, sets, sorted sets, hashes)
  • Atomic operations prevent race conditions
  • Pub/sub messaging enables event-based invalidation
  • Optional persistence (RDB snapshots, AOF logging) provides durability
  • Clustering and replication support high availability

Redis Considerations:

  • Network latency (slower than in-memory but much faster than database)
  • Memory limits require eviction policies
  • Single-threaded command execution (commands are serialized; use pipelining to reduce round-trip overhead)
  • Serialization overhead (choose efficient formats like MessagePack or Protobuf for large objects)

Memcached

Memcached is a simple, high-performance distributed memory cache designed for speed and simplicity. It provides basic key-value storage with automatic expiration and LRU eviction.

// Node.js Memcached example
import Memcached from 'memcached';

const memcached = new Memcached('localhost:11211', {
  retries: 3,
  timeout: 500,
  reconnect: 1000
});

async function getCachedData(key: string): Promise<any> {
  return new Promise((resolve, reject) => {
    memcached.get(key, (err, data) => {
      if (err) return reject(err);
      resolve(data);
    });
  });
}

async function setCachedData(key: string, value: any, ttl: number = 600): Promise<void> {
  return new Promise((resolve, reject) => {
    memcached.set(key, value, ttl, (err) => {
      if (err) return reject(err);
      resolve();
    });
  });
}

// Usage in a service
async function getUser(userId: string): Promise<User> {
  const cacheKey = `user:${userId}`;

  const cached = await getCachedData(cacheKey);
  if (cached) {
    return cached;
  }

  const user = await db.users.findById(userId);
  await setCachedData(cacheKey, user, 3600); // Cache for 1 hour

  return user;
}

Memcached Advantages:

  • Extremely fast (optimized for simple get/set operations)
  • Simple protocol and minimal overhead
  • Multi-threaded architecture for high concurrency
  • Automatic memory management with LRU eviction

Memcached Limitations:

  • No persistence (purely in-memory, data lost on restart)
  • Limited to simple key-value storage (no complex data structures)
  • No built-in pub/sub or event notifications
  • Maximum value size (1MB by default)

Choosing Between Redis and Memcached:

  • Use Redis when you need complex data structures, persistence, pub/sub, or atomic operations
  • Use Memcached when you need the absolute fastest simple key-value caching
  • Use Redis for most modern applications due to its richer feature set

Hazelcast

Hazelcast is a distributed in-memory data grid that provides distributed caching with strong consistency guarantees and embedded (in-process) deployment options.

// Hazelcast embedded cache configuration
@Configuration
public class HazelcastConfig {

    @Bean
    public Config hazelcastConfig() {
        Config config = new Config();

        // Configure distributed map (cache)
        config.addMapConfig(
                new MapConfig("users")
                        .setTimeToLiveSeconds(600)
                        .setMaxIdleSeconds(300)
                        .setEvictionConfig(
                                new EvictionConfig()
                                        .setEvictionPolicy(EvictionPolicy.LRU)
                                        .setMaxSizePolicy(MaxSizePolicy.PER_NODE)
                                        .setSize(10000)
                        )
                        .setBackupCount(1) // One backup for high availability
        );

        return config;
    }

    @Bean
    public HazelcastInstance hazelcastInstance(Config config) {
        return Hazelcast.newHazelcastInstance(config);
    }
}

// Using Hazelcast for distributed caching
@Service
public class UserService {

    private final UserRepository userRepository;
    private final IMap<String, User> userCache;

    public UserService(HazelcastInstance hazelcast, UserRepository userRepository) {
        this.userRepository = userRepository;
        this.userCache = hazelcast.getMap("users");
    }

    public User getUser(String userId) {
        return userCache.computeIfAbsent(userId, key -> {
            // Cache miss - fetch from database
            return userRepository.findById(key)
                    .orElseThrow(() -> new UserNotFoundException(key));
        });
    }
}

Hazelcast provides unique features like near-cache (local copy with distributed synchronization), distributed locks, and in-memory compute. It's particularly useful for applications requiring strong consistency or complex distributed coordination.

HTTP Caching Headers

HTTP caching headers control how browsers, CDNs, and proxies cache HTTP responses. Understanding these headers is essential for effective client-side and CDN caching.

Cache-Control

The Cache-Control header provides comprehensive control over caching behavior. It replaces older headers like Expires and Pragma.

// Spring Boot example setting Cache-Control headers
@RestController
@RequestMapping("/api")
public class ProductController {

    private final ProductService productService;
    private final UserService userService;

    @GetMapping("/products/{id}")
    public ResponseEntity<Product> getProduct(@PathVariable String id) {
        Product product = productService.getProduct(id);

        return ResponseEntity.ok()
                .cacheControl(CacheControl
                        .maxAge(5, TimeUnit.MINUTES)
                        .cachePublic() // Allows CDN/shared caches
                        .mustRevalidate() // Must check with origin after expiration
                )
                .eTag(product.getVersion()) // Enable conditional requests
                .body(product);
    }

    @GetMapping("/users/me")
    public ResponseEntity<User> getCurrentUser(Principal principal) {
        User user = userService.getUser(principal.getName());

        return ResponseEntity.ok()
                .cacheControl(CacheControl
                        .maxAge(1, TimeUnit.MINUTES)
                        .cachePrivate() // Only browser cache, not CDN
                ) // For sensitive data, use CacheControl.noStore() instead:
                  // no-store forbids caching and cannot be combined with max-age
                .body(user);
    }

    @GetMapping("/products/search")
    public ResponseEntity<List<Product>> searchProducts(@RequestParam String query) {
        List<Product> results = productService.search(query);

        return ResponseEntity.ok()
                .cacheControl(CacheControl.noCache()) // Always revalidate
                .body(results);
    }
}

Common Cache-Control directives:

  • public: Response can be cached by any cache (browser, CDN, proxy)
  • private: Only browser cache, not shared caches (for user-specific content)
  • no-cache: Must revalidate with origin before using cached response
  • no-store: Don't cache at all (for sensitive data)
  • max-age=N: Cache is valid for N seconds
  • s-maxage=N: Override max-age for shared caches (CDN) only
  • must-revalidate: Must not use stale cache after expiration
  • immutable: Content never changes (perfect for versioned assets)

ETag and Conditional Requests

ETags (entity tags) enable conditional requests where the client asks "has this resource changed since my last request?" If unchanged, the server responds with 304 Not Modified, saving bandwidth.

// Express.js example with ETag support
import express from 'express';
import crypto from 'crypto';

const app = express();

function generateETag(data: any): string {
  const hash = crypto.createHash('md5');
  hash.update(JSON.stringify(data));
  return `"${hash.digest('hex')}"`;
}

app.get('/api/products/:id', async (req, res) => {
  const product = await productService.getProduct(req.params.id);

  const etag = generateETag(product);
  const clientETag = req.headers['if-none-match'];

  // Client has current version
  if (clientETag === etag) {
    return res.status(304).end();
  }

  // Send full response with ETag
  res.set('ETag', etag);
  res.set('Cache-Control', 'public, max-age=300');
  res.json(product);
});

ETags work well for dynamic content where exact expiration is hard to predict. The client caches the response with its ETag, then sends the ETag in subsequent requests. If content hasn't changed, the server responds with 304 Not Modified, avoiding response body transmission.

Last-Modified and Conditional Requests

Similar to ETags, Last-Modified enables conditional requests based on modification time. The client sends If-Modified-Since with cached response timestamp.

@GetMapping("/api/documents/{id}")
public ResponseEntity<Document> getDocument(@PathVariable String id, WebRequest request) {

    Document document = documentService.getDocument(id);
    long lastModified = document.getUpdatedAt().toInstant().toEpochMilli();

    // checkNotModified parses the HTTP-date If-Modified-Since header and,
    // when the resource is unchanged, marks the response as 304
    if (request.checkNotModified(lastModified)) {
        return null; // Spring sends 304 Not Modified with no body
    }

    // Return full document with Last-Modified header
    return ResponseEntity.ok()
            .lastModified(lastModified)
            .cacheControl(CacheControl.maxAge(10, TimeUnit.MINUTES))
            .body(document);
}

Last-Modified is simpler than ETags but less precise (second-level granularity). Use Last-Modified when resources have clear modification timestamps. Use ETags when modification time is unavailable or when second-level precision is insufficient.

Expires Header

The Expires header sets an absolute expiration date/time for cached responses. Cache-Control max-age is preferred because it's relative and doesn't require synchronized clocks, but Expires provides backwards compatibility.

@GetMapping("/api/static-config")
public ResponseEntity<Configuration> getConfig() {
    Configuration config = configService.getConfig();

    ZonedDateTime expires = ZonedDateTime.now().plusHours(24);

    return ResponseEntity.ok()
        .headers(headers -> headers.setExpires(expires)) // Expires for legacy clients
        .cacheControl(CacheControl.maxAge(24, TimeUnit.HOURS)) // Preferred
        .body(config);
}

Cache Warming and Pre-fetching

Cache warming populates the cache before requests arrive, avoiding cold start performance degradation. After deployments or cache invalidations, an empty cache causes slow response times until it repopulates through normal traffic.

@Component
public class CacheWarmer {

    private static final Logger log = LoggerFactory.getLogger(CacheWarmer.class);

    private final ProductService productService;
    private final CategoryService categoryService;
    private final AnalyticsService analyticsService;

    public CacheWarmer(ProductService productService,
                       CategoryService categoryService,
                       AnalyticsService analyticsService) {
        this.productService = productService;
        this.categoryService = categoryService;
        this.analyticsService = analyticsService;
    }

    @EventListener(ApplicationReadyEvent.class)
    public void warmCacheOnStartup() {
        log.info("Starting cache warming...");

        // Warm product cache with popular items
        List<String> popularProductIds = analyticsService.getPopularProducts(100);
        popularProductIds.parallelStream().forEach(id -> {
            try {
                productService.getProduct(id); // Populates cache
            } catch (Exception e) {
                log.warn("Failed to warm cache for product {}", id, e);
            }
        });

        // Warm category tree (frequently accessed)
        categoryService.getCategoryTree();

        log.info("Cache warming complete");
    }

    @Scheduled(cron = "0 0 */6 * * *") // Every 6 hours
    public void refreshPopularProducts() {
        // Periodically refresh popular items before they expire
        List<String> popularProductIds = analyticsService.getPopularProducts(100);
        popularProductIds.forEach(productService::refreshProduct); // Updates cache
    }
}

Cache warming strategies:

Startup warming: Pre-load critical data when the application starts, preventing initial request slowness. Use this for navigation menus, configuration data, and most-accessed items.

Scheduled warming: Refresh cache entries before they expire to maintain high hit rates. This works well for slowly-changing data that's expensive to compute.

Predictive pre-fetching: Anticipate what users will request next and pre-fetch it. For example, when a user views a product list, pre-fetch the first few product details. This improves perceived performance but wastes cache space if predictions are wrong.

Batch warming: After large data imports or batch operations, warm affected cache entries to prevent thundering herd when users request updated data.
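As a rough sketch of predictive pre-fetching, a listing view can schedule background detail loads for the first few items the user is likely to open next. The class and `loadDetail` method here are hypothetical stand-ins for a real repository call:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ProductPrefetcher {

    private final Map<String, String> detailCache = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    // Stand-in for an expensive detail lookup (database, remote API, ...)
    private String loadDetail(String id) {
        return "detail-of-" + id;
    }

    // When a listing page is viewed, pre-fetch the first few details in the
    // background so subsequent detail views are cache hits
    public List<Future<String>> onListViewed(List<String> productIds, int prefetchCount) {
        return productIds.stream()
            .limit(prefetchCount)
            .map(id -> executor.submit(
                () -> detailCache.computeIfAbsent(id, this::loadDetail)))
            .toList();
    }

    public String getDetail(String id) {
        // Hit if the pre-fetch won the race, otherwise load on demand
        return detailCache.computeIfAbsent(id, this::loadDetail);
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```

The trade-off named above applies directly: every wrong prediction occupies cache space, so `prefetchCount` should stay small relative to cache capacity.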

Cache Stampede Prevention

Cache stampede (thundering herd) occurs when many requests simultaneously discover a cache miss for the same key and all query the backing store. This can overwhelm databases and cause cascading failures.

Request Coalescing

Request coalescing ensures only one request fetches data while others wait for the result. This prevents duplicate database queries for the same key.

@Service
public class ProductService {

    private final Cache<String, CompletableFuture<Product>> inflightRequests =
        Caffeine.newBuilder()
            .expireAfterWrite(Duration.ofSeconds(30))
            .build();

    private final Cache<String, Product> productCache;
    private final ProductRepository repository;

    public CompletableFuture<Product> getProduct(String productId) {
        // Check cache first
        Product cached = productCache.getIfPresent(productId);
        if (cached != null) {
            return CompletableFuture.completedFuture(cached);
        }

        // Coalesce concurrent requests for the same key
        return inflightRequests.get(productId, key ->
            CompletableFuture.supplyAsync(() -> {
                // Only one request per key executes this block
                Product product = repository.findById(key)
                    .orElseThrow(() -> new ProductNotFoundException(key));

                // Populate cache for future requests
                productCache.put(key, product);
                return product;
            }).whenComplete((result, error) ->
                // Remove from inflight tracking on success *and* failure,
                // so a failed load is retried rather than served for 30 seconds
                inflightRequests.invalidate(key)));
    }
}

Request coalescing dramatically reduces database load during cache stampedes by ensuring only one request per key reaches the database, regardless of concurrent request count.

Probabilistic Early Expiration

Probabilistic early expiration randomly refreshes cache entries slightly before their actual expiration to spread refresh load over time instead of all at once.

public class ProbabilisticCache<K, V> {

    private final Cache<K, CacheEntry<V>> cache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .build();
    private final Random random = new Random();

    public V get(K key, Function<K, V> loader, Duration ttl) {
        CacheEntry<V> entry = cache.getIfPresent(key);

        if (entry != null) {
            long age = System.currentTimeMillis() - entry.createdAt;
            long ttlMillis = ttl.toMillis();

            // Refresh early once age exceeds ttl * (1 - beta * rand), so the
            // chance of refreshing grows as the entry nears expiration.
            // beta controls how aggressively entries refresh early (typically 0.1 to 0.2)
            double beta = 0.1;
            double threshold = ttlMillis * (1 - beta * random.nextDouble());

            if (age > threshold) {
                // Probabilistically refresh before expiration
                V newValue = loader.apply(key);
                cache.put(key, new CacheEntry<>(newValue));
                return newValue;
            }

            return entry.value;
        }

        // Cache miss - load and store
        V value = loader.apply(key);
        cache.put(key, new CacheEntry<>(value));
        return value;
    }

    private static class CacheEntry<V> {
        final V value;
        final long createdAt;

        CacheEntry(V value) {
            this.value = value;
            this.createdAt = System.currentTimeMillis();
        }
    }
}

This approach spreads cache refreshes over time, preventing synchronized expiration of many entries and subsequent stampede.
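A related variant from the cache-stampede literature (sometimes called XFetch) scales the early-refresh window by delta, the observed recompute cost, and a tunable beta, using the term -delta * beta * ln(rand). A minimal sketch of the decision function:

```java
public class XFetch {

    // Refresh early when now - delta * beta * ln(rand) >= expiry.
    // delta: observed recompute cost; beta >= 1 refreshes more eagerly;
    // rand is uniform in (0, 1], so ln(rand) <= 0 and the term pulls the
    // decision point earlier by a randomized multiple of the recompute cost.
    static boolean shouldRefresh(long nowMillis, long expiryMillis,
                                 long deltaMillis, double beta, double rand) {
        return nowMillis - deltaMillis * beta * Math.log(rand) >= expiryMillis;
    }

    public static void main(String[] args) {
        long expiry = System.currentTimeMillis() + 10_000;
        // 1 - Math.random() maps [0, 1) to (0, 1], avoiding ln(0)
        boolean refresh = shouldRefresh(System.currentTimeMillis(), expiry,
                500, 1.0, 1 - Math.random());
        System.out.println(refresh);
    }
}
```

Because the offset is proportional to recompute cost, expensive entries start refreshing earlier than cheap ones, which is exactly where stampedes hurt most.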

Locking with Expiration

Distributed locks ensure only one process refreshes an expired cache entry while others wait or return stale data. This requires a distributed lock mechanism (Redis, Zookeeper, database locks).

@Service
public class CachedDataService {

    private final RedissonClient redisson;
    private final DataRepository repository;
    private final Cache<String, Data> localCache;

    public Data getData(String key) {
        // Check local cache (isExpired is an application-specific staleness check)
        Data cached = localCache.getIfPresent(key);
        if (cached != null && !isExpired(cached)) {
            return cached;
        }

        // Try to acquire lock for refresh
        RLock lock = redisson.getLock("cache:refresh:" + key);
        boolean acquired = false;

        try {
            acquired = lock.tryLock(100, TimeUnit.MILLISECONDS);

            if (acquired) {
                // This request refreshes the cache
                Data data = repository.findById(key);
                localCache.put(key, data);
                return data;
            }

            // Another request is refreshing: serve stale data while it happens
            if (cached != null) {
                return cached;
            }

            // No stale data, wait briefly for the refresh and retry
            Thread.sleep(50);
            return getData(key);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while refreshing cache", e);
        } finally {
            if (acquired) {
                lock.unlock();
            }
        }
    }
}

Eviction Policies

When caches reach memory capacity, eviction policies determine which entries to remove. The choice of eviction policy significantly impacts cache effectiveness for different access patterns.

Least Recently Used (LRU)

LRU evicts the entry that hasn't been accessed for the longest time. This works well for many access patterns because recently accessed data is likely to be accessed again soon (temporal locality).

// Caffeine uses Window TinyLFU by default, but can be configured for LRU-like behavior
Cache<String, Product> lruCache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .build();

LRU performs well when:

  • Access patterns exhibit temporal locality (recently accessed items are accessed again soon)
  • Working set fits mostly in cache
  • Scanning workloads don't pollute the cache too badly

LRU performs poorly when:

  • Access patterns are cyclic and exceed cache size (everything becomes "least recently used")
  • Scanning large datasets pushes out frequently accessed data
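For intuition about the policy itself, the JDK's LinkedHashMap in access order implements a minimal (non-thread-safe) LRU cache without any caching library:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: not thread-safe, for illustration only
public class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true keeps entries in LRU order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry
    }
}
```

Reading a key with get() moves it to the most-recently-used position, so inserting beyond capacity evicts whichever entry has gone longest without access.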

Least Frequently Used (LFU)

LFU evicts the entry with the lowest access frequency. This retains popular items even if they haven't been accessed very recently.

// Caffeine's Window TinyLFU combines recency and frequency
Cache<String, Product> frequencyAwareCache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .recordStats()
    .build();

LFU advantages:

  • Retains frequently accessed items regardless of temporal gaps
  • Less susceptible to scanning workloads
  • Better for access patterns with clear "popular" items

LFU disadvantages:

  • Historical frequency can become stale (item accessed 1000 times last month but never since)
  • New items struggle to gain frequency count
  • Requires more metadata (access counts)

Modern implementations like Window TinyLFU combine LRU and LFU advantages while mitigating disadvantages.

First-In-First-Out (FIFO)

FIFO evicts the oldest entry regardless of access patterns. This is simpler to implement but generally performs worse than LRU or LFU.

FIFO is appropriate only when:

  • Cache entries have similar access probabilities
  • Implementation simplicity is more important than hit rate
  • Access patterns are truly random

Random Eviction

Random eviction selects a random entry for removal. This is the simplest policy and surprisingly effective for certain workloads, especially when access patterns are random.

Random eviction:

  • Performs better than LRU for cyclic scans
  • Simpler implementation (no access tracking)
  • Unpredictable performance for specific keys
  • Generally worse than LRU for most real-world workloads

Time-To-Live (TTL) Expiration

TTL-based eviction removes entries after a fixed time period, regardless of space constraints. This is often combined with size-based eviction policies.

Cache<String, Product> ttlCache = Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(10)) // Absolute expiration
    .expireAfterAccess(Duration.ofMinutes(5)) // Idle expiration
    .maximumSize(10_000) // Also enforce size limit
    .build();

TTL expiration is essential when:

  • Data becomes stale after a known time period
  • Compliance requires removing data after specific duration
  • Memory constraints require removing old entries even if space available

Monitoring Cache Performance

Effective caching requires continuous monitoring to understand cache behavior, identify problems, and optimize configuration. Key metrics reveal cache effectiveness and guide tuning decisions.

Essential Cache Metrics

@Component
public class CacheMetrics {

    // Register gauges once; Micrometer samples the suppliers on each scrape.
    // (Re-calling registry.gauge() with a boxed value from a @Scheduled method
    // would register only once and never update.)
    public CacheMetrics(MeterRegistry registry, Cache<String, Product> productCache) {
        // Hit rate: percentage of requests served from cache
        registry.gauge("cache.hit.rate", productCache, c -> c.stats().hitRate());

        // Miss rate: percentage of requests requiring fetch
        registry.gauge("cache.miss.rate", productCache, c -> c.stats().missRate());

        // Load success rate (guard against division by zero before any loads)
        registry.gauge("cache.load.success.rate", productCache, c -> {
            CacheStats stats = c.stats();
            return stats.loadCount() == 0
                ? 1.0
                : stats.loadSuccessCount() / (double) stats.loadCount();
        });

        // Average load penalty in milliseconds (averageLoadPenalty is nanoseconds)
        registry.gauge("cache.load.average.millis", productCache,
            c -> c.stats().averageLoadPenalty() / 1_000_000.0);

        // Eviction count
        registry.gauge("cache.eviction.count", productCache,
            c -> c.stats().evictionCount());

        // Estimated cache size
        registry.gauge("cache.size", productCache, Cache::estimatedSize);
    }
}

Cache Hit Rate: The percentage of requests served from cache without fetching from backing store. Higher is better - aim for 80%+ for effective caching. Low hit rates suggest poor TTL configuration, insufficient cache size, or inappropriate caching strategy.

Miss Rate: The inverse of hit rate. High miss rates indicate cache ineffectiveness. Investigate whether the cache is too small, TTL too short, or if access patterns don't benefit from caching.

Eviction Rate: How frequently entries are evicted due to size constraints. High eviction rates suggest insufficient cache size relative to working set. Consider increasing cache size or reducing TTL to keep fewer items in cache.

Average Load Time: How long it takes to fetch data on cache misses. This helps quantify the performance benefit of caching. If load times are low, caching may not provide significant value.

Memory Usage: Track cache memory consumption to prevent out-of-memory errors and understand the memory/performance trade-off.
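These metrics combine into a simple expected-latency estimate: hits pay only the cache lookup, misses pay the lookup plus the backing-store load. A sketch, with illustrative numbers:

```java
public class CacheValue {

    // Expected request latency: hits pay only the cache lookup,
    // misses pay the lookup plus the backing-store load
    static double expectedLatencyMs(double hitRate, double cacheMs, double loadMs) {
        return hitRate * cacheMs + (1 - hitRate) * (cacheMs + loadMs);
    }

    public static void main(String[] args) {
        // 90% hit rate, 1 ms cache lookups, 50 ms database loads
        System.out.println(expectedLatencyMs(0.9, 1, 50)); // 6.0 ms vs 50 ms uncached
    }
}
```

Plugging in your own measured hit rate and load times shows whether a cache is pulling its weight: when load times are already close to cache latency, the expected saving collapses.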

Monitoring Distributed Caches

// Redis monitoring example using ioredis
import Redis from 'ioredis';
import { register, Gauge, Counter } from 'prom-client';

const redis = new Redis();

const redisHits = new Counter({
  name: 'redis_cache_hits_total',
  help: 'Total number of cache hits'
});

const redisMisses = new Counter({
  name: 'redis_cache_misses_total',
  help: 'Total number of cache misses'
});

const redisMemoryUsage = new Gauge({
  name: 'redis_memory_usage_bytes',
  help: 'Redis memory usage in bytes'
});

// Wrapper that tracks metrics
async function getCached(key: string): Promise<any> {
  const value = await redis.get(key);

  if (value) {
    redisHits.inc();
    return JSON.parse(value);
  }

  redisMisses.inc();
  return null;
}

// Periodically collect Redis INFO metrics
setInterval(async () => {
  const info = await redis.info('memory');
  const memoryMatch = info.match(/used_memory:(\d+)/);
  if (memoryMatch) {
    redisMemoryUsage.set(parseInt(memoryMatch[1], 10));
  }
}, 60000);

Monitor distributed cache health:

  • Connection pool utilization: Ensure sufficient connections for concurrent requests
  • Network latency: Track round-trip time to cache server
  • Replication lag: For replicated caches, monitor lag between primary and replicas
  • Memory usage: Track memory consumption and eviction rates
  • Slow commands: Identify operations taking longer than expected

Alerting on Cache Issues

# Prometheus alerting rules for cache metrics
groups:
  - name: cache_alerts
    rules:
      - alert: LowCacheHitRate
        expr: cache_hit_rate < 0.7
        for: 10m
        annotations:
          summary: "Cache hit rate below 70%"
          description: "Cache hit rate is {{ $value | humanizePercentage }}, which may indicate sizing issues"

      - alert: HighCacheEvictionRate
        expr: rate(cache_eviction_count[5m]) > 100
        for: 5m
        annotations:
          summary: "High cache eviction rate"
          description: "Evicting {{ $value }} entries per second, consider increasing cache size"

      - alert: RedisHighMemoryUsage
        expr: redis_memory_usage_bytes / redis_maxmemory_bytes > 0.9
        for: 5m
        annotations:
          summary: "Redis memory usage above 90%"
          description: "Redis is using {{ $value | humanizePercentage }} of available memory"
Set up alerts for:

  • Hit rate drops below acceptable threshold
  • Eviction rate spikes (indicates insufficient cache size)
  • Cache server memory approaching capacity
  • Cache server connection failures
  • Abnormally high cache load times

Common Caching Anti-Patterns

Caching Everything

Anti-pattern: Caching every database query or API response regardless of access patterns.

Problem: Wasting memory on rarely accessed data, increasing eviction rates for frequently accessed data, and adding complexity without performance benefit.

Solution: Cache based on data analysis - identify frequently accessed, expensive-to-compute data. Monitor hit rates per cache key pattern to identify ineffective caching.

Over-Caching User-Specific Data

Anti-pattern: Caching highly personalized data with low reuse across users.

Problem: Each user gets separate cache entries, fragmenting cache space and reducing hit rates. For example, caching user-specific dashboards that are unique per user provides no benefit.

Solution: Cache shared data (product catalog, categories, configuration) and compute user-specific data on each request. Consider partial caching where shared components are cached and personalized elements computed.

Ignoring Cache Invalidation

Anti-pattern: Setting long TTLs without implementing invalidation when data changes.

Problem: Serving stale data that frustrates users and creates business problems (showing wrong prices, outdated inventory).

Solution: Implement explicit cache invalidation on writes. Use shorter TTLs for frequently changing data. Consider event-based invalidation for critical data.
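One minimal shape of invalidate-on-write, assuming an in-process map cache and an illustrative PriceService (a real system would also invalidate any distributed cache tier):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative read-through cache with explicit invalidation on write.
// The in-memory "db" map stands in for a real backing store.
public class PriceService {

    private final Map<String, Integer> db = new ConcurrentHashMap<>();
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();

    int getPrice(String sku) {
        // Read-through: assumes the sku exists; a real service handles misses
        return cache.computeIfAbsent(sku, db::get);
    }

    void updatePrice(String sku, int cents) {
        db.put(sku, cents);
        cache.remove(sku); // explicit invalidation: next read re-fetches
    }
}
```

Without the `cache.remove` call, the second read below would serve the stale old price until TTL expiry, which is exactly the anti-pattern described above.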

Cache Stampede Ignorance

Anti-pattern: Not protecting against cache stampedes when popular entries expire.

Problem: Database overload when hundreds of concurrent requests simultaneously discover a cache miss for a popular key.

Solution: Implement request coalescing, probabilistic early expiration, or distributed locks to prevent stampedes.

Caching Failures

Anti-pattern: Caching error responses or null values.

Problem: When the database is temporarily unavailable and queries fail, caching the failure means subsequent requests serve errors even after the database recovers.

Solution: Never cache errors or null values. Configure cache libraries to ignore null returns. Implement health checks and circuit breakers for backing services.
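As a small demonstration of the null-handling point: the JDK's ConcurrentHashMap records no mapping when a computeIfAbsent function returns null, so a failed lookup is retried on the next call instead of being served as a cached failure. Many cache libraries behave similarly, but check yours:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullSafeCache {

    static boolean negativeResultWasCached() {
        Map<String, String> cache = new ConcurrentHashMap<>();
        // Simulate a loader that finds nothing (backing store down, row missing)
        cache.computeIfAbsent("missing", key -> null);
        // No mapping is recorded for a null result, so the next lookup
        // retries the loader instead of serving a cached failure
        return cache.containsKey("missing");
    }

    public static void main(String[] args) {
        System.out.println(negativeResultWasCached()); // false
    }
}
```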

Inadequate Cache Sizing

Anti-pattern: Not analyzing working set size or monitoring eviction rates.

Problem: Cache too small results in constant evictions and low hit rates. Cache too large wastes memory and risks out-of-memory errors.

Solution: Analyze access patterns to determine working set size (amount of frequently accessed data). Size cache to hold working set plus buffer. Monitor eviction rates and adjust accordingly.

Related Topics

Effective caching integrates with many other system aspects:

  • Performance Optimization: Caching is one part of comprehensive performance strategy including database indexing, query optimization, and CDN usage
  • API Design: Design APIs with caching in mind - include cache headers, support conditional requests, avoid over-personalization
  • Observability: Monitor cache metrics alongside application metrics to understand overall system performance
  • Spring Boot: Spring Cache abstraction simplifies application-level caching with annotations and multiple backend support
  • React Performance: Client-side caching with React Query, SWR, and browser caching strategies
  • Database Best Practices: Database query optimization and indexing reduce the cost of cache misses
  • Rate Limiting: Caching reduces load that might trigger rate limits; consider cache stampede impact on rate limiting