Caching Strategies

Caching is a fundamental technique for improving application performance by storing frequently accessed data in faster storage layers. Effective caching reduces database load, decreases response times, and improves overall system scalability. However, caching introduces complexity around data consistency, invalidation, and memory management that must be carefully managed.

This guide covers multi-level caching strategies, invalidation patterns, distributed caching systems, and monitoring approaches to help you implement effective caching in your applications.

Core Caching Principles

Understanding fundamental caching concepts enables effective strategy implementation:

Cache Hit vs Cache Miss: A cache hit occurs when requested data exists in the cache and can be returned immediately. A cache miss occurs when data must be retrieved from the slower backing store (database, API, etc.). The cache hit ratio (hits / total requests) is a key performance metric - higher ratios indicate more effective caching.
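
As a concrete illustration, the metric reduces to a pair of counters (the class and method names here are hypothetical, not from any caching library):

```java
// Minimal hit-ratio tracker - an illustrative sketch, not a library API
public class CacheStats {
    private long hits;
    private long misses;

    public void recordHit() { hits++; }
    public void recordMiss() { misses++; }

    // hits / total requests; 0.0 when nothing has been recorded yet
    public double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        CacheStats stats = new CacheStats();
        for (int i = 0; i < 90; i++) stats.recordHit();
        for (int i = 0; i < 10; i++) stats.recordMiss();
        System.out.println(stats.hitRatio()); // 0.9
    }
}
```

Production caches expose the same numbers for you - for example, Caffeine's recordStats() makes them available via cache.stats().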

Cache Consistency: Caching creates multiple copies of data across different layers. Consistency determines how quickly changes propagate. Strong consistency ensures all reads return the most recent write, while eventual consistency allows temporary staleness in exchange for better performance. Most application caching uses eventual consistency with controlled staleness bounds.

Time-To-Live (TTL): TTL defines how long data remains valid in the cache before expiring. Shorter TTLs reduce staleness but increase cache misses. Longer TTLs improve hit rates but may serve outdated data. TTL selection depends on how frequently data changes and how tolerant your application is to stale data.

Memory Management: Caches have finite memory and must decide which entries to keep when full. Eviction policies determine which entries are removed when space is needed. The choice of eviction policy significantly impacts cache effectiveness for different access patterns.
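
For instance, a least-recently-used (LRU) policy - one of the most common - can be sketched in plain Java using LinkedHashMap's access-order mode (an illustration, not a production cache):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU eviction policy via LinkedHashMap's access-order mode
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: iteration order = recency
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once capacity is exceeded
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a", so "b" becomes least recently used
        cache.put("c", "3"); // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Real caches use more sophisticated policies (LFU, Window TinyLFU) because pure LRU can be defeated by one-off scans that flush out genuinely hot entries.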

Caching Levels

Modern applications typically employ multiple caching layers, each optimized for different access patterns and latency requirements. Understanding where to cache data requires analyzing request patterns, data volatility, and consistency requirements.

Client-Side Caching

Client-side caches store data directly in the user's browser or mobile application, providing the fastest possible access with zero network latency. This layer is ideal for static assets, user preferences, and data that doesn't change frequently.

Browser Cache: Browsers automatically cache HTTP responses based on cache control headers. This is particularly effective for static assets like JavaScript bundles, CSS files, and images. The browser checks cache validity before making network requests, reducing bandwidth and improving page load times.

Service Workers: Modern web applications can use service workers to implement sophisticated caching strategies, including offline support and background sync. Service workers intercept network requests and can serve cached responses even when the network is unavailable.

Local Storage: Application state, user preferences, and session data can be cached in browser local storage or IndexedDB for instant access. This is particularly useful for single-page applications that maintain client-side state.

Mobile App Cache: Native mobile applications can cache data in local databases (SQLite, Realm, Core Data) or in-memory caches. This enables offline functionality and reduces cellular data usage. See our mobile offline-first architecture guide for implementation details.

CDN Caching

Content Delivery Networks cache static and dynamic content at edge locations close to users, reducing latency and origin server load. CDNs are essential for serving static assets at scale and can also cache API responses for appropriate use cases.

CDN caching is most effective for:

  • Static assets (images, videos, scripts, stylesheets)
  • Publicly accessible content that doesn't vary by user
  • API responses that are identical for many users
  • Content that changes infrequently

CDN cache behavior is controlled by HTTP headers sent from your origin server. Understanding these headers is crucial for effective CDN caching (see HTTP Caching Headers below).

API Gateway Caching

API gateways can cache responses at the edge of your application infrastructure, reducing load on backend services. This is particularly effective for read-heavy APIs where many clients request identical data.

// Spring Cloud Gateway response cache example
// (uses the LocalResponseCache filter available in Spring Cloud Gateway 4.x)
@Configuration
public class GatewayCacheConfig {

    @Bean
    public RouteLocator cacheRoutes(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("cached_route", r -> r
                        .path("/api/products/**")
                        .filters(f -> f
                                // Cache responses for 5 minutes, up to 10 MB for this route
                                .localResponseCache(Duration.ofMinutes(5), "10MB")
                        )
                        .uri("lb://product-service")
                )
                .build();
    }
}

Gateway caching works well when responses are identical for many users and don't contain user-specific data. For personalized responses, consider application-level caching instead.

Application-Level Caching

Application caches store computed results, database query results, and expensive operations in memory for fast retrieval. This is the most flexible caching layer and where most business logic caching occurs.

Multi-Level Caching Architecture

The diagram below illustrates a three-tier caching strategy. Each tier represents a tradeoff between latency, capacity, and consistency. Requests check L1 first (fastest, smallest capacity), falling back to L2 and finally L3, the database (slowest, but the source of truth with effectively unlimited capacity).

How it works: When data is requested, the application checks L1 cache first. If found (cache hit), the data is returned immediately with sub-millisecond latency. On a miss, the application checks L2 distributed cache, which adds a few milliseconds for network round-trip but avoids database queries. If both caches miss, the application queries the database, which may take tens or hundreds of milliseconds. After fetching from the database, the application typically populates both cache levels for subsequent requests.

Why TTLs matter: Time-To-Live (TTL) values balance freshness with performance. Shorter TTLs ensure more current data but increase cache misses and database load. Longer TTLs improve performance but risk serving stale data. Choose TTLs based on how frequently your data changes and tolerance for staleness.

┌─────────────────────────────────────────────────────┐
│ Request                                             │
└───────────────┬─────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────┐
│ L1: In-Memory Cache (Caffeine)                      │
│ - Hot data (frequently accessed records)            │
│ - Latency: Sub-millisecond                          │
│ - TTL: 5-10 minutes                                 │
│ - Size: 10,000 entries (limited by heap)            │
└───────────────┬─────────────────────────────────────┘
                │ Cache Miss
                ▼
┌─────────────────────────────────────────────────────┐
│ L2: Distributed Cache (Redis)                       │
│ - Shared across all application instances           │
│ - Latency: 1-5 milliseconds                         │
│ - TTL: 1-24 hours                                   │
│ - Size: GB-scale (eviction policy configured)       │
└───────────────┬─────────────────────────────────────┘
                │ Cache Miss
                ▼
┌─────────────────────────────────────────────────────┐
│ L3: Database                                        │
│ - Source of truth                                   │
│ - Latency: 10-100+ milliseconds                     │
│ - Query cache enabled in database                   │
└─────────────────────────────────────────────────────┘
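
The fallback flow above can be sketched with plain maps standing in for each tier (illustrative only - the maps here are hypothetical stand-ins; a real L2 would be a Redis client):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustration of the L1 -> L2 -> database fallback; plain maps stand in
// for Caffeine, Redis, and the database
public class TieredLookup {
    final Map<String, String> l1 = new HashMap<>();
    final Map<String, String> l2 = new HashMap<>();
    final Map<String, String> database = new HashMap<>();

    public Optional<String> get(String key) {
        String value = l1.get(key);        // fastest: in-process
        if (value == null) {
            value = l2.get(key);           // network hop to shared cache
            if (value == null) {
                value = database.get(key); // slowest: source of truth
                if (value != null) {
                    l2.put(key, value);    // populate L2 for other instances
                }
            }
            if (value != null) {
                l1.put(key, value);        // populate L1 for this instance
            }
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        TieredLookup lookup = new TieredLookup();
        lookup.database.put("user:1", "Ada");
        System.out.println(lookup.get("user:1").get()); // L1 miss, L2 miss, DB hit
        System.out.println(lookup.l1.containsKey("user:1")); // now cached in L1
    }
}
```

A production version would also attach per-tier TTLs (shorter in L1 than L2, as in the diagram) rather than caching indefinitely.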

In-Memory Caching with Caffeine

Caffeine is a high-performance in-process cache for Java that provides near-optimal hit rates through a Window TinyLFU eviction policy. This algorithm tracks both recency and frequency of access, ensuring that frequently accessed items remain cached even if they weren't accessed recently.

Why Caffeine over ConcurrentHashMap: While you could implement caching with a ConcurrentHashMap, you'd need to manually implement size limits, eviction, expiration, statistics tracking, and thread-safe operations. Caffeine provides all of this with carefully optimized implementations that handle concurrent access patterns efficiently.

Performance characteristics: L1 cache hits are served from JVM heap memory without any serialization, network calls, or thread context switches. This typically means sub-millisecond latency - often microseconds. However, L1 caches consume heap memory, so size limits are essential to prevent OutOfMemoryError.

// Caffeine cache example with expiration and size limits
@Configuration
public class CacheConfig {

    @Bean
    public Cache<String, User> userCache() {
        return Caffeine.newBuilder()
                .expireAfterWrite(Duration.ofMinutes(10))
                .maximumSize(10_000)
                .recordStats() // Enable hit/miss metrics
                .build();
    }

    @Bean
    public Cache<String, Payment> paymentCache() {
        return Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(Duration.ofMinutes(10)) // Hard TTL
                .expireAfterAccess(Duration.ofMinutes(5)) // Sliding window - reset on access
                .recordStats()
                .build();
    }

    @Bean
    public LoadingCache<String, Customer> customerCache(CustomerRepository customerRepository) {
        // refreshAfterWrite requires a LoadingCache - it needs a loader to refresh with
        return Caffeine.newBuilder()
                .maximumSize(50_000)
                .expireAfterWrite(Duration.ofHours(1))
                .refreshAfterWrite(Duration.ofMinutes(30)) // Background refresh before expiry
                .recordStats()
                .build(key -> customerRepository.findById(key).orElse(null));
    }
}

// Using the cache in a service
@Service
public class UserService {

    private final Cache<String, User> userCache;
    private final UserRepository userRepository;

    public User findById(String userId) {
        return userCache.get(userId, key -> {
            // Cache miss - fetch from database
            return userRepository.findById(key)
                    .orElseThrow(() -> new UserNotFoundException(key));
        });
    }

    public void updateUser(User user) {
        userRepository.save(user);
        // Invalidate cache after update
        userCache.invalidate(user.getId());
    }
}

Spring Cache Abstraction with Caffeine

Spring's cache abstraction provides declarative caching through annotations, eliminating boilerplate cache management code. The framework intercepts method calls and checks the cache before executing the method body. This approach separates caching concerns from business logic, making code cleaner and more maintainable.

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager("accounts", "users", "payments");
        cacheManager.setCaffeine(Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(Duration.ofMinutes(10))
                .recordStats());
        return cacheManager;
    }
}

Spring Cache Annotations:

  • @Cacheable: Checks cache before method execution; caches result on cache miss
  • @CachePut: Always executes method and updates cache with result (useful for updates)
  • @CacheEvict: Removes entries from cache, maintaining consistency after deletions or updates
  • @Caching: Combines multiple cache operations

Cache keys: The key parameter uses Spring Expression Language (SpEL) to derive cache keys from method parameters. Proper key design is critical - keys must uniquely identify cached data to prevent collisions. For composite keys, consider key = "#userId + '-' + #accountId".

@Service
public class AccountService {

    private static final Logger log = LoggerFactory.getLogger(AccountService.class);

    private final AccountRepository accountRepository;

    // Cache read - result cached using accountId as key
    @Cacheable(value = "accounts", key = "#accountId")
    public Account getAccount(String accountId) {
        log.debug("Fetching account from database: {}", accountId);
        return accountRepository.findById(accountId)
                .orElseThrow(() -> new AccountNotFoundException(accountId));
    }

    // Cache update - replaces cached entry with new value
    @CachePut(value = "accounts", key = "#account.id")
    public Account updateAccount(Account account) {
        return accountRepository.save(account);
    }

    // Cache eviction - removes specific entry on deletion
    @CacheEvict(value = "accounts", key = "#accountId")
    public void deleteAccount(String accountId) {
        accountRepository.deleteById(accountId);
    }

    // Evict all entries - useful after bulk operations
    @CacheEvict(value = "accounts", allEntries = true)
    public void clearCache() {
        log.info("Clearing all account cache");
    }
}

Distributed Caching: For applications running multiple instances, distributed caches like Redis or Memcached provide shared cache storage accessible to all instances. This enables cache sharing and reduces redundant computation across the cluster.

Database Query Caching

Database systems implement their own caching layers for query results and execution plans. Understanding how database caching works helps you design queries that leverage these caches effectively.

Query Result Caching: Databases cache the results of recent queries. Identical queries can be served from cache without re-executing. However, this cache is invalidated when underlying tables change, making it most effective for read-heavy workloads with infrequent writes.

Prepared Statement Caching: Database drivers cache compiled query execution plans, avoiding repeated parsing and optimization. Using parameterized queries (not string concatenation) enables this caching and also prevents SQL injection.

Cache Invalidation Strategies

Cache invalidation - ensuring cached data stays synchronized with the source of truth - is one of the hardest problems in computer science. Different invalidation strategies offer different trade-offs between consistency, complexity, and performance.

Time-To-Live (TTL) Expiration

TTL-based invalidation sets an expiration time on cache entries. After the TTL expires, the entry is automatically removed and must be re-fetched. This is the simplest invalidation strategy but requires balancing freshness against cache hit rates.

// TypeScript example with node-cache
import NodeCache from 'node-cache';

const cache = new NodeCache({
  stdTTL: 600,     // Default TTL: 10 minutes
  checkperiod: 120 // Check for expired entries every 2 minutes
});

async function getProduct(productId: string): Promise<Product> {
  const cached = cache.get<Product>(productId);
  if (cached) {
    return cached;
  }

  // Cache miss - fetch from database
  const product = await productRepository.findById(productId);

  // Store with custom TTL based on product type
  const ttl = product.isPromotional ? 60 : 600; // Promotional products expire faster
  cache.set(productId, product, ttl);

  return product;
}

TTL selection requires understanding your data characteristics:

  • Frequently changing data: Short TTL (seconds to minutes) to minimize staleness
  • Static or slowly changing data: Long TTL (hours to days) to maximize hit rates
  • Real-time requirements: Very short TTL or skip caching entirely
  • Acceptable staleness: Longer TTL within acceptable staleness window
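
Underneath any of these choices, the expiry check itself reduces to comparing a stored timestamp against the entry's TTL. A minimal sketch (a hypothetical class, not a library API):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal TTL map: each entry carries its expiry time and is checked on read
public class TtlCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<K, Entry<V>> entries = new ConcurrentHashMap<>();

    public void put(K key, V value, long ttlMillis) {
        entries.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (System.currentTimeMillis() >= entry.expiresAtMillis) {
            entries.remove(key); // lazily evict the expired entry
            return null;
        }
        return entry.value;
    }
}
```

Real caches (node-cache above, Caffeine, Redis) add what this sketch omits: background sweeps so expired entries don't linger unread, and per-entry memory accounting.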

Cache-Aside (Lazy Loading)

Cache-aside is the most common caching pattern where the application code manages cache population. On a cache miss, the application fetches data from the database and writes it to the cache for future requests. This pattern gives you fine-grained control over what gets cached and when.

// Cache-aside pattern in Spring Boot
@Service
public class ProductService {

    private final ProductRepository repository;
    private final RedisTemplate<String, Product> redisTemplate;

    private static final String CACHE_KEY_PREFIX = "product:";
    private static final Duration CACHE_TTL = Duration.ofHours(1);

    public Product getProduct(String productId) {
        String cacheKey = CACHE_KEY_PREFIX + productId;

        // Try cache first
        Product cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) {
            return cached;
        }

        // Cache miss - fetch from database
        Product product = repository.findById(productId)
                .orElseThrow(() -> new ProductNotFoundException(productId));

        // Populate cache for future requests
        redisTemplate.opsForValue().set(cacheKey, product, CACHE_TTL);

        return product;
    }

    public void updateProduct(Product product) {
        repository.save(product);

        // Invalidate cache after update
        String cacheKey = CACHE_KEY_PREFIX + product.getId();
        redisTemplate.delete(cacheKey);
    }
}

Cache-aside is appropriate when:

  • You want explicit control over caching behavior
  • Different data types need different caching strategies
  • Only frequently accessed data should be cached (lazy loading naturally handles this)
  • Cache failures shouldn't cause application failures (cache is treated as enhancement, not requirement)

The main drawback is that every cache miss requires two round trips: one to check the cache and another to fetch from the database. For high-traffic scenarios, this can create thundering herd problems where many requests simultaneously discover a cache miss and all query the database.
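
One common mitigation is request coalescing (sometimes called single-flight): concurrent misses for the same key share a single in-flight load instead of each querying the database. A sketch using CompletableFuture (illustrative; Caffeine's get(key, mappingFunction) and AsyncCache provide similar per-key coalescing out of the box):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Coalesce concurrent loads: at most one loader runs per key at a time
public class SingleFlightLoader<K, V> {

    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public SingleFlightLoader(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        CompletableFuture<V> future = inFlight.computeIfAbsent(key,
                k -> CompletableFuture.supplyAsync(() -> loader.apply(k))
                        // Drop the entry once the load completes so a later
                        // miss triggers a fresh load
                        .whenComplete((value, err) -> inFlight.remove(k)));
        return future.join();
    }
}
```

Callers that miss while a load for the same key is in flight simply join the existing future, so a burst of misses produces one database query instead of one per request.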

Read-Through Caching

Read-through caching moves cache management into a caching library or layer that sits between your application and the database. The application always queries the cache, and the cache layer handles fetching from the database on misses.

// Read-through cache using Spring Cache abstraction
@Service
public class ProductService {

    private final ProductRepository repository;

    @Cacheable(value = "products", key = "#productId")
    public Product getProduct(String productId) {
        // Cache abstraction handles cache check and population
        // This method only executes on cache miss
        return repository.findById(productId)
                .orElseThrow(() -> new ProductNotFoundException(productId));
    }

    @CachePut(value = "products", key = "#product.id")
    public Product updateProduct(Product product) {
        return repository.save(product);
    }

    @CacheEvict(value = "products", key = "#productId")
    public void deleteProduct(String productId) {
        repository.deleteById(productId);
    }
}

Read-through caching simplifies application code by abstracting cache management. The cache layer becomes responsible for fetching data on misses, reducing duplication. However, this pattern requires a caching framework that supports read-through semantics (like Spring Cache, JCache, or Hibernate second-level cache).

Write-Through Caching

Write-through caching updates both the cache and the database synchronously on every write. This ensures the cache always contains current data but adds latency to write operations since both operations must complete.

@Service
public class AccountService {

    private final AccountRepository repository;
    private final RedisTemplate<String, Account> redisTemplate;

    public Account updateBalance(String accountId, BigDecimal newBalance) {
        Account account = repository.findById(accountId)
                .orElseThrow(() -> new AccountNotFoundException(accountId));

        account.setBalance(newBalance);

        // Write to database first (source of truth)
        Account savedAccount = repository.save(account);

        // Write to cache to keep it fresh
        String cacheKey = "account:" + accountId;
        redisTemplate.opsForValue().set(cacheKey, savedAccount, Duration.ofMinutes(30));

        return savedAccount;
    }
}

Write-through is appropriate when:

  • Read-heavy workloads benefit from always-warm cache
  • Strong consistency between cache and database is required
  • Write latency is acceptable (both database and cache writes must complete)

The primary advantage is cache consistency - the cache never contains stale data. The disadvantage is increased write latency since both storage systems must be updated synchronously. If cache writes fail, you must decide whether to fail the entire operation or accept temporary inconsistency.
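
One way to handle a failed cache write is to fall back to deleting the key, degrading that entry to cache-aside until the next read repopulates it. A sketch (the CacheClient interface here is hypothetical, standing in for a Redis client):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: on a failed write-through, evict the key instead of failing the
// whole operation, so the next read reloads from the database
public class WriteThroughStore {

    interface CacheClient {
        void set(String key, String value);
        void delete(String key);
    }

    private final Map<String, String> database = new HashMap<>(); // stands in for the real DB
    private final CacheClient cache;

    public WriteThroughStore(CacheClient cache) {
        this.cache = cache;
    }

    public void save(String key, String value) {
        database.put(key, value); // database write first: source of truth
        try {
            cache.set(key, value); // keep the cache fresh
        } catch (RuntimeException e) {
            try {
                // Evict so readers miss and reload rather than see stale data
                cache.delete(key);
            } catch (RuntimeException ignored) {
                // Cache unreachable: rely on the entry's TTL to bound staleness
            }
        }
    }
}
```

The database write still succeeds even when the cache is unavailable, which matches the view of the cache as an enhancement rather than a requirement.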

Write-Behind (Write-Back) Caching

Write-behind caching updates the cache immediately but queues database updates for asynchronous processing. This provides low write latency but risks data loss if the cache fails before database synchronization completes.

@Service
public class ViewCountService {

    private final RedisTemplate<String, Long> redisTemplate;
    private final ViewCountRepository repository;
    private final ExecutorService asyncExecutor;

    public void incrementViewCount(String articleId) {
        String cacheKey = "views:" + articleId;

        // Immediately update cache
        Long newCount = redisTemplate.opsForValue().increment(cacheKey);

        // Asynchronously persist to database every 100 views
        if (newCount != null && newCount % 100 == 0) {
            asyncExecutor.submit(() -> repository.updateViewCount(articleId, newCount));
        }
    }

    @Scheduled(fixedDelay = 300000) // Every 5 minutes
    public void flushPendingCounts() {
        // Batch persist all pending view counts
        // (KEYS is O(N) and blocks Redis; prefer SCAN in production)
        Set<String> keys = redisTemplate.keys("views:*");
        for (String key : keys) {
            Long count = redisTemplate.opsForValue().get(key);
            if (count != null) {
                String articleId = key.substring("views:".length());
                repository.updateViewCount(articleId, count);
            }
        }
    }
}

Write-behind caching is suitable for:

  • High-frequency writes where write latency is critical (analytics, counters, logs)
  • Data where recent accuracy is acceptable and eventual persistence is sufficient
  • Scenarios where you can batch multiple writes for efficiency

This pattern requires careful consideration of failure scenarios. If the cache crashes before asynchronously persisting data, you lose those updates. Implement proper queue persistence (Redis AOF, persistent message queues) and monitoring to minimize risk.

Event-Based Invalidation

Event-based invalidation uses pub/sub messaging or event streams to notify caches when underlying data changes. This provides more precise invalidation than TTL-based approaches and can maintain stronger consistency across distributed caches.

// Publisher: Invalidate cache when data changes
@Service
public class UserService {

    private final UserRepository repository;
    private final ApplicationEventPublisher eventPublisher;

    public User updateUser(User user) {
        User updated = repository.save(user);

        // Publish invalidation event
        eventPublisher.publishEvent(new UserUpdatedEvent(user.getId()));

        return updated;
    }
}

// Subscriber: Listen for invalidation events
@Component
public class CacheInvalidationListener {

    private final Cache<String, User> userCache;

    @EventListener
    public void handleUserUpdate(UserUpdatedEvent event) {
        // Invalidate local cache
        userCache.invalidate(event.getUserId());
    }
}

// Using Redis Pub/Sub for distributed invalidation
@Service
public class DistributedCacheInvalidation {

    private final RedisTemplate<String, String> redisTemplate;
    private final Cache<String, User> localCache;

    public void invalidateUserCache(String userId) {
        // Invalidate local cache
        localCache.invalidate(userId);

        // Publish to all instances via Redis pub/sub
        redisTemplate.convertAndSend("cache:invalidate:user", userId);
    }

    // Register this method for the "cache:invalidate:user" channel via a
    // RedisMessageListenerContainer and MessageListenerAdapter (Spring Data
    // Redis provides no listener annotation for pub/sub channels)
    public void onInvalidationMessage(String userId) {
        localCache.invalidate(userId);
    }
}

Event-based invalidation provides precise control over cache consistency but adds complexity. You must handle message delivery failures, duplicate messages, and ensure all cache layers subscribe to invalidation events. This pattern works well for critical data where TTL-based expiration is too coarse.

Distributed Caching

When applications scale horizontally across multiple instances, in-memory caching becomes insufficient since each instance maintains its own cache. Distributed caches provide shared storage accessible to all application instances, enabling cache sharing and coordination.

Redis

Redis is an in-memory data structure store commonly used as a distributed cache. It provides rich data types (strings, hashes, lists, sets, sorted sets), atomic operations, pub/sub messaging, and persistence options.

// Spring Boot Redis configuration
@Configuration
@EnableCaching
public class RedisCacheConfig {

    @Bean
    public RedisCacheConfiguration cacheConfiguration() {
        return RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(10))
                .serializeKeysWith(
                        RedisSerializationContext.SerializationPair
                                .fromSerializer(new StringRedisSerializer())
                )
                .serializeValuesWith(
                        RedisSerializationContext.SerializationPair
                                .fromSerializer(new GenericJackson2JsonRedisSerializer())
                )
                .disableCachingNullValues();
    }

    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(cacheConfiguration())
                .withInitialCacheConfigurations(Map.of(
                        "users", cacheConfiguration().entryTtl(Duration.ofHours(1)),
                        "sessions", cacheConfiguration().entryTtl(Duration.ofMinutes(30)),
                        "products", cacheConfiguration().entryTtl(Duration.ofDays(1))
                ))
                .build();
    }
}

Redis Advantages:

  • Rich data structures beyond simple key-value (lists, sets, sorted sets, hashes)
  • Atomic operations prevent race conditions
  • Pub/sub messaging enables event-based invalidation
  • Optional persistence (RDB snapshots, AOF logging) provides durability
  • Clustering and replication support high availability

Redis Considerations:

  • Network latency (slower than in-memory but much faster than database)
  • Memory limits require eviction policies
  • Single-threaded command execution (commands are serialized; use pipelining to reduce round-trip overhead)
  • Serialization overhead (choose efficient formats like MessagePack or Protobuf for large objects)

Memcached

Memcached is a simple, high-performance distributed memory cache designed for speed and simplicity. It provides basic key-value storage with automatic expiration and LRU eviction.

// Node.js Memcached example
import Memcached from 'memcached';

const memcached = new Memcached('localhost:11211', {
  retries: 3,
  timeout: 500,
  reconnect: 1000
});

async function getCachedData(key: string): Promise<any> {
  return new Promise((resolve, reject) => {
    memcached.get(key, (err, data) => {
      if (err) return reject(err);
      resolve(data);
    });
  });
}

async function setCachedData(key: string, value: any, ttl: number = 600): Promise<void> {
  return new Promise((resolve, reject) => {
    memcached.set(key, value, ttl, (err) => {
      if (err) return reject(err);
      resolve();
    });
  });
}

// Usage in a service
async function getUser(userId: string): Promise<User> {
  const cacheKey = `user:${userId}`;

  const cached = await getCachedData(cacheKey);
  if (cached) {
    return cached;
  }

  const user = await db.users.findById(userId);
  await setCachedData(cacheKey, user, 3600); // Cache for 1 hour

  return user;
}

Memcached Advantages:

  • Extremely fast (optimized for simple get/set operations)
  • Simple protocol and minimal overhead
  • Multi-threaded architecture for high concurrency
  • Automatic memory management with LRU eviction

Memcached Limitations:

  • No persistence (purely in-memory, data lost on restart)
  • Limited to simple key-value storage (no complex data structures)
  • No built-in pub/sub or event notifications
  • Maximum value size (1MB by default)

Choosing Between Redis and Memcached:

  • Use Redis when you need complex data structures, persistence, pub/sub, or atomic operations
  • Use Memcached when you need the absolute fastest simple key-value caching
  • Use Redis for most modern applications due to its richer feature set

Hazelcast

Hazelcast is a distributed in-memory data grid that provides distributed caching with strong consistency guarantees and embedded (in-process) deployment options.

// Hazelcast embedded cache configuration
@Configuration
public class HazelcastConfig {

    @Bean
    public Config hazelcastConfig() {
        Config config = new Config();

        // Configure distributed map (cache)
        config.addMapConfig(
                new MapConfig("users")
                        .setTimeToLiveSeconds(600)
                        .setMaxIdleSeconds(300)
                        .setEvictionConfig(
                                new EvictionConfig()
                                        .setEvictionPolicy(EvictionPolicy.LRU)
                                        .setMaxSizePolicy(MaxSizePolicy.PER_NODE)
                                        .setSize(10000)
                        )
                        .setBackupCount(1) // One backup for high availability
        );

        return config;
    }

    @Bean
    public HazelcastInstance hazelcastInstance(Config config) {
        return Hazelcast.newHazelcastInstance(config);
    }
}

// Using Hazelcast for distributed caching
@Service
public class UserService {

    private final UserRepository userRepository;
    private final IMap<String, User> userCache;

    public UserService(HazelcastInstance hazelcast, UserRepository userRepository) {
        this.userRepository = userRepository;
        this.userCache = hazelcast.getMap("users");
    }

    public User getUser(String userId) {
        return userCache.computeIfAbsent(userId, key -> {
            // Cache miss - fetch from database
            return userRepository.findById(key)
                    .orElseThrow(() -> new UserNotFoundException(key));
        });
    }
}

Hazelcast provides unique features like near-cache (local copy with distributed synchronization), distributed locks, and in-memory compute. It's particularly useful for applications requiring strong consistency or complex distributed coordination.

HTTP Caching Headers

HTTP caching headers control how browsers, CDNs, and proxies cache HTTP responses. Understanding these headers is essential for effective client-side and CDN caching.

Cache-Control

The Cache-Control header provides comprehensive control over caching behavior. It replaces older headers like Expires and Pragma.

// Spring Boot example setting Cache-Control headers
@RestController
@RequestMapping("/api")
public class ProductController {

    private final ProductService productService;
    private final UserService userService;

    @GetMapping("/products/{id}")
    public ResponseEntity<Product> getProduct(@PathVariable String id) {
        Product product = productService.getProduct(id);

        return ResponseEntity.ok()
                .cacheControl(CacheControl
                        .maxAge(5, TimeUnit.MINUTES)
                        .cachePublic() // Allows CDN/shared caches
                        .mustRevalidate() // Must check with origin after expiration
                )
                .eTag(product.getVersion()) // Enable conditional requests
                .body(product);
    }

    @GetMapping("/users/me")
    public ResponseEntity<User> getCurrentUser(Principal principal) {
        User user = userService.getUser(principal.getName());

        return ResponseEntity.ok()
                .cacheControl(CacheControl
                        .maxAge(1, TimeUnit.MINUTES)
                        .cachePrivate() // Only browser cache, not CDN
                ) // For sensitive data, use CacheControl.noStore() instead:
                  // no-store forbids caching and cannot be combined with max-age
                .body(user);
    }

    @GetMapping("/products/search")
    public ResponseEntity<List<Product>> searchProducts(@RequestParam String query) {
        List<Product> results = productService.search(query);

        return ResponseEntity.ok()
                .cacheControl(CacheControl.noCache()) // Always revalidate
                .body(results);
    }
}

Common Cache-Control directives:

  • public: Response can be cached by any cache (browser, CDN, proxy)
  • private: Only browser cache, not shared caches (for user-specific content)
  • no-cache: Must revalidate with origin before using cached response
  • no-store: Don't cache at all (for sensitive data)
  • max-age=N: Cache is valid for N seconds
  • s-maxage=N: Override max-age for shared caches (CDN) only
  • must-revalidate: Must not use stale cache after expiration
  • immutable: Content never changes (perfect for versioned assets)

ETag and Conditional Requests

ETags (entity tags) enable conditional requests where the client asks "has this resource changed since my last request?" If unchanged, the server responds with 304 Not Modified, saving bandwidth.

// Express.js example with ETag support
import express from 'express';
import crypto from 'crypto';

const app = express();

function generateETag(data: any): string {
  const hash = crypto.createHash('md5');
  hash.update(JSON.stringify(data));
  return `"${hash.digest('hex')}"`;
}

app.get('/api/products/:id', async (req, res) => {
  const product = await productService.getProduct(req.params.id);

  const etag = generateETag(product);
  const clientETag = req.headers['if-none-match'];

  // Client has current version
  if (clientETag === etag) {
    return res.status(304).end();
  }

  // Send full response with ETag
  res.set('ETag', etag);
  res.set('Cache-Control', 'public, max-age=300');
  res.json(product);
});

ETags work well for dynamic content where exact expiration is hard to predict. The client caches the response with its ETag, then sends the ETag in subsequent requests. If content hasn't changed, the server responds with 304 Not Modified, avoiding response body transmission.

Last-Modified and Conditional Requests

Similar to ETags, Last-Modified enables conditional requests based on modification time. The client sends If-Modified-Since with cached response timestamp.

@GetMapping("/api/documents/{id}")
public ResponseEntity<Document> getDocument(@PathVariable String id, WebRequest request) {

    Document document = documentService.getDocument(id);
    long lastModified = document.getUpdatedAt().toInstant().toEpochMilli();

    // checkNotModified parses the HTTP-date If-Modified-Since header and,
    // when the resource is unchanged, marks the response as 304
    if (request.checkNotModified(lastModified)) {
        return null; // Spring sends 304 Not Modified with no body
    }

    // Return full document with Last-Modified header
    return ResponseEntity.ok()
            .lastModified(lastModified)
            .cacheControl(CacheControl.maxAge(10, TimeUnit.MINUTES))
            .body(document);
}

Last-Modified is simpler than ETags but less precise (second-level granularity). Use Last-Modified when resources have clear modification timestamps. Use ETags when modification time is unavailable or when second-level precision is insufficient.

Expires Header

The Expires header sets an absolute expiration date/time for cached responses. Cache-Control max-age is preferred because it's relative and doesn't require synchronized clocks, but Expires provides backwards compatibility.

@GetMapping("/api/static-config")
public ResponseEntity<Configuration> getConfig() {
    Configuration config = configService.getConfig();

    ZonedDateTime expires = ZonedDateTime.now().plusHours(24);

    return ResponseEntity.ok()
        .headers(headers -> headers.setExpires(expires)) // Expires for legacy clients
        .cacheControl(CacheControl.maxAge(24, TimeUnit.HOURS)) // Preferred
        .body(config);
}

Cache Warming and Pre-fetching

Cache warming populates the cache before requests arrive, avoiding cold start performance degradation. After deployments or cache invalidations, an empty cache causes slow response times until it repopulates through normal traffic.

@Component
public class CacheWarmer {

    private static final Logger log = LoggerFactory.getLogger(CacheWarmer.class);

    private final ProductService productService;
    private final CategoryService categoryService;
    private final AnalyticsService analyticsService;

    public CacheWarmer(ProductService productService,
                       CategoryService categoryService,
                       AnalyticsService analyticsService) {
        this.productService = productService;
        this.categoryService = categoryService;
        this.analyticsService = analyticsService;
    }

    @EventListener(ApplicationReadyEvent.class)
    public void warmCacheOnStartup() {
        log.info("Starting cache warming...");

        // Warm product cache with popular items
        List<String> popularProductIds = analyticsService.getPopularProducts(100);
        popularProductIds.parallelStream().forEach(id -> {
            try {
                productService.getProduct(id); // Populates cache
            } catch (Exception e) {
                log.warn("Failed to warm cache for product {}", id, e);
            }
        });

        // Warm category tree (frequently accessed)
        categoryService.getCategoryTree();

        log.info("Cache warming complete");
    }

    @Scheduled(cron = "0 0 */6 * * *") // Every 6 hours
    public void refreshPopularProducts() {
        // Periodically refresh popular items before they expire
        List<String> popularProductIds = analyticsService.getPopularProducts(100);
        popularProductIds.forEach(productService::refreshProduct); // Updates cache
    }
}

Cache warming strategies:

Startup warming: Pre-load critical data when the application starts, preventing initial request slowness. Use this for navigation menus, configuration data, and most-accessed items.

Scheduled warming: Refresh cache entries before they expire to maintain high hit rates. This works well for slowly-changing data that's expensive to compute.

Predictive pre-fetching: Anticipate what users will request next and pre-fetch it. For example, when a user views a product list, pre-fetch the first few product details. This improves perceived performance but wastes cache space if predictions are wrong.

Batch warming: After large data imports or batch operations, warm affected cache entries to prevent thundering herd when users request updated data.
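As a rough sketch of predictive pre-fetching, a listing view can schedule background detail loads for the first few items the user is likely to open next. The class and `loadDetail` method here are hypothetical stand-ins for a real repository call:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ProductPrefetcher {

    private final Map<String, String> detailCache = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    // Stand-in for an expensive detail lookup (database, remote API, ...)
    private String loadDetail(String id) {
        return "detail-of-" + id;
    }

    // When a listing page is viewed, pre-fetch the first few details in the
    // background so subsequent detail views are cache hits
    public List<Future<String>> onListViewed(List<String> productIds, int prefetchCount) {
        return productIds.stream()
            .limit(prefetchCount)
            .map(id -> executor.submit(
                () -> detailCache.computeIfAbsent(id, this::loadDetail)))
            .toList();
    }

    public String getDetail(String id) {
        // Hit if the pre-fetch won the race, otherwise load on demand
        return detailCache.computeIfAbsent(id, this::loadDetail);
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```

The trade-off named above applies directly: every wrong prediction occupies cache space, so `prefetchCount` should stay small relative to cache capacity.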

Cache Stampede Prevention

Cache stampede (thundering herd) occurs when many requests simultaneously discover a cache miss for the same key and all query the backing store. This can overwhelm databases and cause cascading failures.

Request Coalescing

Request coalescing ensures only one request fetches data while others wait for the result. This prevents duplicate database queries for the same key.

@Service
public class ProductService {

    private final Cache<String, CompletableFuture<Product>> inflightRequests =
        Caffeine.newBuilder()
            .expireAfterWrite(Duration.ofSeconds(30))
            .build();

    private final Cache<String, Product> productCache;
    private final ProductRepository repository;

    public CompletableFuture<Product> getProduct(String productId) {
        // Check cache first
        Product cached = productCache.getIfPresent(productId);
        if (cached != null) {
            return CompletableFuture.completedFuture(cached);
        }

        // Coalesce concurrent requests for the same key
        return inflightRequests.get(productId, key ->
            CompletableFuture.supplyAsync(() -> {
                // Only one request per key executes this block
                Product product = repository.findById(key)
                    .orElseThrow(() -> new ProductNotFoundException(key));

                // Populate cache for future requests
                productCache.put(key, product);
                return product;
            }).whenComplete((result, error) ->
                // Remove from inflight tracking on success *and* failure,
                // so a failed load is retried rather than served for 30 seconds
                inflightRequests.invalidate(key)));
    }
}

Request coalescing dramatically reduces database load during cache stampedes by ensuring only one request per key reaches the database, regardless of concurrent request count.

Probabilistic Early Expiration

Probabilistic early expiration randomly refreshes cache entries slightly before their actual expiration to spread refresh load over time instead of all at once.

public class ProbabilisticCache<K, V> {

    private final Cache<K, CacheEntry<V>> cache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .build();
    private final Random random = new Random();

    public V get(K key, Function<K, V> loader, Duration ttl) {
        CacheEntry<V> entry = cache.getIfPresent(key);

        if (entry != null) {
            long age = System.currentTimeMillis() - entry.createdAt;
            long ttlMillis = ttl.toMillis();

            // Refresh early once age exceeds ttl * (1 - beta * rand), so the
            // chance of refreshing grows as the entry nears expiration.
            // beta controls how aggressively entries refresh early (typically 0.1 to 0.2)
            double beta = 0.1;
            double threshold = ttlMillis * (1 - beta * random.nextDouble());

            if (age > threshold) {
                // Probabilistically refresh before expiration
                V newValue = loader.apply(key);
                cache.put(key, new CacheEntry<>(newValue));
                return newValue;
            }

            return entry.value;
        }

        // Cache miss - load and store
        V value = loader.apply(key);
        cache.put(key, new CacheEntry<>(value));
        return value;
    }

    private static class CacheEntry<V> {
        final V value;
        final long createdAt;

        CacheEntry(V value) {
            this.value = value;
            this.createdAt = System.currentTimeMillis();
        }
    }
}

This approach spreads cache refreshes over time, preventing synchronized expiration of many entries and subsequent stampede.
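A related variant from the cache-stampede literature (sometimes called XFetch) scales the early-refresh window by delta, the observed recompute cost, and a tunable beta, using the term -delta * beta * ln(rand). A minimal sketch of the decision function:

```java
public class XFetch {

    // Refresh early when now - delta * beta * ln(rand) >= expiry.
    // delta: observed recompute cost; beta >= 1 refreshes more eagerly;
    // rand is uniform in (0, 1], so ln(rand) <= 0 and the term pulls the
    // decision point earlier by a randomized multiple of the recompute cost.
    static boolean shouldRefresh(long nowMillis, long expiryMillis,
                                 long deltaMillis, double beta, double rand) {
        return nowMillis - deltaMillis * beta * Math.log(rand) >= expiryMillis;
    }

    public static void main(String[] args) {
        long expiry = System.currentTimeMillis() + 10_000;
        // 1 - Math.random() maps [0, 1) to (0, 1], avoiding ln(0)
        boolean refresh = shouldRefresh(System.currentTimeMillis(), expiry,
                500, 1.0, 1 - Math.random());
        System.out.println(refresh);
    }
}
```

Because the offset is proportional to recompute cost, expensive entries start refreshing earlier than cheap ones, which is exactly where stampedes hurt most.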

Locking with Expiration

Distributed locks ensure only one process refreshes an expired cache entry while others wait or return stale data. This requires a distributed lock mechanism (Redis, Zookeeper, database locks).

@Service
public class CachedDataService {

    private final RedissonClient redisson;
    private final DataRepository repository;
    private final Cache<String, Data> localCache;

    public Data getData(String key) {
        // Check local cache (isExpired is an application-specific staleness check)
        Data cached = localCache.getIfPresent(key);
        if (cached != null && !isExpired(cached)) {
            return cached;
        }

        // Try to acquire lock for refresh
        RLock lock = redisson.getLock("cache:refresh:" + key);
        boolean acquired = false;

        try {
            acquired = lock.tryLock(100, TimeUnit.MILLISECONDS);

            if (acquired) {
                // This request refreshes the cache
                Data data = repository.findById(key);
                localCache.put(key, data);
                return data;
            }

            // Another request is refreshing: serve stale data while it happens
            if (cached != null) {
                return cached;
            }

            // No stale data, wait briefly for the refresh and retry
            Thread.sleep(50);
            return getData(key);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while refreshing cache", e);
        } finally {
            if (acquired) {
                lock.unlock();
            }
        }
    }
}

Eviction Policies

When caches reach memory capacity, eviction policies determine which entries to remove. The choice of eviction policy significantly impacts cache effectiveness for different access patterns.

Least Recently Used (LRU)

LRU evicts the entry that hasn't been accessed for the longest time. This works well for many access patterns because recently accessed data is likely to be accessed again soon (temporal locality).

// Caffeine uses Window TinyLFU by default, but can be configured for LRU-like behavior
Cache<String, Product> lruCache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .build();

LRU performs well when:

  • Access patterns exhibit temporal locality (recently accessed items are accessed again soon)
  • Working set fits mostly in cache
  • Scanning workloads don't pollute the cache too badly

LRU performs poorly when:

  • Access patterns are cyclic and exceed cache size (everything becomes "least recently used")
  • Scanning large datasets pushes out frequently accessed data
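For intuition about the policy itself, the JDK's LinkedHashMap in access order implements a minimal (non-thread-safe) LRU cache without any caching library:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: not thread-safe, for illustration only
public class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true keeps entries in LRU order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry
    }
}
```

Reading a key with get() moves it to the most-recently-used position, so inserting beyond capacity evicts whichever entry has gone longest without access.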

Least Frequently Used (LFU)

LFU evicts the entry with the lowest access frequency. This retains popular items even if they haven't been accessed very recently.

// Caffeine's Window TinyLFU combines recency and frequency
Cache<String, Product> frequencyAwareCache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .recordStats()
    .build();

LFU advantages:

  • Retains frequently accessed items regardless of temporal gaps
  • Less susceptible to scanning workloads
  • Better for access patterns with clear "popular" items

LFU disadvantages:

  • Historical frequency can become stale (item accessed 1000 times last month but never since)
  • New items struggle to gain frequency count
  • Requires more metadata (access counts)

Modern implementations like Window TinyLFU combine LRU and LFU advantages while mitigating disadvantages.

First-In-First-Out (FIFO)

FIFO evicts the oldest entry regardless of access patterns. This is simpler to implement but generally performs worse than LRU or LFU.

FIFO is appropriate only when:

  • Cache entries have similar access probabilities
  • Implementation simplicity is more important than hit rate
  • Access patterns are truly random

Random Eviction

Random eviction selects a random entry for removal. This is the simplest policy and surprisingly effective for certain workloads, especially when access patterns are random.

Random eviction:

  • Performs better than LRU for cyclic scans
  • Simpler implementation (no access tracking)
  • Unpredictable performance for specific keys
  • Generally worse than LRU for most real-world workloads

Time-To-Live (TTL) Expiration

TTL-based eviction removes entries after a fixed time period, regardless of space constraints. This is often combined with size-based eviction policies.

Cache<String, Product> ttlCache = Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(10)) // Absolute expiration
    .expireAfterAccess(Duration.ofMinutes(5)) // Idle expiration
    .maximumSize(10_000) // Also enforce size limit
    .build();

TTL expiration is essential when:

  • Data becomes stale after a known time period
  • Compliance requires removing data after specific duration
  • Memory constraints require removing old entries even if space available

Monitoring Cache Performance

Effective caching requires continuous monitoring to understand cache behavior, identify problems, and optimize configuration. Key metrics reveal cache effectiveness and guide tuning decisions.

Essential Cache Metrics

@Component
public class CacheMetrics {

    // Register gauges once; Micrometer samples the suppliers on each scrape.
    // (Re-calling registry.gauge() with a boxed value from a @Scheduled method
    // would register only once and never update.)
    public CacheMetrics(MeterRegistry registry, Cache<String, Product> productCache) {
        // Hit rate: percentage of requests served from cache
        registry.gauge("cache.hit.rate", productCache, c -> c.stats().hitRate());

        // Miss rate: percentage of requests requiring fetch
        registry.gauge("cache.miss.rate", productCache, c -> c.stats().missRate());

        // Load success rate (guard against division by zero before any loads)
        registry.gauge("cache.load.success.rate", productCache, c -> {
            CacheStats stats = c.stats();
            return stats.loadCount() == 0
                ? 1.0
                : stats.loadSuccessCount() / (double) stats.loadCount();
        });

        // Average load penalty in milliseconds (averageLoadPenalty is nanoseconds)
        registry.gauge("cache.load.average.millis", productCache,
            c -> c.stats().averageLoadPenalty() / 1_000_000.0);

        // Eviction count
        registry.gauge("cache.eviction.count", productCache,
            c -> c.stats().evictionCount());

        // Estimated cache size
        registry.gauge("cache.size", productCache, Cache::estimatedSize);
    }
}

Cache Hit Rate: The percentage of requests served from cache without fetching from backing store. Higher is better - aim for 80%+ for effective caching. Low hit rates suggest poor TTL configuration, insufficient cache size, or inappropriate caching strategy.

Miss Rate: The inverse of hit rate. High miss rates indicate cache ineffectiveness. Investigate whether the cache is too small, TTL too short, or if access patterns don't benefit from caching.

Eviction Rate: How frequently entries are evicted due to size constraints. High eviction rates suggest insufficient cache size relative to working set. Consider increasing cache size or reducing TTL to keep fewer items in cache.

Average Load Time: How long it takes to fetch data on cache misses. This helps quantify the performance benefit of caching. If load times are low, caching may not provide significant value.

Memory Usage: Track cache memory consumption to prevent out-of-memory errors and understand the memory/performance trade-off.
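These metrics combine into a simple expected-latency estimate: hits pay only the cache lookup, misses pay the lookup plus the backing-store load. A sketch, with illustrative numbers:

```java
public class CacheValue {

    // Expected request latency: hits pay only the cache lookup,
    // misses pay the lookup plus the backing-store load
    static double expectedLatencyMs(double hitRate, double cacheMs, double loadMs) {
        return hitRate * cacheMs + (1 - hitRate) * (cacheMs + loadMs);
    }

    public static void main(String[] args) {
        // 90% hit rate, 1 ms cache lookups, 50 ms database loads
        System.out.println(expectedLatencyMs(0.9, 1, 50)); // 6.0 ms vs 50 ms uncached
    }
}
```

Plugging in your own measured hit rate and load times shows whether a cache is pulling its weight: when load times are already close to cache latency, the expected saving collapses.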

Monitoring Distributed Caches

// Redis monitoring example using ioredis
import Redis from 'ioredis';
import { register, Gauge, Counter } from 'prom-client';

const redis = new Redis();

const redisHits = new Counter({
  name: 'redis_cache_hits_total',
  help: 'Total number of cache hits'
});

const redisMisses = new Counter({
  name: 'redis_cache_misses_total',
  help: 'Total number of cache misses'
});

const redisMemoryUsage = new Gauge({
  name: 'redis_memory_usage_bytes',
  help: 'Redis memory usage in bytes'
});

// Wrapper that tracks metrics
async function getCached(key: string): Promise<any> {
  const value = await redis.get(key);

  if (value) {
    redisHits.inc();
    return JSON.parse(value);
  }

  redisMisses.inc();
  return null;
}

// Periodically collect Redis INFO metrics
setInterval(async () => {
  const info = await redis.info('memory');
  const memoryMatch = info.match(/used_memory:(\d+)/);
  if (memoryMatch) {
    redisMemoryUsage.set(parseInt(memoryMatch[1], 10));
  }
}, 60000);

Monitor distributed cache health:

  • Connection pool utilization: Ensure sufficient connections for concurrent requests
  • Network latency: Track round-trip time to cache server
  • Replication lag: For replicated caches, monitor lag between primary and replicas
  • Memory usage: Track memory consumption and eviction rates
  • Slow commands: Identify operations taking longer than expected

Alerting on Cache Issues

# Prometheus alerting rules for cache metrics
groups:
  - name: cache_alerts
    rules:
      - alert: LowCacheHitRate
        expr: cache_hit_rate < 0.7
        for: 10m
        annotations:
          summary: "Cache hit rate below 70%"
          description: "Cache hit rate is {{ $value | humanizePercentage }}, which may indicate sizing issues"

      - alert: HighCacheEvictionRate
        expr: rate(cache_eviction_count[5m]) > 100
        for: 5m
        annotations:
          summary: "High cache eviction rate"
          description: "Evicting {{ $value }} entries per second, consider increasing cache size"

      - alert: RedisHighMemoryUsage
        expr: redis_memory_usage_bytes / redis_maxmemory_bytes > 0.9
        for: 5m
        annotations:
          summary: "Redis memory usage above 90%"
          description: "Redis is using {{ $value | humanizePercentage }} of available memory"
Set up alerts for:

  • Hit rate drops below acceptable threshold
  • Eviction rate spikes (indicates insufficient cache size)
  • Cache server memory approaching capacity
  • Cache server connection failures
  • Abnormally high cache load times

Common Caching Anti-Patterns

Caching Everything

Anti-pattern: Caching every database query or API response regardless of access patterns.

Problem: Wasting memory on rarely accessed data, increasing eviction rates for frequently accessed data, and adding complexity without performance benefit.

Solution: Cache based on data analysis - identify frequently accessed, expensive-to-compute data. Monitor hit rates per cache key pattern to identify ineffective caching.

Over-Caching User-Specific Data

Anti-pattern: Caching highly personalized data with low reuse across users.

Problem: Each user gets separate cache entries, fragmenting cache space and reducing hit rates. For example, caching user-specific dashboards that are unique per user provides no benefit.

Solution: Cache shared data (product catalog, categories, configuration) and compute user-specific data on each request. Consider partial caching where shared components are cached and personalized elements computed.

Ignoring Cache Invalidation

Anti-pattern: Setting long TTLs without implementing invalidation when data changes.

Problem: Serving stale data that frustrates users and creates business problems (showing wrong prices, outdated inventory).

Solution: Implement explicit cache invalidation on writes. Use shorter TTLs for frequently changing data. Consider event-based invalidation for critical data.
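One minimal shape of invalidate-on-write, assuming an in-process map cache and an illustrative PriceService (a real system would also invalidate any distributed cache tier):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative read-through cache with explicit invalidation on write.
// The in-memory "db" map stands in for a real backing store.
public class PriceService {

    private final Map<String, Integer> db = new ConcurrentHashMap<>();
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();

    int getPrice(String sku) {
        // Read-through: assumes the sku exists; a real service handles misses
        return cache.computeIfAbsent(sku, db::get);
    }

    void updatePrice(String sku, int cents) {
        db.put(sku, cents);
        cache.remove(sku); // explicit invalidation: next read re-fetches
    }
}
```

Without the `cache.remove` call, the second read below would serve the stale old price until TTL expiry, which is exactly the anti-pattern described above.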

Cache Stampede Ignorance

Anti-pattern: Not protecting against cache stampedes when popular entries expire.

Problem: Database overload when hundreds of concurrent requests simultaneously discover a cache miss for a popular key.

Solution: Implement request coalescing, probabilistic early expiration, or distributed locks to prevent stampedes.

Caching Failures

Anti-pattern: Caching error responses or null values.

Problem: When the database is temporarily unavailable and queries fail, caching the failure means subsequent requests serve errors even after the database recovers.

Solution: Never cache errors or null values. Configure cache libraries to ignore null returns. Implement health checks and circuit breakers for backing services.
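As a small demonstration of the null-handling point: the JDK's ConcurrentHashMap records no mapping when a computeIfAbsent function returns null, so a failed lookup is retried on the next call instead of being served as a cached failure. Many cache libraries behave similarly, but check yours:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullSafeCache {

    static boolean negativeResultWasCached() {
        Map<String, String> cache = new ConcurrentHashMap<>();
        // Simulate a loader that finds nothing (backing store down, row missing)
        cache.computeIfAbsent("missing", key -> null);
        // No mapping is recorded for a null result, so the next lookup
        // retries the loader instead of serving a cached failure
        return cache.containsKey("missing");
    }

    public static void main(String[] args) {
        System.out.println(negativeResultWasCached()); // false
    }
}
```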

Inadequate Cache Sizing

Anti-pattern: Not analyzing working set size or monitoring eviction rates.

Problem: Cache too small results in constant evictions and low hit rates. Cache too large wastes memory and risks out-of-memory errors.

Solution: Analyze access patterns to determine working set size (amount of frequently accessed data). Size cache to hold working set plus buffer. Monitor eviction rates and adjust accordingly.

Related Topics

Effective caching integrates with many other system aspects:

  • Performance Optimization: Caching is one part of comprehensive performance strategy including database indexing, query optimization, and CDN usage
  • API Design: Design APIs with caching in mind - include cache headers, support conditional requests, avoid over-personalization
  • Observability: Monitor cache metrics alongside application metrics to understand overall system performance
  • Spring Boot: Spring Cache abstraction simplifies application-level caching with annotations and multiple backend support
  • React Performance: Client-side caching with React Query, SWR, and browser caching strategies
  • Database Best Practices: Database query optimization and indexing reduce the cost of cache misses
  • Rate Limiting: Caching reduces load that might trigger rate limits; consider cache stampede impact on rate limiting