
Performance Optimization Best Practices

Evidence-Based Optimization

Applications must remain responsive under load. Optimize database queries, implement multi-level caching, use connection pooling, and profile before optimizing. Measure the impact of every optimization to ensure it addresses actual bottlenecks. Target P95 latency thresholds based on your specific SLAs.

Overview

Performance optimization is a systematic process of identifying and eliminating bottlenecks that degrade application responsiveness, throughput, or resource efficiency. This guide covers profiling techniques, caching strategies (both in-memory and distributed), connection pooling, database optimization, asynchronous processing, and JVM tuning.

Effective performance optimization requires understanding where time is spent in your application. Most applications exhibit the Pareto principle: 80% of performance issues stem from 20% of the code. The key is identifying that critical 20% through measurement rather than intuition.


Core Principles

  1. Measure First: Profile before optimizing. Donald Knuth's famous observation that "premature optimization is the root of all evil" emphasizes that optimization without measurement often optimizes the wrong things. Use profiling tools to identify actual bottlenecks rather than perceived ones.

  2. Multi-Level Caching: Implement layered caching (L1 in-memory, L2 distributed, L3 database query cache) to reduce latency at each tier. Each cache level trades off speed for capacity - understanding these tradeoffs is essential for effective cache design.

  3. Connection Pooling: Reuse expensive resources (database connections, HTTP clients) to eliminate connection establishment overhead. Creating new connections involves TCP handshakes, TLS negotiation, and authentication - overhead that can dominate request latency.

  4. Asynchronous Processing: Use virtual threads or async patterns for I/O-bound operations to maximize resource utilization. Blocking threads waste CPU cycles - async processing allows threads to handle other work while waiting for I/O.

  5. Database Optimization: Proper indexes, query optimization, and batching transform database interaction from a bottleneck to an enabler. Understanding query execution plans is fundamental to database performance.

  6. JVM Tuning: Garbage collection tuning, heap sizing, and thread pool configuration affect both throughput and latency. The JVM's automatic memory management is powerful but requires configuration for optimal performance under load.


Profiling and Measurement

Profiling reveals where your application spends time and resources. Without profiling, optimization is guesswork. Modern profiling tools provide detailed insights into CPU usage, memory allocation, thread behavior, and I/O wait times.

The goal of profiling is not just to find the slowest code but to find the code that runs slowly AND frequently. A method that takes 100ms but is called once per minute is less critical than one that takes 10ms but is called 1000 times per second.
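That tradeoff is simple arithmetic: aggregate cost is average latency multiplied by call frequency. A quick sketch (the numbers mirror the example above; the class and method names are illustrative):

```java
public class HotPathImpact {

    // Aggregate time a method consumes per minute:
    // average latency (ms) * calls per minute
    static long costPerMinuteMs(long avgLatencyMs, long callsPerMinute) {
        return avgLatencyMs * callsPerMinute;
    }

    public static void main(String[] args) {
        // A 100ms method called once per minute
        System.out.println(costPerMinuteMs(100, 1));      // 100 ms/min

        // A 10ms method called 1000 times per second (60,000/min)
        System.out.println(costPerMinuteMs(10, 60_000));  // 600000 ms/min
    }
}
```

The "fast" method consumes 6,000x more total time, which is why profilers rank by aggregate time, not per-call latency.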

Spring Boot Actuator Metrics

Spring Boot Actuator exposes runtime metrics through HTTP endpoints, providing production-safe insights into application behavior. Unlike traditional profilers that can impact performance, Actuator metrics are designed for continuous production monitoring with minimal overhead.

The configuration below enables percentile histograms, which provide distribution metrics like P50, P95, and P99 latency. These percentiles reveal performance characteristics better than simple averages - P95 latency shows the experience of your slowest 5% of requests, often indicating capacity or resource contention issues. For more on production observability, see Spring Boot Observability.

# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  metrics:
    distribution:
      percentiles-histogram:
        http.server.requests: true
      percentiles:
        http.server.requests: 0.5, 0.95, 0.99

Profiling with JFR (Java Flight Recorder)

Java Flight Recorder (JFR) is a low-overhead profiling tool built into the JVM. Unlike sampling profilers that periodically check stack traces, JFR records events as they occur, providing precise timing information without the skew introduced by sampling.

JFR captures CPU usage, memory allocation, garbage collection events, lock contention, and I/O operations. The key advantage is its production-safe design - JFR typically adds less than 1% overhead, making it suitable for continuous profiling in production environments. Analyze JFR recordings with JDK Mission Control (JMC) to visualize hotspots, memory allocation patterns, and thread behavior.

# Start application with JFR
java -XX:StartFlightRecording=duration=60s,filename=myrecording.jfr \
-jar payment-service.jar

# Analyze with JDK Mission Control
jmc myrecording.jfr

Custom Performance Logging

While dedicated profiling tools provide comprehensive analysis, targeted performance logging helps identify slow operations in production. This lightweight approach logs execution time for operations exceeding a threshold, providing early warning of performance degradation.

The example below uses Spring's StopWatch utility to measure execution time. This pattern is useful for tracking specific business operations where detailed profiling would be overkill. Consider logging additional context (user ID, request parameters) to correlate slow operations with specific conditions.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;
import org.springframework.util.StopWatch;

@Service
public class PaymentService {

    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    public PaymentResult processPayment(Payment payment) {
        StopWatch watch = new StopWatch();
        watch.start();

        try {
            PaymentResult result = executePayment(payment);

            watch.stop();
            long executionTime = watch.getTotalTimeMillis();

            // Log slow operations with contextual information
            if (executionTime > 200) {
                log.warn("Slow payment processing detected: {}ms for payment {}",
                        executionTime, payment.getId());
            }

            return result;

        } catch (Exception e) {
            watch.stop();
            throw e;
        }
    }
}

Caching Strategies

Caching is one of the most effective performance optimizations, reducing latency by storing computed or fetched data closer to where it's needed. A well-designed caching strategy can reduce response times from hundreds of milliseconds to sub-millisecond levels.

For comprehensive caching guidance including multi-level caching architecture (L1/L2/L3), Caffeine configuration, Redis setup, cache invalidation patterns, and Spring Cache annotations, see the Caching Guide.

Key caching decisions for performance:

  • When to cache: High-read, low-write data with tolerance for eventual consistency
  • Cache invalidation: Choose between TTL-based expiration and explicit invalidation based on consistency requirements
  • Cache sizing: Balance memory consumption against hit rates - monitor miss rates to optimize
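As a minimal sketch of the TTL-based expiration mentioned above (the class below is illustrative; a real deployment would use Caffeine or Redis as described in the Caching Guide):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal TTL cache sketch: entries expire after a fixed time-to-live.
// Eviction is lazy (checked on read), which real caches supplement with
// background cleanup and size bounds.
public class TtlCache<K, V> {

    private record Entry<U>(U value, long expiresAtNanos) {}

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlNanos;

    public TtlCache(long ttlMillis) {
        this.ttlNanos = ttlMillis * 1_000_000L;
    }

    public void put(K key, V value) {
        store.put(key, new Entry<>(value, System.nanoTime() + ttlNanos));
    }

    // Returns null when the entry is absent or past its TTL
    public V get(K key) {
        Entry<V> e = store.get(key);
        if (e == null) return null;
        if (System.nanoTime() > e.expiresAtNanos) {
            store.remove(key);
            return null;
        }
        return e.value();
    }
}
```

The TTL bounds staleness: a value is never served more than `ttlMillis` after it was written, which is the eventual-consistency tolerance the first bullet refers to.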

Connection Pooling

Connection pooling is one of the most impactful performance optimizations for I/O-bound applications. Establishing a new database connection involves multiple network round trips: TCP handshake (3-way), TLS negotiation (2-4 round trips), and authentication. This overhead can easily exceed 50-100ms - often longer than the query itself.

How connection pooling works: Instead of creating a new connection for each database operation, the application borrows a connection from a pre-established pool, uses it, and returns it. This eliminates connection establishment overhead for all but the initial pool warm-up.
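The borrow/use/return cycle can be sketched with a BlockingQueue (a deliberately minimal illustration; real pools such as HikariCP add validation, leak detection, and connection lifecycle management):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Minimal resource pool: pay the creation cost once at warm-up,
// then borrow and return instead of creating and destroying.
public class SimplePool<T> {

    private final BlockingQueue<T> idle;

    public SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // warm-up: expensive creation happens here only
        }
    }

    // Blocks until a resource is free or the timeout elapses
    public T borrow(long timeoutMillis) {
        try {
            T resource = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (resource == null) {
                throw new IllegalStateException("Pool exhausted");
            }
            return resource;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for resource", e);
        }
    }

    public void release(T resource) {
        idle.offer(resource); // return for reuse instead of closing
    }
}
```

Because released resources go back into the queue, a later borrow receives the same object: no handshake, no TLS negotiation, no authentication.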

Pool sizing: More connections ≠ better performance. Oversized pools exhaust database resources and increase context switching. A common formula is: pool_size = ((core_count * 2) + effective_spindle_count). For modern SSDs, effective_spindle_count is typically 1. This formula derives from analysis of how many threads can productively use the database simultaneously - more connections than this create contention, not concurrency. See HikariCP's pool sizing documentation for the mathematical backing.
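The formula can be expressed directly (a hypothetical helper mirroring the HikariCP guidance above):

```java
public class PoolSizing {

    // HikariCP guidance: connections = (core_count * 2) + effective_spindle_count
    // For SSD-backed databases, effective_spindle_count is typically 1
    static int recommendedPoolSize(int coreCount, int effectiveSpindleCount) {
        return (coreCount * 2) + effectiveSpindleCount;
    }

    public static void main(String[] args) {
        // An 8-core database host with SSD storage
        System.out.println(recommendedPoolSize(8, 1)); // 17
    }
}
```

Note how small the result is: even a busy service rarely benefits from pools larger than a few dozen connections per database host.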

Connection validation: Idle connections can be closed by firewalls or database servers. Connection pools validate connections before handing them to application code, preventing query failures on dead connections.

HikariCP Database Connection Pool

HikariCP is the default connection pool in Spring Boot and among the fastest available for the JVM. Its performance advantages come from careful attention to CPU cache-line alignment, minimal synchronization, and optimized bytecode.

The configuration below establishes a pool of 10-20 connections. maximum-pool-size caps the total connections - set this based on database capacity and expected concurrency. minimum-idle maintains ready connections to handle load spikes. max-lifetime rotates connections before databases forcibly close them. leak-detection-threshold alerts when code fails to return connections to the pool. For more database optimization, see Data Access Guidelines.

# application.yml
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/payments
    username: ${DB_USERNAME}
    password: ${DB_PASSWORD}
    hikari:
      maximum-pool-size: 20            # Total pool capacity
      minimum-idle: 10                 # Ready connections for immediate use
      connection-timeout: 30000        # Wait time for connection from pool (30s)
      idle-timeout: 600000             # Close idle connections after 10 minutes
      max-lifetime: 1800000            # Rotate connections every 30 minutes
      leak-detection-threshold: 60000  # Warn if connection held >1 minute
      pool-name: PaymentServicePool

HTTP Client Connection Pool

HTTP client connection pooling follows similar principles to database pooling. Without pooling, each HTTP request establishes a new TCP connection (and TLS session for HTTPS), adding 50-200ms latency. Connection pooling reuses established connections, reducing overhead to near zero.

The configuration below uses Apache HttpClient 5 with a connection pool. maxTotal limits total connections across all routes (hosts). defaultMaxPerRoute limits connections per host - set this based on the remote server's capacity. Reusing connections also benefits the remote server by reducing its connection handling overhead.

import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestTemplateConfig {

    @Bean
    public RestTemplate restTemplate() {
        PoolingHttpClientConnectionManager connectionManager =
                new PoolingHttpClientConnectionManager();
        connectionManager.setMaxTotal(100);          // Total connections across all routes
        connectionManager.setDefaultMaxPerRoute(20); // Per-host connection limit

        CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(connectionManager)
                .build();

        HttpComponentsClientHttpRequestFactory factory =
                new HttpComponentsClientHttpRequestFactory(httpClient);
        factory.setConnectTimeout(5000);
        factory.setConnectionRequestTimeout(5000);

        return new RestTemplate(factory);
    }
}

Database Optimization

Database queries often dominate application latency. A poorly optimized query can take seconds when it should take milliseconds. The difference usually comes down to three factors: indexes, query structure, and data volume.

Why databases are slow without indexes: Without indexes, databases perform full table scans - reading every row to find matches. For a table with 1 million rows, this might require reading hundreds of MB from disk. Indexes provide direct access to matching rows, similar to how a book's index lets you jump to relevant pages instead of reading every page.

The index tradeoff: Indexes speed reads but slow writes. Each index must be updated when data changes. This tradeoff means you should index based on your read/write patterns - high-read applications benefit from aggressive indexing, while write-heavy applications require more selective indexing.

Index Strategy

Effective indexing requires understanding your query patterns. The most frequently executed queries should dictate your index strategy. Use database query logs or application metrics to identify hot queries, then analyze their execution plans to determine if indexes would help.

Column order in composite indexes matters: For an index on (user_id, status, created_at), queries filtering on user_id alone OR user_id + status can use the index. But queries filtering only on status cannot - indexes are traversed left-to-right. This is analogous to a phone book sorted by last name, then first name: you can efficiently search by last name alone, but searching by first name requires scanning the entire book.
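The left-to-right traversal can be simulated with a sorted set of concatenated keys (an illustrative model only, not how databases store B-tree indexes internally):

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Simulates a composite index on (user_id, status): entries are sorted by the
// concatenated key, so any leftmost prefix maps to one contiguous range.
public class CompositeIndexDemo {

    private final NavigableSet<String> index = new TreeSet<>();

    public void add(String userId, String status) {
        index.add(userId + "|" + status);
    }

    // Efficient: all keys for a user are adjacent in sort order (range scan)
    public int countForUser(String userId) {
        return index.subSet(userId + "|", userId + "|\uffff").size();
    }

    // Inefficient: status alone is not a leftmost prefix, so every entry
    // must be examined (full index scan)
    public long countForStatus(String status) {
        return index.stream().filter(k -> k.endsWith("|" + status)).count();
    }
}
```

`countForUser` touches only a contiguous slice of the sorted set; `countForStatus` walks every entry, which is exactly why a query filtering only on `status` cannot use the `(user_id, status, created_at)` index.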

Partial indexes: When queries consistently filter on specific values (e.g., status = 'PENDING'), partial indexes provide the benefits of indexing with lower storage and maintenance overhead. They only index rows matching the WHERE clause.

-- Index for frequent single-column lookups
CREATE INDEX idx_payments_user_id ON payments(user_id);
CREATE INDEX idx_payments_created_at ON payments(created_at);

-- Composite index for common filter combinations
-- Order matters: put most selective columns first
CREATE INDEX idx_payments_user_status_created
ON payments(user_id, status, created_at DESC);

-- Partial index for frequently queried subset
-- Reduces index size and update overhead
CREATE INDEX idx_active_payments
ON payments(user_id, created_at)
WHERE status = 'PENDING';

-- Monitor index usage to identify unused indexes
-- Low idx_scan counts indicate unused indexes consuming resources
SELECT
    schemaname,
    relname,        -- Table name
    indexrelname,   -- Index name
    idx_scan,       -- Number of index scans
    idx_tup_read,   -- Tuples read from index
    idx_tup_fetch   -- Tuples fetched using index
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC; -- Unused indexes appear first

Query Optimization

Query optimization involves selecting only needed data, avoiding N+1 problems, and leveraging database features like fetch joins. The ORM (Hibernate/JPA) adds a layer of abstraction that can hide performance problems - always validate that your ORM generates efficient SQL.

The N+1 problem: When fetching a collection of entities with related data, naive approaches execute 1 query for the collection + N queries for related entities. With 100 payments, this becomes 101 queries instead of 1-2. Fetch joins solve this by using SQL JOINs to retrieve everything in a single query. For details on JPA optimization, see Spring Boot Data Guidelines.

import java.util.List;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.jpa.repository.QueryHints;
import org.springframework.data.repository.query.Param;

import jakarta.persistence.QueryHint;

public interface PaymentRepository extends JpaRepository<Payment, String> {

    // GOOD: Explicit column selection via an interface projection
    // (aliases must match the projection's getter names)
    @Query("SELECT p.id AS id, p.amount AS amount, p.currency AS currency, " +
           "p.status AS status FROM Payment p WHERE p.userId = :userId")
    List<PaymentProjection> findPaymentsByUserId(@Param("userId") String userId);

    // BAD: Loads full entities; accessing lazy associations afterwards
    // triggers one extra query per entity (the N+1 problem)
    List<Payment> findByUserId(String userId);

    // GOOD: Fetch join retrieves payments and transactions in one query
    @Query("SELECT p FROM Payment p " +
           "LEFT JOIN FETCH p.transactions " +
           "WHERE p.userId = :userId")
    List<Payment> findPaymentsWithTransactions(@Param("userId") String userId);

    // Query hint marks results read-only, skipping dirty checking overhead
    @QueryHints(@QueryHint(name = "org.hibernate.readOnly", value = "true"))
    @Query("SELECT p FROM Payment p WHERE p.status = :status")
    List<Payment> findByStatusReadOnly(@Param("status") PaymentStatus status);
}

Batch Operations

import java.util.List;

import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class PaymentBatchService {

    private static final int BATCH_SIZE = 50;

    @PersistenceContext
    private EntityManager em;

    @Transactional
    public void createPaymentsBatch(List<Payment> payments) {
        for (int i = 0; i < payments.size(); i++) {
            em.persist(payments.get(i));

            // Flush and clear every BATCH_SIZE to avoid memory issues
            if (i % BATCH_SIZE == 0 && i > 0) {
                em.flush();
                em.clear();
            }
        }

        em.flush();
        em.clear();
    }
}

# application.yml - Enable JDBC batching
spring:
  jpa:
    properties:
      hibernate:
        jdbc:
          batch_size: 50
        order_inserts: true
        order_updates: true

Asynchronous Processing

Asynchronous processing maximizes resource utilization by allowing threads to perform other work while waiting for I/O operations (network, disk, database). Traditional blocking I/O wastes CPU cycles - a thread waiting for a 10ms database query is idle for 10ms. With async processing, that thread can handle other requests during the wait.

When to use async: Async processing benefits I/O-bound operations (database queries, HTTP calls, file reads) but not CPU-bound operations (cryptography, image processing, complex calculations). CPU-bound work doesn't wait for I/O, so async provides no benefit - you'd just be adding complexity.

Async complexity: Asynchronous code is harder to write, debug, and reason about. Use it where the performance benefit justifies the complexity. For simple CRUD operations, blocking I/O with proper thread pool sizing often suffices. For more on when to use different concurrency models, see Java Concurrency.

Virtual Threads (Java 21+; prefer Java 25)

Virtual threads are lightweight threads managed by the JVM rather than the operating system. Unlike platform threads (traditional Java threads), virtual threads are cheap to create (millions possible) and don't consume OS resources when blocked.

How virtual threads differ: Platform threads are OS threads - expensive to create (~1MB stack), limited in number (~few thousand), and waste resources when blocked. Virtual threads are JVM-managed - cheap to create (~few KB), virtually unlimited, and release CPU when blocked. This makes "thread-per-request" architectures viable again without reactive programming complexity.
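A quick sketch of that scalability: the snippet below runs 10,000 concurrent blocking tasks, far more than a platform-thread pool could sustain (requires Java 21+; the task count and sleep duration are arbitrary):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {

    // Runs n concurrent tasks, each blocking briefly, on virtual threads.
    // Blocked virtual threads release their carrier (platform) thread,
    // so thousands can "sleep" simultaneously on a handful of cores.
    static int runTasks(int n) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                executor.submit(() -> {
                    Thread.sleep(5); // simulated blocking I/O
                    completed.incrementAndGet();
                    return null;
                });
            }
        } // try-with-resources close() waits for all tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10_000));
    }
}
```

The same program with one platform thread per task would require roughly 10GB of stack reservations; with virtual threads it runs comfortably in a default heap.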

When virtual threads shine: Virtual threads excel for I/O-heavy applications making many concurrent blocking calls. They simplify code compared to reactive frameworks while providing similar scalability. For CPU-bound work, traditional thread pools remain appropriate. Virtual threads are covered in detail in Spring Boot General Guidelines.

# application.yml - Enable virtual threads globally in Spring Boot
spring:
  threads:
    virtual:
      enabled: true

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class VirtualThreadConfig {

    @Bean
    public ExecutorService virtualThreadExecutor() {
        // Creates a new virtual thread for each submitted task
        return Executors.newVirtualThreadPerTaskExecutor();
    }
}

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.StructuredTaskScope;
import java.util.concurrent.StructuredTaskScope.Subtask;

import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    private final ExecutorService executor;
    private final PaymentValidator validator;
    private final FraudService fraudService;

    public PaymentService(ExecutorService executor,
                          PaymentValidator validator,
                          FraudService fraudService) {
        this.executor = executor;
        this.validator = validator;
        this.fraudService = fraudService;
    }

    public CompletableFuture<PaymentResult> processPaymentAsync(Payment payment) {
        return CompletableFuture.supplyAsync(() -> {
            // I/O-bound operations run efficiently on virtual threads
            // Blocking calls (database, HTTP) don't waste platform threads
            return executePayment(payment);
        }, executor);
    }

    // Structured concurrency is a preview API; the scope shape below matches
    // JDK 21-24 and requires --enable-preview
    public PaymentResult processPaymentWithParallelChecks(Payment payment) {
        // Run validations in parallel; all subtasks complete or fail together
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            Subtask<ValidationResult> validationTask =
                    scope.fork(() -> validator.validate(payment));
            Subtask<FraudCheckResult> fraudTask =
                    scope.fork(() -> fraudService.check(payment));

            scope.join();          // Wait for both to complete
            scope.throwIfFailed(); // Propagate the first failure

            // Both completed successfully, proceed with payment
            return executePayment(payment);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Parallel checks failed", e);
        }
    }
}

@Async Methods

Spring's @Async annotation provides declarative asynchronous method execution. When called, the method immediately returns a CompletableFuture while the actual work executes on a separate thread. This approach is simpler than manual thread management but requires proper thread pool configuration.

Async pitfalls: @Async only works when called from another Spring bean - self-invocation doesn't create async behavior because it bypasses the proxy. Also, uncaught exceptions in async methods are silently swallowed unless the CompletableFuture is explicitly checked. Consider an AsyncUncaughtExceptionHandler to log these errors.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.AsyncConfigurer;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
public class AsyncConfig implements AsyncConfigurer {

    @Override
    public Executor getAsyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(50);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("async-");
        executor.initialize();
        return executor;
    }
}

@Service
public class NotificationService {

    private static final Logger log = LoggerFactory.getLogger(NotificationService.class);

    @Async
    public CompletableFuture<Void> sendPaymentNotification(Payment payment) {
        log.info("Sending notification for payment {}", payment.getId());
        // Send email/SMS asynchronously
        return CompletableFuture.completedFuture(null);
    }
}

JVM Tuning

JVM tuning optimizes garbage collection (GC), heap sizing, and thread management to balance throughput and latency. The JVM's automatic memory management frees developers from manual memory management but requires configuration for optimal performance under load.

The GC tradeoff: Garbage collection reclaims memory from unreachable objects. During GC, application threads typically pause ("stop-the-world"). Larger heaps reduce GC frequency but increase pause duration. Smaller heaps increase GC frequency but reduce pause duration. Your application's latency requirements drive this tradeoff.

Heap size selection: Set -Xms (initial) equal to -Xmx (maximum) in production to avoid heap resizing overhead. Size the heap based on application memory requirements plus headroom for GC algorithms. Too small causes frequent GC or OutOfMemory errors. Too large increases GC pause times and wastes resources in containerized environments.

Heap Sizing

The example below configures a 2GB heap with G1GC (Garbage-First garbage collector). G1GC is the default in Java 9+ and provides good balance between throughput and latency for most applications.

MaxGCPauseMillis: This is a goal, not a guarantee. G1GC attempts to meet this target by adjusting heap regions and collection frequency. Set this based on your P99 latency requirements - if you need <200ms P99 latency, GC pauses should be well under 200ms.

String deduplication: Applications often create many identical strings (JSON keys, constants, repeated values). String deduplication automatically deduplicates strings in the heap, significantly reducing memory for string-heavy applications with minimal CPU overhead.

# Production JVM flags with G1GC
#   -Xms2g / -Xmx2g               Initial and max heap (set equal to avoid resizing)
#   -XX:+UseG1GC                  G1 garbage collector (default in Java 9+)
#   -XX:MaxGCPauseMillis=200      Target maximum GC pause duration
#   -XX:+UseStringDeduplication   Deduplicate identical strings
#   -XX:+ParallelRefProcEnabled   Process references in parallel
java -Xms2g -Xmx2g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -XX:+UseStringDeduplication \
  -XX:+ParallelRefProcEnabled \
  -jar payment-service.jar

G1GC Tuning

G1GC divides the heap into regions and collects regions with the most garbage first (hence "Garbage First"). This approach provides more predictable pause times than older collectors.

Heap region size: G1 divides the heap into fixed-size regions (default 1MB-32MB based on heap size). Larger regions reduce metadata overhead but may increase pause times. The example uses 16MB regions - appropriate for multi-GB heaps.

InitiatingHeapOccupancyPercent: Controls when concurrent marking begins. Lower values start GC earlier, preventing full GCs at the cost of more frequent marking cycles. Set to 45-70% depending on allocation rate.

# G1GC optimized for low latency
#   -XX:MaxGCPauseMillis=100                Aggressive pause time goal
#   -XX:G1HeapRegionSize=16m                Region size for a 4GB heap
#   -XX:InitiatingHeapOccupancyPercent=45   Start concurrent marking at 45% full
#   -XX:G1ReservePercent=10                 Reserve 10% of heap for to-space
java -Xms4g -Xmx4g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=100 \
  -XX:G1HeapRegionSize=16m \
  -XX:InitiatingHeapOccupancyPercent=45 \
  -XX:G1ReservePercent=10 \
  -jar payment-service.jar

ZGC for Ultra-Low Latency

ZGC (Z Garbage Collector) is a scalable low-latency garbage collector that performs all expensive work concurrently with application threads. ZGC can maintain <10ms pause times even with multi-terabyte heaps.

When to use ZGC: Use ZGC when you need consistent ultra-low latency (P99 < 10ms) regardless of heap size or load. ZGC trades slightly reduced throughput (5-15%) for dramatically reduced pause times. It requires Java 15+ (production-ready in Java 15, earlier versions are experimental).

Heap sizing for ZGC: ZGC performs best with generous heap sizing - typically 2-3x your active dataset. This provides headroom for concurrent collection. ZGC uses colored pointers and memory mapping, requiring 64-bit platforms.

# ZGC for sub-10ms pause times
#   -Xms8g / -Xmx8g             Generous heap gives headroom for concurrent collection
#   -XX:+UseZGC                 Enable ZGC
#   -XX:ZCollectionInterval=5   Proactive collection interval in seconds
java -Xms8g -Xmx8g \
  -XX:+UseZGC \
  -XX:ZCollectionInterval=5 \
  -jar payment-service.jar

Monitoring GC

Monitoring GC behavior reveals whether tuning is needed. Watch for increasing pause times, frequent full GCs, or memory pressure. These indicate misconfiguration or memory leaks requiring investigation.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class GCMonitor {

    private static final Logger log = LoggerFactory.getLogger(GCMonitor.class);

    @Scheduled(fixedRate = 60000)
    public void logGCStats() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            log.info("GC: {} - Collections: {}, Time: {}ms",
                    gc.getName(),
                    gc.getCollectionCount(),
                    gc.getCollectionTime());
        }
    }
}

Pagination

Pagination limits result set size to improve response times and reduce memory consumption. Without pagination, queries returning thousands of rows consume excessive memory, bandwidth, and processing time. Pagination also improves UX by loading data incrementally.

Two pagination strategies exist: offset-based and cursor-based. Each has distinct performance characteristics and use cases.

Cursor-based pagination uses a marker (typically a timestamp or ID) to identify where the previous page ended. The next page queries for records after that marker. This approach provides consistent performance regardless of page depth.

Why cursor-based is faster: Offset-based pagination becomes slower on deep pages because databases must scan and skip offset rows. For page 1000 with page size 20, the database scans 20,000 rows to skip the first 19,980. Cursor-based pagination avoids this by using indexed WHERE clauses - the database jumps directly to the cursor position using the index.

Tradeoffs: Cursor-based pagination doesn't support random page access (can't jump to page 50) and requires the sort column to be in the WHERE clause. It's ideal for infinite scroll UIs and chronological feeds. For more on database query patterns, see Data Access Guidelines.
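The scan-cost difference can be expressed as simple arithmetic (hypothetical helper methods; the numbers match the page-1000 example above):

```java
public class PaginationCost {

    // Offset pagination: the database reads and discards all skipped rows,
    // then reads the rows it returns
    static long rowsScannedOffset(long pageIndex, long pageSize) {
        return (pageIndex * pageSize) + pageSize;
    }

    // Cursor pagination: an index seek jumps directly to the cursor position,
    // so only the returned rows are read regardless of depth
    static long rowsScannedCursor(long pageSize) {
        return pageSize;
    }

    public static void main(String[] args) {
        // Page 1000 (zero-based index 999) with 20 rows per page
        System.out.println(rowsScannedOffset(999, 20)); // 20000
        System.out.println(rowsScannedCursor(20));      // 20
    }
}
```

Offset cost grows linearly with page depth while cursor cost stays constant, which is the entire performance argument for cursors on deep pages.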

import java.time.Instant;
import java.util.List;

import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageImpl;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Service;

public interface PaymentRepository extends JpaRepository<Payment, String> {

    // Cursor-based pagination using timestamp
    // Index on created_at enables efficient seeking
    @Query("SELECT p FROM Payment p WHERE p.createdAt < :cursor " +
           "ORDER BY p.createdAt DESC")
    List<Payment> findPaymentsBeforeCursor(
            @Param("cursor") Instant cursor,
            Pageable pageable
    );
}

@Service
public class PaymentService {

    private final PaymentRepository paymentRepository;

    public PaymentService(PaymentRepository paymentRepository) {
        this.paymentRepository = paymentRepository;
    }

    public Page<Payment> getPayments(Instant cursor, int pageSize) {
        Pageable pageable = PageRequest.of(0, pageSize);

        // First page: no cursor, sort by timestamp descending
        // Subsequent pages: filter for records before cursor
        List<Payment> payments = cursor == null
                ? paymentRepository.findAll(
                        PageRequest.of(0, pageSize, Sort.by("createdAt").descending())
                  ).getContent()
                : paymentRepository.findPaymentsBeforeCursor(cursor, pageable);

        // Total reflects only the current page; cursor pagination deliberately
        // avoids the COUNT query needed for a true grand total
        return new PageImpl<>(payments, pageable, payments.size());
    }
}

Offset-Based Pagination

Offset-based pagination uses OFFSET and LIMIT to skip rows and return a specific page. This approach supports random page access (jumping to any page number) but performance degrades on deep pages.

Performance characteristics: Page 1 is fast (no rows skipped), but page 1000 requires scanning all previous rows. Databases typically implement OFFSET by reading and discarding rows - expensive for large offsets. Offset-based pagination is acceptable for small datasets or when random page access is required.

Page size limits: Always enforce maximum page size to prevent clients requesting arbitrarily large pages. The example limits to 100 records per page.

import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/payments")
public class PaymentController {

    private final PaymentService paymentService;

    public PaymentController(PaymentService paymentService) {
        this.paymentService = paymentService;
    }

    @GetMapping
    public Page<PaymentDto> getPayments(
            @RequestParam(defaultValue = "0") int page,
            @RequestParam(defaultValue = "20") int size) {

        // Enforce a maximum page size so clients cannot request huge pages
        size = Math.min(size, 100);

        Pageable pageable = PageRequest.of(page, size, Sort.by("createdAt").descending());
        return paymentService.getPayments(pageable);
    }
}


Summary

Key Takeaways

  1. Measure before optimizing - Use profiling tools (JFR, Actuator) to identify actual bottlenecks, not perceived ones. Premature optimization wastes effort on the wrong problems.

  2. Implement caching - Caching is often the highest-impact optimization. See the Caching Guide for multi-level caching strategies (L1/L2/L3), Caffeine and Redis configuration, and invalidation patterns.

  3. Connection pooling is critical - Reuse database and HTTP connections to eliminate expensive connection establishment overhead (50-200ms per connection).

  4. Database optimization fundamentals - Proper indexing based on query patterns, avoiding N+1 problems with fetch joins, and batching bulk operations. Understand query execution plans.

  5. Async for I/O, not CPU - Use virtual threads or @Async for I/O-bound operations (database, HTTP) but not CPU-bound operations (calculations, crypto). Async adds complexity - use it where performance justifies the cost.

  6. JVM tuning balances throughput and latency - G1GC for general use, ZGC for ultra-low latency requirements. Set heap size appropriately and monitor GC behavior. Larger heaps reduce GC frequency but increase pause times.

  7. Cursor-based pagination for performance - Cursor-based pagination maintains consistent performance on deep pages; offset-based degrades linearly. Choose based on whether you need random page access.

  8. Batch database operations - Flush and clear EntityManager every 50 records during bulk operations to prevent memory exhaustion. Enable JDBC batching in Hibernate configuration.

  9. Continuous monitoring - Track P95/P99 latency, cache hit rates, GC pause times, connection pool utilization. Performance degrades over time - monitoring catches regressions early.

Performance Optimization Workflow

  1. Profile to identify bottlenecks (JFR, Actuator metrics, slow query logs)
  2. Optimize the specific bottleneck (add index, implement caching, optimize query)
  3. Measure impact of optimization (did latency improve? by how much?)
  4. Repeat for next bottleneck

Next Steps: Review Performance Testing for load testing strategies to validate optimizations and Spring Boot Observability for monitoring application performance in production.