Performance Testing Best Practices
Applications must handle expected load and traffic spikes without degradation. Performance testing validates SLAs, prevents performance regressions, and builds confidence before releases. Test early, test often, and automate performance testing in CI/CD pipelines to catch issues before production.
Overview
Performance testing validates that applications meet latency, throughput, and stability requirements under various load conditions. Unlike functional testing which verifies correctness, performance testing verifies responsiveness, scalability, and resource utilization.
This guide covers load testing strategies, tooling (Gatling for JVM, k6 for JavaScript, JMeter), performance budgets, SLA validation, CI/CD integration, and continuous performance monitoring. Effective performance testing reveals bottlenecks before they impact users, provides capacity planning data, and prevents performance regressions through automated testing.
Core Principles
- Test Early and Continuously: Start performance testing in development, not during final pre-production validation. Early testing catches architectural issues when they're cheap to fix. Waiting until pre-production means fundamental performance problems require expensive rearchitecting.
- Model Realistic Scenarios: Tests using unrealistic traffic patterns provide unrealistic results. Model actual user behavior including think times, navigation patterns, and data characteristics. A test that hammers a single endpoint constantly doesn't replicate real usage.
- Define Performance Budgets: Establish specific, measurable performance targets (P95 latency, throughput, error rate) before testing. Without defined budgets, you can't determine pass/fail - just "faster" or "slower" with no context.
- Automate in CI/CD: Manual performance testing is expensive and infrequent, allowing regressions to accumulate. Automated performance tests in CI/CD catch regressions immediately, typically within hours of the problematic commit. For CI/CD integration strategies, see Pipeline Guidelines.
- Correlate with Production Metrics: Performance test environments rarely match production exactly. Compare test results with production metrics to validate test environment fidelity and understand how test performance predicts production performance.
- Iterate on Bottlenecks: Performance testing reveals bottlenecks; optimization eliminates them. The cycle is: test → identify bottleneck → optimize → re-test → validate improvement. Continue until performance budgets are met. See Performance Optimization for optimization techniques.
Types of Performance Tests
Different test types serve different purposes. Choose test types based on what risks you need to evaluate. Most applications benefit from load testing (baseline performance), stress testing (breaking point), and soak testing (stability).
Load Testing
Load testing validates that the system performs acceptably under expected load. This is baseline performance testing - verifying that typical production traffic doesn't cause degradation. Load tests run at steady concurrency for a sustained period (typically 10-60 minutes).
Purpose: Verify the system meets SLAs under normal conditions. Load testing answers: "Can we handle typical traffic?" Results establish baseline metrics for comparison in future testing.
Example: 1,000 concurrent users for 30 minutes simulating typical weekday afternoon traffic.
Stress Testing
Stress testing identifies the system's breaking point by gradually increasing load until performance degrades or the system fails. This reveals capacity limits and helps with capacity planning.
Purpose: Find the maximum load the system can handle before failure. Stress testing answers: "What's our capacity ceiling?" Knowing your breaking point helps with capacity planning and setting auto-scaling thresholds.
Example: Ramp from 100 to 5,000 users over 1 hour, monitoring at what point latency exceeds SLA or errors begin occurring.
Key metric: The concurrency level where P95 latency exceeds your threshold or error rate spikes. This is your safe operating limit.
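Finding that limit from stress-test results can be sketched in Python. The step data and the 200ms P95 / 1% error budgets below are illustrative, not values from this guide:

```python
# Sketch: locate the safe operating limit from stress-test step results.
# Each step is (concurrency, p95_latency_ms, error_rate).

def find_breaking_point(steps, p95_budget_ms=200, max_error_rate=0.01):
    """Return the last concurrency level that still met the budget,
    or None if even the first step violated it."""
    safe_limit = None
    for concurrency, p95_ms, error_rate in steps:
        if p95_ms > p95_budget_ms or error_rate > max_error_rate:
            return safe_limit  # previous step was the safe operating limit
        safe_limit = concurrency
    return safe_limit  # budget never violated within the tested range

steps = [
    (100, 80, 0.0),
    (500, 120, 0.001),
    (1000, 180, 0.004),
    (2000, 450, 0.03),   # budget first violated here
    (5000, 2100, 0.22),
]
print(find_breaking_point(steps))  # 1000 - the safe operating limit
```

In practice the step results would come from your load tool's per-stage summaries rather than a hardcoded list.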
Spike Testing
Spike testing validates the system's response to sudden traffic increases. Real-world traffic patterns include spikes - product launches, marketing campaigns, viral content. Systems must handle spikes gracefully or at least degrade predictably.
Purpose: Verify the system doesn't crash or severely degrade during traffic surges. Spike testing answers: "What happens during a sudden traffic burst?" Auto-scaling, connection pooling, and caching behavior become critical during spikes.
Example: Jump from 100 to 2,000 users instantly, hold for 10 minutes, then drop back to 100. Observe recovery time and whether the system stabilizes.
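Recovery time after the spike can be measured from a latency time series. A minimal sketch, with hypothetical sample data:

```python
# Sketch: measure recovery time from samples of
# (seconds_since_spike_end, p95_latency_ms). Values are illustrative.

def recovery_time(samples, p95_budget_ms=200):
    """Seconds until P95 latency returns under budget and stays there,
    or None if the system never stabilizes within the samples."""
    recovered_at = None
    for t, p95_ms in samples:
        if p95_ms <= p95_budget_ms:
            if recovered_at is None:
                recovered_at = t
        else:
            recovered_at = None  # relapsed above budget; keep waiting
    return recovered_at

samples = [(0, 900), (30, 450), (60, 210), (90, 180), (120, 150)]
print(recovery_time(samples))  # 90
```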
Soak Testing
Soak testing (endurance testing) runs at moderate load for extended periods (hours to days) to reveal issues that only manifest over time: memory leaks, connection pool exhaustion, disk space consumption, log file growth.
Purpose: Detect memory leaks and resource leaks that cause gradual degradation. Soak testing answers: "Is the system stable over extended operation?" Memory leaks might not manifest in 30-minute tests but cause outages after days of operation.
Example: 500 concurrent users for 8-24 hours. Monitor memory usage, GC frequency, connection pool metrics, and disk usage. Healthy systems show stable resource utilization; problematic systems show resource consumption trending upward.
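"Trending upward" can be made concrete by fitting a linear trend to resource samples. A stdlib-only sketch with illustrative heap numbers:

```python
# Sketch: flag a suspected memory leak by fitting a least-squares line
# to heap samples collected during a soak test. Data is illustrative.

def slope_mb_per_hour(samples):
    """Least-squares slope for samples of (hours_elapsed, heap_mb)."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    return num / den

healthy = [(0, 512), (4, 520), (8, 515), (12, 518), (16, 514)]   # flat
leaking = [(0, 512), (4, 640), (8, 770), (12, 905), (16, 1030)]  # climbing

print(f"healthy: {slope_mb_per_hour(healthy):+.1f} MB/hour")  # near zero
print(f"leaking: {slope_mb_per_hour(leaking):+.1f} MB/hour")  # strongly positive
```

A healthy service shows a slope near zero after warm-up; a consistently positive slope over many hours is the leak signature worth investigating.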
Scalability Testing
Scalability testing validates that adding resources (horizontal/vertical scaling) improves performance proportionally. Ideally, doubling instances doubles throughput. In practice, efficiency is 50-80% due to coordination overhead.
Purpose: Verify scaling strategies work and quantify scaling efficiency. Scalability testing answers: "How much does scaling improve performance?" Results guide infrastructure decisions - whether to scale vertically (bigger instances) or horizontally (more instances).
Example: Test with 1, 2, 4, and 8 instances at 1000 concurrent users. Plot throughput against instance count. Linear scaling is ideal; sublinear scaling indicates coordination overhead (shared database, lock contention).
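Scaling efficiency from such a run can be computed directly. A sketch with illustrative throughput numbers:

```python
# Sketch: quantify scaling efficiency from throughput measured at
# different instance counts. Throughput numbers are illustrative.

def scaling_efficiency(baseline, scaled):
    """baseline, scaled: (instances, throughput_rps). 1.0 = perfectly linear."""
    base_n, base_rps = baseline
    n, rps = scaled
    ideal_rps = base_rps * (n / base_n)  # what linear scaling would give
    return rps / ideal_rps

runs = [(1, 1200), (2, 2200), (4, 3800), (8, 6100)]
for scaled in runs[1:]:
    eff = scaling_efficiency(runs[0], scaled)
    print(f"{scaled[0]} instances: {eff:.0%} of linear")
```

Efficiency dropping as instances increase (here from ~92% at 2 instances down toward ~64% at 8) is the sublinear pattern that points at shared-resource contention.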
Performance Budgets
Performance budgets are measurable performance targets that define acceptable behavior. Without budgets, performance testing has no pass/fail criteria. Budgets should derive from user experience requirements and business SLAs.
Why percentiles over averages: Averages hide problems. If P50 latency is 50ms but P95 is 2000ms, most users have a great experience while 5% have a terrible one. Those 5% of users generate support tickets and negative reviews. Always set budgets on P95 or P99 latency, not averages.
Setting realistic budgets: Base budgets on user expectations and technical constraints. Web applications generally need <200ms P95 latency for good UX. APIs might tolerate <500ms. Real-time applications require <100ms. Consider your domain and user expectations.
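The percentiles-over-averages point is easy to demonstrate: a small slow tail barely moves the mean but dominates the P95. A sketch with illustrative latency values:

```python
# Sketch: why budgets target percentiles. 94 fast requests plus a slow
# 6% tail produce a healthy-looking mean and an unhealthy P95.

def percentile(samples, p):
    """Nearest-rank percentile (simple convention; real tools may differ)."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [50] * 94 + [2000] * 6  # 94 fast requests, 6 very slow ones

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean: {mean:.0f}ms")                      # 167ms - looks acceptable
print(f"P95:  {percentile(latencies_ms, 95)}ms")  # 2000ms - reveals the tail
```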
Defining SLAs
Service Level Agreements (SLAs) specify measurable performance commitments. The configuration below defines SLAs for several endpoints, specifying percentile latencies, throughput, and acceptable error rates.
# performance-budgets.yml
api_endpoints:
  - name: GET /api/payments
    p50_latency_ms: 50
    p95_latency_ms: 150
    p99_latency_ms: 300
    throughput_rps: 1000
    error_rate_percent: 0.1
  - name: POST /api/payments
    p50_latency_ms: 100
    p95_latency_ms: 200
    p99_latency_ms: 500
    throughput_rps: 500
    error_rate_percent: 0.1
  - name: GET /api/accounts/{id}
    p50_latency_ms: 30
    p95_latency_ms: 100
    p99_latency_ms: 200
    throughput_rps: 2000
    error_rate_percent: 0.05
Performance Budget Validation
// Assert performance metrics in tests
@Test
void shouldMeetPerformanceBudget() {
    // Run load test
    GatlingReport report = runLoadTest();

    // Assert P95 latency
    assertThat(report.getP95Latency())
        .isLessThan(Duration.ofMillis(200));

    // Assert error rate
    assertThat(report.getErrorRate())
        .isLessThan(0.001); // 0.1%

    // Assert throughput
    assertThat(report.getThroughput())
        .isGreaterThan(500); // 500 RPS
}
Gatling (Recommended for JVM)
Gatling is a Scala-based load testing tool optimized for testing JVM applications. It provides an expressive DSL for defining scenarios, efficient async HTTP client implementation, and comprehensive HTML reports. Gatling excels at testing RESTful APIs and web applications.
Why Gatling: Gatling uses an async, non-blocking architecture that allows a single load generator to simulate thousands of concurrent users with modest hardware. Traditional threaded tools require one thread per user - limiting concurrency to thousands. Gatling's async approach supports tens of thousands of concurrent users from a single machine.
Reports: Gatling generates detailed HTML reports including response time distribution, throughput over time, and error breakdowns. Reports are standalone HTML files - easy to archive and share with stakeholders.
Language: Gatling tests are traditionally written in Scala, which may be unfamiliar if your team uses Java exclusively. However, the DSL is readable even without Scala knowledge, and recent Gatling versions also offer Java and Kotlin DSLs. Gatling's power and reporting justify the small learning curve.
Dependencies
// build.gradle
plugins {
    id 'io.gatling.gradle' version '3.11.5'
}

gatling {
    // Gatling plugin configuration (version managed by plugin)
}
Basic Load Test
The simulation below demonstrates basic Gatling usage: HTTP configuration, a simple scenario, load injection, and assertions. The scenario creates a payment and then retrieves it, simulating a typical API workflow.
Key concepts:
- httpProtocol: Defines base URL and default headers applied to all requests
- scenario: Defines a sequence of actions a virtual user performs
- check: Validates responses and extracts values for subsequent requests
- saveAs: Stores extracted values in the virtual user's session (like paymentId here)
- inject: Defines how virtual users are introduced (ramp-up patterns)
- assertions: Define pass/fail criteria - test fails if assertions aren't met
Assertions: Gatling tests pass/fail based on assertions. The example asserts P95 latency <200ms and success rate >99.9%. Without assertions, tests always "pass" regardless of performance - assertions provide automated quality gates.
// src/test/scala/simulations/PaymentSimulation.scala
package simulations

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class PaymentSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080")
    .acceptHeader("application/json")
    .contentTypeHeader("application/json")

  val scn = scenario("Payment Processing")
    .exec(
      http("Create Payment")
        .post("/api/payments")
        .body(StringBody("""{
          "amount": 100.00,
          "currency": "USD",
          "userId": "USER-123",
          "accountId": "ACC-456"
        }"""))
        .check(status.is(201))
        .check(jsonPath("$.id").saveAs("paymentId")) // Extract ID for next request
    )
    .pause(1) // 1-second think time between requests
    .exec(
      http("Get Payment")
        .get("/api/payments/#{paymentId}") // Use extracted paymentId (Gatling 3.7+ EL syntax)
        .check(status.is(200))
    )

  setUp(
    scn.inject(
      // Ramp up to 100 users over 30 seconds (gradual load increase)
      rampUsers(100).during(30.seconds),
      // Then inject 100 new users per second for 5 minutes (steady-state load)
      constantUsersPerSec(100).during(5.minutes)
    )
  ).protocols(httpProtocol)
    .assertions(
      global.responseTime.percentile(95).lt(200), // P95 latency < 200ms
      global.successfulRequests.percent.gt(99.9)  // Error rate < 0.1%
    )
}
Advanced Scenario with Think Time
The scenario below models realistic user behavior including variable think times, authentication, and multiple correlated requests. Think times prevent unrealistic constant hammering - real users pause between actions.
Modeling realistic behavior: Users don't immediately fire the next request. They read responses, make decisions, type inputs. Think times model this behavior - without them, tests measure "best case" performance that never occurs in production.
Random think times: pause(2, 5) waits between 2 and 5 seconds, uniformly distributed. This prevents synchronized load patterns where all virtual users hit the endpoint simultaneously. Real user behavior is desynchronized - tests should match this.
class RealisticPaymentSimulation extends Simulation {

  val httpProtocol = http.baseUrl("http://localhost:8080")

  val scn = scenario("Realistic Payment Flow")
    // User logs in
    .exec(
      http("Login")
        .post("/api/auth/login")
        .body(StringBody("""{"username": "user", "password": "pass"}"""))
        .check(jsonPath("$.token").saveAs("authToken"))
        .check(jsonPath("$.accountId").saveAs("accountId")) // assumes the login response includes the account id
    )
    .pause(2, 5) // Think time: 2-5 seconds
    // User views account balance
    .exec(
      http("Get Account")
        .get("/api/accounts/#{accountId}")
        .header("Authorization", "Bearer #{authToken}")
        .check(status.is(200))
    )
    .pause(3, 7)
    // User creates payment
    .exec(
      http("Create Payment")
        .post("/api/payments")
        .header("Authorization", "Bearer #{authToken}")
        .body(StringBody("""{"amount": 50.00, "currency": "USD"}"""))
        .check(status.is(201))
        .check(jsonPath("$.id").saveAs("paymentId"))
    )
    .pause(1, 2)
    // User checks payment status
    .exec(
      http("Get Payment Status")
        .get("/api/payments/#{paymentId}")
        .header("Authorization", "Bearer #{authToken}")
        .check(jsonPath("$.status").is("COMPLETED"))
    )

  setUp(
    scn.inject(
      // Simulate realistic load pattern
      nothingFor(5.seconds),
      rampUsers(50).during(30.seconds),
      constantUsersPerSec(100).during(10.minutes).randomized,
      rampUsers(200).during(1.minute), // Spike
      constantUsersPerSec(200).during(5.minutes),
      rampUsersPerSec(200).to(50).during(2.minutes) // Ramp down
    )
  ).protocols(httpProtocol)
}
Running Gatling Tests
# Gradle
./gradlew gatlingRun --no-daemon
# View report
open build/reports/gatling/paymentsimulation-{timestamp}/index.html
k6 (Recommended for JavaScript/TypeScript)
k6 is a modern, developer-friendly load testing tool with scripts written in JavaScript. It's designed for modern DevOps workflows with focus on automation, CI/CD integration, and code-based test definitions. k6 is particularly strong for teams already using JavaScript/TypeScript.
Why k6: JavaScript is widely known, lowering the barrier to writing performance tests. k6 provides excellent CLI output, easy CI/CD integration, and support for running tests at scale in k6 Cloud. Unlike browser-based tools, k6 is protocol-level - it doesn't render pages, focusing purely on API performance.
Architecture: k6 is written in Go and uses a JavaScript runtime (Goja) for test scripts. This architecture provides the performance of Go with the accessibility of JavaScript. Virtual users run concurrently in the Go runtime, providing excellent efficiency.
Metrics: k6 automatically tracks key metrics (request duration, failure rate, throughput) and supports custom metrics. Output formats include JSON, CSV, InfluxDB, Prometheus, and more - making integration with monitoring systems straightforward.
Installation
# macOS
brew install k6
# Windows
choco install k6
# Linux
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6
Basic Load Test
The test below demonstrates k6's structure: options for load configuration, thresholds for pass/fail criteria, and the default function that each virtual user executes. k6's syntax is clean and accessible to anyone familiar with JavaScript.
Key concepts:
- options: Configures load profile (stages), thresholds, and other test parameters
- stages: Defines ramp-up and ramp-down patterns for load
- thresholds: Pass/fail criteria - test fails if thresholds are violated
- check: Validates responses (doesn't fail test, but tracks success rate)
- sleep: Introduces think time between requests
Checks vs Thresholds: check validates individual responses and tracks success/failure rate but doesn't fail the test. thresholds define test-level pass/fail criteria. Use checks for request validation and thresholds for aggregate performance requirements.
// payment-load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 100 }, // Ramp up to 100 VUs over 30s
    { duration: '5m', target: 100 },  // Maintain 100 VUs for 5 minutes
    { duration: '30s', target: 0 },   // Ramp down to 0 (graceful stop)
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'], // 95th percentile < 200ms
    http_req_failed: ['rate<0.01'],   // Failure rate < 1%
  },
};

export default function () {
  // Create payment
  const payload = JSON.stringify({
    amount: 100.00,
    currency: 'USD',
    userId: 'USER-123',
    accountId: 'ACC-456',
  });
  const params = {
    headers: {
      'Content-Type': 'application/json',
    },
  };
  const createRes = http.post('http://localhost:8080/api/payments', payload, params);

  // Validate response - doesn't fail test, tracks check success rate
  check(createRes, {
    'status is 201': (r) => r.status === 201,
    'response time < 200ms': (r) => r.timings.duration < 200,
    'payment id exists': (r) => JSON.parse(r.body).id !== undefined,
  });

  const paymentId = JSON.parse(createRes.body).id;
  sleep(1); // Think time

  // Get payment by ID
  const getRes = http.get(`http://localhost:8080/api/payments/${paymentId}`);
  check(getRes, {
    'status is 200': (r) => r.status === 200,
    'payment status is COMPLETED': (r) => JSON.parse(r.body).status === 'COMPLETED',
  });

  sleep(1); // Think time before next iteration
}
Running k6 Tests
# Run test
k6 run payment-load-test.js
# Run with custom duration/VUs
k6 run --duration 10m --vus 200 payment-load-test.js
# Output results to InfluxDB/Grafana
k6 run --out influxdb=http://localhost:8086/k6 payment-load-test.js
k6 Cloud (SaaS)
# Sign up at https://k6.io/cloud
# Run test in cloud
k6 cloud payment-load-test.js
# Run distributed test
k6 cloud --vus 10000 --duration 30m payment-load-test.js
JMeter
Installation
# Download from https://jmeter.apache.org/download_jmeter.cgi
# Extract and run
./bin/jmeter
Test Plan Structure
Test Plan
├── Thread Group (Users)
│ ├── HTTP Request Defaults
│ ├── HTTP Header Manager
│ ├── CSV Data Set Config (test data)
│ ├── HTTP Request: POST /api/payments
│ ├── JSON Extractor (extract payment ID)
│ ├── HTTP Request: GET /api/payments/${paymentId}
│ └── Assertions
├── Listeners
│ ├── View Results Tree
│ ├── Summary Report
│ └── Aggregate Report
Running JMeter from CLI
# Run test plan
jmeter -n -t payment-test.jmx -l results.jtl -e -o report/
# With custom parameters
jmeter -n -t payment-test.jmx \
-Jusers=100 \
-Jrampup=30 \
-Jduration=300 \
-l results.jtl \
-e -o report/
CI/CD Integration
Automated performance testing in CI/CD pipelines catches regressions immediately after code changes. Manual performance testing is too slow - regressions accumulate between infrequent manual test runs. Automated tests provide continuous feedback, enabling rapid iteration and preventing performance debt.
When to run: Run lightweight performance tests on every PR or merge to main. Run comprehensive performance tests nightly or on release branches. The tradeoff is feedback speed vs test thoroughness. Quick "smoke tests" catch obvious regressions; comprehensive tests validate capacity and stability.
Test environment considerations: CI performance tests need stable, production-like environments. Inconsistent environments produce noisy results where regressions are hidden by environmental variance. Consider dedicated performance test environments or containerized infrastructure for reproducibility. For more on CI/CD setup, see Pipeline Guidelines.
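Before trusting a regression threshold, it helps to measure how noisy the environment actually is. A sketch, with illustrative P95 values from repeated identical runs:

```python
# Sketch: estimate run-to-run noise before choosing a regression threshold.
# If identical runs vary by ~3%, a 10-15% gate is reasonable; if they vary
# by 20%, stabilize the environment first. Sample values are illustrative.
import statistics

def coefficient_of_variation(p95_samples_ms):
    """Relative spread (stdev / mean) of P95 latency across identical runs."""
    return statistics.stdev(p95_samples_ms) / statistics.mean(p95_samples_ms)

runs_ms = [182, 190, 177, 185, 188]  # P95 from five identical test runs
print(f"run-to-run variation: {coefficient_of_variation(runs_ms):.1%}")  # about 3%
```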
GitLab CI Performance Testing
The configuration below demonstrates k6 and Gatling performance tests in GitLab CI. Tests run in isolated containers with the application under test available as a service. Results are stored as artifacts for historical comparison and regression detection.
Service readiness: The sleep 10 gives services time to start before running tests. In production pipelines, use health check probes instead of fixed waits - polling the health endpoint until it returns 200 OK ensures the service is truly ready.
Artifact retention: Performance test results are retained for 30 days. This enables historical comparison to detect gradual performance degradation and provides evidence for capacity planning discussions.
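The fixed sleep mentioned above can be replaced by a readiness poll. A minimal stdlib sketch; the /actuator/health path is an assumption (Spring Boot convention), so substitute whatever readiness endpoint your service exposes:

```python
# Sketch: poll a health endpoint until it answers 200 OK, instead of
# sleeping a fixed duration and hoping the service is up.
import time
import urllib.error
import urllib.request

def wait_until_healthy(url, timeout_s=60, interval_s=2):
    """Return True once the endpoint answers 200 OK, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not accepting connections yet; retry
        time.sleep(interval_s)
    return False

# In the pipeline this would target the service alias, e.g.
# wait_until_healthy("http://payment-service:8080/actuator/health")
print(wait_until_healthy("http://127.0.0.1:9/", timeout_s=1, interval_s=0.5))  # False - nothing listens there
```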
# .gitlab-ci.yml
performance-test:
  stage: test
  image: grafana/k6:latest
  services:
    - name: payment-service:latest
      alias: payment-service
  script:
    # Wait for service to be ready (use health check in production)
    - sleep 10
    # Export the end-of-test summary for regression comparison
    - k6 run --summary-export=results.json tests/performance/payment-load-test.js
  artifacts:
    reports:
      performance: results.json
    paths:
      - results.json
    when: always        # Capture results even if test fails
    expire_in: 30 days  # Retain for historical comparison
  only:
    - main
    - /^release\/.*/

# Gatling in CI pipeline
performance-test-gatling:
  stage: test
  image: eclipse-temurin:21-jdk  # JDK image; the Gradle wrapper downloads Gradle itself
  services:
    - name: payment-service:latest
      alias: payment-service
  script:
    - ./gradlew gatlingRun --no-daemon
  artifacts:
    paths:
      - build/reports/gatling/  # HTML reports and raw data
    when: always
    expire_in: 30 days
Performance Regression Detection
Regression detection compares current test results against a baseline (typically the previous successful run). If performance degrades beyond a threshold (commonly 10-20%), the pipeline fails, preventing the regression from reaching production.
Setting thresholds: Too strict thresholds (e.g., 5%) cause false failures from environmental noise. Too loose thresholds (e.g., 50%) allow significant regressions. Start with 10-15% and adjust based on your environment's variance. Multiple regression detections suggest a real problem, not noise.
Baseline management: The example uses the last successful run as baseline. Alternatively, use a "golden" baseline from a known-good commit. Golden baselines avoid drift where each regression becomes the new baseline.
# .gitlab-ci.yml
performance-regression:
  stage: test
  script:
    # Run current performance test and export the summary
    - k6 run --summary-export=current.json tests/performance/payment-load-test.js
    # Download baseline from previous successful run
    - curl -o baseline.json "$CI_API_V4_URL/projects/$CI_PROJECT_ID/jobs/artifacts/main/raw/results.json?job=performance-test"
    # Compare results
    - python scripts/compare-performance.py baseline.json current.json
  allow_failure: false
# scripts/compare-performance.py
import json
import sys

def compare_performance(baseline_file, current_file):
    with open(baseline_file) as f:
        baseline = json.load(f)
    with open(current_file) as f:
        current = json.load(f)

    baseline_p95 = baseline['metrics']['http_req_duration']['p(95)']
    current_p95 = current['metrics']['http_req_duration']['p(95)']

    # Fail if P95 increased by more than 10%
    regression_threshold = 1.10
    if current_p95 > baseline_p95 * regression_threshold:
        print("FAIL: Performance regression detected!")
        print(f"Baseline P95: {baseline_p95}ms")
        print(f"Current P95: {current_p95}ms")
        print(f"Increase: {((current_p95 / baseline_p95 - 1) * 100):.2f}%")
        sys.exit(1)
    else:
        print("PASS: Performance acceptable")
        print(f"Baseline P95: {baseline_p95}ms")
        print(f"Current P95: {current_p95}ms")
        sys.exit(0)

if __name__ == '__main__':
    compare_performance(sys.argv[1], sys.argv[2])
Database Performance Testing
Simulate Database Load
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.domain.PageRequest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.springframework.util.StopWatch;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

import java.time.LocalDate;
import java.util.List;
import java.util.stream.IntStream;

import static org.assertj.core.api.Assertions.assertThat;

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@Testcontainers
class PaymentPerformanceTest {

    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:16")
        .withInitScript("schema.sql");

    @DynamicPropertySource
    static void configureProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
    }

    @Autowired
    private PaymentRepository paymentRepository;

    // createPayment() and insertTestData() are test helpers, omitted here

    @Test
    void shouldHandleHighVolumePaymentCreation() {
        // Create 10,000 payments
        StopWatch watch = new StopWatch();
        watch.start();

        List<Payment> payments = IntStream.range(0, 10_000)
            .mapToObj(i -> createPayment())
            .toList();
        paymentRepository.saveAll(payments);

        watch.stop();

        // Assert performance
        assertThat(watch.getTotalTimeMillis())
            .isLessThan(5000); // < 5 seconds for 10k inserts
    }

    @Test
    void shouldHandleComplexQueryPerformance() {
        // Insert test data
        insertTestData(100_000);

        StopWatch watch = new StopWatch();
        watch.start();

        // Complex query
        List<Payment> payments = paymentRepository.findPaymentsByUserWithFilters(
            "USER-123",
            PaymentStatus.COMPLETED,
            LocalDate.now().minusDays(30),
            PageRequest.of(0, 100)
        );

        watch.stop();

        // Assert query performance
        assertThat(watch.getTotalTimeMillis())
            .isLessThan(100); // < 100ms for complex query on 100k rows
        assertThat(payments).hasSize(100);
    }
}
Monitoring Performance Tests
Prometheus Metrics During Tests
# docker-compose.yml for monitoring stack
version: '3'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

# prometheus.yml
scrape_configs:
  - job_name: 'payment-service'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['payment-service:8080']
Grafana Dashboard for Performance Tests
{
  "dashboard": {
    "title": "Performance Test Dashboard",
    "panels": [
      {
        "title": "Request Rate",
        "targets": [
          { "expr": "rate(http_server_requests_seconds_count[1m])" }
        ]
      },
      {
        "title": "P95 Latency",
        "targets": [
          { "expr": "histogram_quantile(0.95, rate(http_server_requests_seconds_bucket[1m]))" }
        ]
      },
      {
        "title": "Error Rate",
        "targets": [
          { "expr": "rate(http_server_requests_seconds_count{status=~\"5..\"}[1m])" }
        ]
      },
      {
        "title": "Database Connection Pool",
        "targets": [
          { "expr": "hikaricp_connections_active / hikaricp_connections_max * 100" }
        ]
      }
    ]
  }
}
Best Practices
Test Data Management
// Generate realistic test data
@Component
public class PaymentTestDataGenerator {

    private final Faker faker = new Faker();

    public Payment generateRealisticPayment() {
        return Payment.builder()
            .id(UUID.randomUUID().toString())
            .userId("USER-" + faker.number().numberBetween(1, 10000))
            .amount(BigDecimal.valueOf(faker.number().randomDouble(2, 1, 10000)))
            .currency(faker.options().option("USD", "EUR", "GBP"))
            .status(PaymentStatus.PENDING)
            .createdAt(Instant.now())
            .build();
    }

    public List<Payment> generatePayments(int count) {
        return IntStream.range(0, count)
            .mapToObj(i -> generateRealisticPayment())
            .toList();
    }
}
Performance Test Checklist
- Test with production-like data volumes
- Use realistic user behavior (think times, navigation patterns)
- Test from multiple geographic locations (latency)
- Include ramp-up and ramp-down periods
- Monitor system resources (CPU, memory, disk, network)
- Test with production-like infrastructure (same VM sizes, network)
- Validate database indexes are used
- Test with connection pooling configured
- Monitor garbage collection during tests
- Validate caching effectiveness
- Test auto-scaling behavior
- Run soak tests to detect memory leaks
- Compare results with performance budgets
- Document test scenarios and results
Further Reading
Internal Documentation
- Performance Optimization - Caching, JVM tuning
- Spring Boot Observability - Monitoring
- Testing Strategy - Overall testing approach
- CI/CD Pipelines - GitLab CI integration
Summary
Key Takeaways
- Establish performance budgets before testing - Define specific P95/P99 latency targets, throughput requirements, and error rate thresholds. Without budgets, you can't determine if tests pass or fail. Base budgets on user experience requirements and business SLAs.
- Test early and continuously - Start performance testing during development, not in final pre-production validation. Early testing catches architectural issues when they're cheap to fix. Automate tests in CI/CD to catch regressions within hours of problematic commits.
- Model realistic user behavior - Use think times, variable load patterns, and realistic data. Tests hammering endpoints constantly measure best-case performance that never occurs in production. Think times and realistic scenarios reveal actual performance characteristics.
- Choose appropriate test types - Load testing validates baseline performance, stress testing finds breaking points, spike testing validates handling of traffic surges, soak testing detects memory leaks, and scalability testing validates scaling strategies. Most applications need load, stress, and soak tests.
- Understand your tools - Gatling (Scala-based, async, excellent reporting) for JVM projects, k6 (JavaScript-based, modern CLI, cloud-ready) for JavaScript/TypeScript teams, JMeter (GUI-based, mature) for teams needing visual test design. Tool choice affects maintainability and team adoption.
- Automate regression detection - Compare test results against baselines (previous runs or golden baselines) to catch performance degradation. Fail pipelines when P95 latency increases beyond threshold (typically 10-15%). This prevents regressions from accumulating.
- Monitor system resources during tests - Track CPU, memory, garbage collection, database connection pools, and cache hit rates during performance tests. Resource metrics reveal bottlenecks that latency metrics don't - high CPU suggests compute bottleneck, high GC suggests memory pressure, connection pool exhaustion suggests database contention.
- Test environment fidelity matters - Performance test environments should mirror production infrastructure (instance sizes, network configuration, database capacity). Significant differences invalidate test results. Correlate test results with production metrics to validate test environment fidelity.
- Percentiles over averages - Always measure P95/P99 latency, not averages. Averages hide outliers that frustrate users and generate support tickets. P95 latency represents the experience of your slowest 5% of requests - the group most likely to complain.
- Iterate on bottlenecks - Performance testing identifies problems; optimization fixes them. The workflow is: test → profile to identify bottleneck → optimize → re-test → validate improvement. Continue until budgets are met. See Performance Optimization for optimization techniques.
Performance Testing Workflow
- Define performance budgets (P95 latency, throughput, error rate)
- Write performance tests modeling realistic scenarios
- Run tests in CI/CD on every main branch commit
- Compare results against baseline to detect regressions
- When regressions occur: profile, optimize, re-test, validate
- Correlate test results with production metrics to validate test fidelity
Tool Selection Guide
- Gatling: Best for JVM projects, teams comfortable with Scala, need for detailed HTML reports
- k6: Best for JavaScript/TypeScript teams, modern DevOps workflows, CLI-first approach
- JMeter: Best for teams needing GUI test builders, mature ecosystem, complex test scenarios
Next Steps: Review Performance Optimization for optimization techniques to eliminate bottlenecks identified through performance testing. Then see Spring Boot Observability for production monitoring to correlate test results with real-world performance.