Performance Testing Best Practices
Applications must handle expected load and traffic spikes without degradation. Performance testing validates SLAs, prevents performance regressions, and builds confidence before releases. Test early, test often, and automate performance testing in CI/CD pipelines to catch issues before production.
Overview
Performance testing validates that applications meet latency, throughput, and stability requirements under various load conditions. Unlike functional testing which verifies correctness, performance testing verifies responsiveness, scalability, and resource utilization.
This guide covers load testing strategies, tooling (Gatling for JVM, k6 for JavaScript, JMeter), performance budgets, SLA validation, CI/CD integration, and continuous performance monitoring. Effective performance testing reveals bottlenecks before they impact users, provides capacity planning data, and prevents performance regressions through automated testing.
Core Principles
- Test Early and Continuously: Start performance testing in development, not during final pre-production validation. Early testing catches architectural issues when they're cheap to fix. Waiting until pre-production means fundamental performance problems require expensive rearchitecting.
- Model Realistic Scenarios: Tests using unrealistic traffic patterns provide unrealistic results. Model actual user behavior including think times, navigation patterns, and data characteristics. A test that hammers a single endpoint constantly doesn't replicate real usage.
- Define Performance Budgets: Establish specific, measurable performance targets (P95 latency, throughput, error rate) before testing. Without defined budgets, you can't determine pass/fail - just "faster" or "slower" with no context.
- Automate in CI/CD: Manual performance testing is expensive and infrequent, allowing regressions to accumulate. Automated performance tests in CI/CD catch regressions immediately, typically within hours of the problematic commit. For CI/CD integration strategies, see Pipeline Guidelines.
- Correlate with Production Metrics: Performance test environments rarely match production exactly. Compare test results with production metrics to validate test environment fidelity and understand how test performance predicts production performance.
- Iterate on Bottlenecks: Performance testing reveals bottlenecks; optimization eliminates them. The cycle is: test → identify bottleneck → optimize → re-test → validate improvement. Continue until performance budgets are met. See Performance Optimization for optimization techniques.
Types of Performance Tests
Different test types serve different purposes. Choose test types based on what risks you need to evaluate. Most applications benefit from load testing (baseline performance), stress testing (breaking point), and soak testing (stability).
Load Testing
Load testing validates that the system performs acceptably under expected load. This is baseline performance testing - verifying that typical production traffic doesn't cause degradation. Load tests run at steady concurrency for a sustained period (typically 10-60 minutes).
Purpose: Verify the system meets SLAs under normal conditions. Load testing answers: "Can we handle typical traffic?" Results establish baseline metrics for comparison in future testing.
Example: 1,000 concurrent users for 30 minutes simulating typical weekday afternoon traffic.
Stress Testing
Stress testing identifies the system's breaking point by gradually increasing load until performance degrades or the system fails. This reveals capacity limits and helps with capacity planning.
Purpose: Find the maximum load the system can handle before failure. Stress testing answers: "What's our capacity ceiling?" Knowing your breaking point helps with capacity planning and setting auto-scaling thresholds.
Example: Ramp from 100 to 5,000 users over 1 hour, monitoring at what point latency exceeds SLA or errors begin occurring.
Key metric: The concurrency level where P95 latency exceeds your threshold or error rate spikes. This is your safe operating limit.
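Finding that limit from stress-test results can be sketched in Python. The step data and the 200ms P95 / 1% error budgets below are illustrative, not values from this guide:

```python
# Sketch: locate the safe operating limit from stress-test step results.
# Each step is (concurrency, p95_latency_ms, error_rate).

def find_breaking_point(steps, p95_budget_ms=200, max_error_rate=0.01):
    """Return the last concurrency level that still met the budget,
    or None if even the first step violated it."""
    safe_limit = None
    for concurrency, p95_ms, error_rate in steps:
        if p95_ms > p95_budget_ms or error_rate > max_error_rate:
            return safe_limit  # previous step was the safe operating limit
        safe_limit = concurrency
    return safe_limit  # budget never violated within the tested range

steps = [
    (100, 80, 0.0),
    (500, 120, 0.001),
    (1000, 180, 0.004),
    (2000, 450, 0.03),   # budget first violated here
    (5000, 2100, 0.22),
]
print(find_breaking_point(steps))  # 1000 - the safe operating limit
```

In practice the step results would come from your load tool's per-stage summaries rather than a hardcoded list.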
Spike Testing
Spike testing validates the system's response to sudden traffic increases. Real-world traffic patterns include spikes - product launches, marketing campaigns, viral content. Systems must handle spikes gracefully or at least degrade predictably.
Purpose: Verify the system doesn't crash or severely degrade during traffic surges. Spike testing answers: "What happens during a sudden traffic burst?" Auto-scaling, connection pooling, and caching behavior become critical during spikes.
Example: Jump from 100 to 2,000 users instantly, hold for 10 minutes, then drop back to 100. Observe recovery time and whether the system stabilizes.
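Recovery time after the spike can be measured from a latency time series. A minimal sketch, with hypothetical sample data:

```python
# Sketch: measure recovery time from samples of
# (seconds_since_spike_end, p95_latency_ms). Values are illustrative.

def recovery_time(samples, p95_budget_ms=200):
    """Seconds until P95 latency returns under budget and stays there,
    or None if the system never stabilizes within the samples."""
    recovered_at = None
    for t, p95_ms in samples:
        if p95_ms <= p95_budget_ms:
            if recovered_at is None:
                recovered_at = t
        else:
            recovered_at = None  # relapsed above budget; keep waiting
    return recovered_at

samples = [(0, 900), (30, 450), (60, 210), (90, 180), (120, 150)]
print(recovery_time(samples))  # 90
```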
Soak Testing
Soak testing (endurance testing) runs at moderate load for extended periods (hours to days) to reveal issues that only manifest over time: memory leaks, connection pool exhaustion, disk space consumption, log file growth.
Purpose: Detect memory leaks and resource leaks that cause gradual degradation. Soak testing answers: "Is the system stable over extended operation?" Memory leaks might not manifest in 30-minute tests but cause outages after days of operation.
Example: 500 concurrent users for 8-24 hours. Monitor memory usage, GC frequency, connection pool metrics, and disk usage. Healthy systems show stable resource utilization; problematic systems show resource consumption trending upward.
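"Trending upward" can be made concrete by fitting a linear trend to resource samples. A stdlib-only sketch with illustrative heap numbers:

```python
# Sketch: flag a suspected memory leak by fitting a least-squares line
# to heap samples collected during a soak test. Data is illustrative.

def slope_mb_per_hour(samples):
    """Least-squares slope for samples of (hours_elapsed, heap_mb)."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    return num / den

healthy = [(0, 512), (4, 520), (8, 515), (12, 518), (16, 514)]   # flat
leaking = [(0, 512), (4, 640), (8, 770), (12, 905), (16, 1030)]  # climbing

print(f"healthy: {slope_mb_per_hour(healthy):+.1f} MB/hour")  # near zero
print(f"leaking: {slope_mb_per_hour(leaking):+.1f} MB/hour")  # strongly positive
```

A healthy service shows a slope near zero after warm-up; a consistently positive slope over many hours is the leak signature worth investigating.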
Scalability Testing
Scalability testing validates that adding resources (horizontal/vertical scaling) improves performance proportionally. Ideally, doubling instances doubles throughput. In practice, efficiency is 50-80% due to coordination overhead.
Purpose: Verify scaling strategies work and quantify scaling efficiency. Scalability testing answers: "How much does scaling improve performance?" Results guide infrastructure decisions - whether to scale vertically (bigger instances) or horizontally (more instances).
Example: Test with 1, 2, 4, and 8 instances at 1000 concurrent users. Plot throughput against instance count. Linear scaling is ideal; sublinear scaling indicates coordination overhead (shared database, lock contention).
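Scaling efficiency from such a run can be computed directly. A sketch with illustrative throughput numbers:

```python
# Sketch: quantify scaling efficiency from throughput measured at
# different instance counts. Throughput numbers are illustrative.

def scaling_efficiency(baseline, scaled):
    """baseline, scaled: (instances, throughput_rps). 1.0 = perfectly linear."""
    base_n, base_rps = baseline
    n, rps = scaled
    ideal_rps = base_rps * (n / base_n)  # what linear scaling would give
    return rps / ideal_rps

runs = [(1, 1200), (2, 2200), (4, 3800), (8, 6100)]
for scaled in runs[1:]:
    eff = scaling_efficiency(runs[0], scaled)
    print(f"{scaled[0]} instances: {eff:.0%} of linear")
```

Efficiency dropping as instances increase (here from ~92% at 2 instances down toward ~64% at 8) is the sublinear pattern that points at shared-resource contention.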
Performance Budgets
Performance budgets are measurable performance targets that define acceptable behavior. Without budgets, performance testing has no pass/fail criteria. Budgets should derive from user experience requirements and business SLAs.
Why percentiles over averages: Averages hide problems. If P50 latency is 50ms but P95 is 2000ms, most users have a great experience while 5% have a terrible one. Those 5% of users generate support tickets and negative reviews. Always set budgets on P95 or P99 latency, not averages.
Setting realistic budgets: Base budgets on user expectations and technical constraints. Web applications generally need <200ms P95 latency for good UX. APIs might tolerate <500ms. Real-time applications require <100ms. Consider your domain and user expectations.
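The percentiles-over-averages point is easy to demonstrate: a small slow tail barely moves the mean but dominates the P95. A sketch with illustrative latency values:

```python
# Sketch: why budgets target percentiles. 94 fast requests plus a slow
# 6% tail produce a healthy-looking mean and an unhealthy P95.

def percentile(samples, p):
    """Nearest-rank percentile (simple convention; real tools may differ)."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [50] * 94 + [2000] * 6  # 94 fast requests, 6 very slow ones

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean: {mean:.0f}ms")                      # 167ms - looks acceptable
print(f"P95:  {percentile(latencies_ms, 95)}ms")  # 2000ms - reveals the tail
```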
Defining SLAs
Service Level Agreements (SLAs) specify measurable performance commitments. The configuration below defines SLAs for several endpoints, specifying percentile latencies, throughput, and acceptable error rates.
# performance-budgets.yml
api_endpoints:
  - name: GET /api/payments
    p50_latency_ms: 50
    p95_latency_ms: 150
    p99_latency_ms: 300
    throughput_rps: 1000
    error_rate_percent: 0.1
  - name: POST /api/payments
    p50_latency_ms: 100
    p95_latency_ms: 200
    p99_latency_ms: 500
    throughput_rps: 500
    error_rate_percent: 0.1
  - name: GET /api/accounts/{id}
    p50_latency_ms: 30
    p95_latency_ms: 100
    p99_latency_ms: 200
    throughput_rps: 2000
    error_rate_percent: 0.05
Performance Budget Validation
// Assert performance metrics in tests
@Test
void shouldMeetPerformanceBudget() {
    // Run load test
    GatlingReport report = runLoadTest();

    // Assert P95 latency
    assertThat(report.getP95Latency())
        .isLessThan(Duration.ofMillis(200));

    // Assert error rate
    assertThat(report.getErrorRate())
        .isLessThan(0.001); // 0.1%

    // Assert throughput
    assertThat(report.getThroughput())
        .isGreaterThan(500); // 500 RPS
}
Gatling (Recommended for JVM)
Gatling is a Scala-based load testing tool optimized for testing JVM applications. It provides an expressive DSL for defining scenarios, efficient async HTTP client implementation, and comprehensive HTML reports. Gatling excels at testing RESTful APIs and web applications.
Why Gatling: Gatling uses an async, non-blocking architecture that allows a single load generator to simulate thousands of concurrent users with modest hardware. Traditional threaded tools require one thread per user - limiting concurrency to thousands. Gatling's async approach supports tens of thousands of concurrent users from a single machine.
Reports: Gatling generates detailed HTML reports including response time distribution, throughput over time, and error breakdowns. Reports are standalone HTML files - easy to archive and share with stakeholders.
Language: Gatling tests are traditionally written in Scala, which may be unfamiliar if your team uses Java exclusively. However, the DSL is readable even without Scala knowledge, and recent Gatling versions also offer Java and Kotlin DSLs. Gatling's power and reporting justify the small learning curve.
Dependencies
// build.gradle
plugins {
    id 'io.gatling.gradle' version '3.11.5'
}

gatling {
    // Gatling plugin configuration (version managed by plugin)
}
Basic Load Test
The simulation below demonstrates basic Gatling usage: HTTP configuration, a simple scenario, load injection, and assertions. The scenario creates a payment and then retrieves it, simulating a typical API workflow.
Key concepts:
- httpProtocol: Defines base URL and default headers applied to all requests
- scenario: Defines a sequence of actions a virtual user performs
- check: Validates responses and extracts values for subsequent requests
- saveAs: Stores extracted values in the virtual user's session (like paymentId here)
- inject: Defines how virtual users are introduced (ramp-up patterns)
- assertions: Define pass/fail criteria - test fails if assertions aren't met
Assertions: Gatling tests pass/fail based on assertions. The example asserts P95 latency <200ms and success rate >99.9%. Without assertions, tests always "pass" regardless of performance - assertions provide automated quality gates.
// src/test/scala/simulations/PaymentSimulation.scala
package simulations

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class PaymentSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080")
    .acceptHeader("application/json")
    .contentTypeHeader("application/json")

  val scn = scenario("Payment Processing")
    .exec(
      http("Create Payment")
        .post("/api/payments")
        .body(StringBody("""{
          "amount": 100.00,
          "currency": "USD",
          "userId": "USER-123",
          "accountId": "ACC-456"
        }"""))
        .check(status.is(201))
        .check(jsonPath("$.id").saveAs("paymentId")) // Extract ID for next request
    )
    .pause(1) // 1-second think time between requests
    .exec(
      http("Get Payment")
        .get("/api/payments/#{paymentId}") // Use extracted paymentId (Gatling 3.7+ EL syntax)
        .check(status.is(200))
    )

  setUp(
    scn.inject(
      // Ramp up to 100 users over 30 seconds (gradual load increase)
      rampUsers(100).during(30.seconds),
      // Then inject 100 new users per second for 5 minutes (steady-state load)
      constantUsersPerSec(100).during(5.minutes)
    )
  ).protocols(httpProtocol)
    .assertions(
      global.responseTime.percentile(95).lt(200), // P95 latency < 200ms
      global.successfulRequests.percent.gt(99.9)  // Error rate < 0.1%
    )
}
Advanced Scenario with Think Time
The scenario below models realistic user behavior including variable think times, authentication, and multiple correlated requests. Think times prevent unrealistic constant hammering - real users pause between actions.
Modeling realistic behavior: Users don't immediately fire the next request. They read responses, make decisions, type inputs. Think times model this behavior - without them, tests measure "best case" performance that never occurs in production.
Random think times: pause(2, 5) waits between 2 and 5 seconds, uniformly distributed. This prevents synchronized load patterns where all virtual users hit the endpoint simultaneously. Real user behavior is desynchronized - tests should match this.
class RealisticPaymentSimulation extends Simulation {

  val httpProtocol = http.baseUrl("http://localhost:8080")

  val scn = scenario("Realistic Payment Flow")
    // User logs in
    .exec(
      http("Login")
        .post("/api/auth/login")
        .body(StringBody("""{"username": "user", "password": "pass"}"""))
        .check(jsonPath("$.token").saveAs("authToken"))
        .check(jsonPath("$.accountId").saveAs("accountId")) // assumes the login response includes the account id
    )
    .pause(2, 5) // Think time: 2-5 seconds
    // User views account balance
    .exec(
      http("Get Account")
        .get("/api/accounts/#{accountId}")
        .header("Authorization", "Bearer #{authToken}")
        .check(status.is(200))
    )
    .pause(3, 7)
    // User creates payment
    .exec(
      http("Create Payment")
        .post("/api/payments")
        .header("Authorization", "Bearer #{authToken}")
        .body(StringBody("""{"amount": 50.00, "currency": "USD"}"""))
        .check(status.is(201))
        .check(jsonPath("$.id").saveAs("paymentId"))
    )
    .pause(1, 2)
    // User checks payment status
    .exec(
      http("Get Payment Status")
        .get("/api/payments/#{paymentId}")
        .header("Authorization", "Bearer #{authToken}")
        .check(jsonPath("$.status").is("COMPLETED"))
    )

  setUp(
    scn.inject(
      // Simulate realistic load pattern
      nothingFor(5.seconds),
      rampUsers(50).during(30.seconds),
      constantUsersPerSec(100).during(10.minutes).randomized,
      rampUsers(200).during(1.minute), // Spike
      constantUsersPerSec(200).during(5.minutes),
      rampUsersPerSec(200).to(50).during(2.minutes) // Ramp down
    )
  ).protocols(httpProtocol)
}
Running Gatling Tests
# Gradle
./gradlew gatlingRun --no-daemon
# View report
open build/reports/gatling/paymentsimulation-{timestamp}/index.html
k6 (Recommended for JavaScript/TypeScript)
k6 is a modern, developer-friendly load testing tool with scripts written in JavaScript. It's designed for modern DevOps workflows with focus on automation, CI/CD integration, and code-based test definitions. k6 is particularly strong for teams already using JavaScript/TypeScript.
Why k6: JavaScript is widely known, lowering the barrier to writing performance tests. k6 provides excellent CLI output, easy CI/CD integration, and support for running tests at scale in k6 Cloud. Unlike browser-based tools, k6 is protocol-level - it doesn't render pages, focusing purely on API performance.
Architecture: k6 is written in Go and uses a JavaScript runtime (Goja) for test scripts. This architecture provides the performance of Go with the accessibility of JavaScript. Virtual users run concurrently in the Go runtime, providing excellent efficiency.
Metrics: k6 automatically tracks key metrics (request duration, failure rate, throughput) and supports custom metrics. Output formats include JSON, CSV, InfluxDB, Prometheus, and more - making integration with monitoring systems straightforward.
Installation
# macOS
brew install k6
# Windows
choco install k6
# Linux
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6
Basic Load Test
The test below demonstrates k6's structure: options for load configuration, thresholds for pass/fail criteria, and the default function that each virtual user executes. k6's syntax is clean and accessible to anyone familiar with JavaScript.
Key concepts:
- options: Configures load profile (stages), thresholds, and other test parameters
- stages: Defines ramp-up and ramp-down patterns for load
- thresholds: Pass/fail criteria - test fails if thresholds are violated
- check: Validates responses (doesn't fail test, but tracks success rate)
- sleep: Introduces think time between requests
Checks vs Thresholds: check validates individual responses and tracks success/failure rate but doesn't fail the test. thresholds define test-level pass/fail criteria. Use checks for request validation and thresholds for aggregate performance requirements.
// payment-load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 100 }, // Ramp up to 100 VUs over 30s
    { duration: '5m', target: 100 },  // Maintain 100 VUs for 5 minutes
    { duration: '30s', target: 0 },   // Ramp down to 0 (graceful stop)
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'], // 95th percentile < 200ms
    http_req_failed: ['rate<0.01'],   // Failure rate < 1%
  },
};

export default function () {
  // Create payment
  const payload = JSON.stringify({
    amount: 100.00,
    currency: 'USD',
    userId: 'USER-123',
    accountId: 'ACC-456',
  });
  const params = {
    headers: {
      'Content-Type': 'application/json',
    },
  };
  const createRes = http.post('http://localhost:8080/api/payments', payload, params);

  // Validate response - doesn't fail test, tracks check success rate
  check(createRes, {
    'status is 201': (r) => r.status === 201,
    'response time < 200ms': (r) => r.timings.duration < 200,
    'payment id exists': (r) => JSON.parse(r.body).id !== undefined,
  });

  const paymentId = JSON.parse(createRes.body).id;
  sleep(1); // Think time

  // Get payment by ID
  const getRes = http.get(`http://localhost:8080/api/payments/${paymentId}`);
  check(getRes, {
    'status is 200': (r) => r.status === 200,
    'payment status is COMPLETED': (r) => JSON.parse(r.body).status === 'COMPLETED',
  });

  sleep(1); // Think time before next iteration
}
Running k6 Tests
# Run test
k6 run payment-load-test.js
# Run with custom duration/VUs
k6 run --duration 10m --vus 200 payment-load-test.js
# Output results to InfluxDB/Grafana
k6 run --out influxdb=http://localhost:8086/k6 payment-load-test.js
k6 Cloud (SaaS)
# Sign up at https://k6.io/cloud
# Run test in cloud
k6 cloud payment-load-test.js
# Run distributed test
k6 cloud --vus 10000 --duration 30m payment-load-test.js
JMeter
Installation
# Download from https://jmeter.apache.org/download_jmeter.cgi
# Extract and run
./bin/jmeter
Test Plan Structure
Test Plan
├── Thread Group (Users)
│ ├── HTTP Request Defaults
│ ├── HTTP Header Manager
│ ├── CSV Data Set Config (test data)
│ ├── HTTP Request: POST /api/payments
│ ├── JSON Extractor (extract payment ID)
│ ├── HTTP Request: GET /api/payments/${paymentId}
│ └── Assertions
├── Listeners
│ ├── View Results Tree
│ ├── Summary Report
│ └── Aggregate Report
Running JMeter from CLI
# Run test plan
jmeter -n -t payment-test.jmx -l results.jtl -e -o report/
# With custom parameters
jmeter -n -t payment-test.jmx \
-Jusers=100 \
-Jrampup=30 \
-Jduration=300 \
-l results.jtl \
-e -o report/
CI/CD Integration
Automated performance testing in CI/CD pipelines catches regressions immediately after code changes. Manual performance testing is too slow - regressions accumulate between infrequent manual test runs. Automated tests provide continuous feedback, enabling rapid iteration and preventing performance debt.
When to run: Run lightweight performance tests on every PR or merge to main. Run comprehensive performance tests nightly or on release branches. The tradeoff is feedback speed vs test thoroughness. Quick "smoke tests" catch obvious regressions; comprehensive tests validate capacity and stability.
Test environment considerations: CI performance tests need stable, production-like environments. Inconsistent environments produce noisy results where regressions are hidden by environmental variance. Consider dedicated performance test environments or containerized infrastructure for reproducibility. For more on CI/CD setup, see Pipeline Guidelines.
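Before trusting a regression threshold, it helps to measure how noisy the environment actually is. A sketch, with illustrative P95 values from repeated identical runs:

```python
# Sketch: estimate run-to-run noise before choosing a regression threshold.
# If identical runs vary by ~3%, a 10-15% gate is reasonable; if they vary
# by 20%, stabilize the environment first. Sample values are illustrative.
import statistics

def coefficient_of_variation(p95_samples_ms):
    """Relative spread (stdev / mean) of P95 latency across identical runs."""
    return statistics.stdev(p95_samples_ms) / statistics.mean(p95_samples_ms)

runs_ms = [182, 190, 177, 185, 188]  # P95 from five identical test runs
print(f"run-to-run variation: {coefficient_of_variation(runs_ms):.1%}")  # about 3%
```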
GitLab CI Performance Testing
The configuration below demonstrates k6 and Gatling performance tests in GitLab CI. Tests run in isolated containers with the application under test available as a service. Results are stored as artifacts for historical comparison and regression detection.
Service readiness: The sleep 10 gives services time to start before running tests. In production pipelines, use health check probes instead of fixed waits - polling the health endpoint until it returns 200 OK ensures the service is truly ready.
Artifact retention: Performance test results are retained for 30 days. This enables historical comparison to detect gradual performance degradation and provides evidence for capacity planning discussions.
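The fixed sleep mentioned above can be replaced by a readiness poll. A minimal stdlib sketch; the /actuator/health path is an assumption (Spring Boot convention), so substitute whatever readiness endpoint your service exposes:

```python
# Sketch: poll a health endpoint until it answers 200 OK, instead of
# sleeping a fixed duration and hoping the service is up.
import time
import urllib.error
import urllib.request

def wait_until_healthy(url, timeout_s=60, interval_s=2):
    """Return True once the endpoint answers 200 OK, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not accepting connections yet; retry
        time.sleep(interval_s)
    return False

# In the pipeline this would target the service alias, e.g.
# wait_until_healthy("http://payment-service:8080/actuator/health")
print(wait_until_healthy("http://127.0.0.1:9/", timeout_s=1, interval_s=0.5))  # False - nothing listens there
```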
# .gitlab-ci.yml
performance-test:
  stage: test
  image: grafana/k6:latest
  services:
    - name: payment-service:latest
      alias: payment-service
  script:
    # Wait for service to be ready (use health check in production)
    - sleep 10
    # Export the end-of-test summary for regression comparison
    - k6 run --summary-export=results.json tests/performance/payment-load-test.js
  artifacts:
    reports:
      performance: results.json
    paths:
      - results.json
    when: always        # Capture results even if test fails
    expire_in: 30 days  # Retain for historical comparison
  only:
    - main
    - /^release\/.*/

# Gatling in CI pipeline
performance-test-gatling:
  stage: test
  image: eclipse-temurin:21-jdk  # JDK image; the Gradle wrapper downloads Gradle itself
  services:
    - name: payment-service:latest
      alias: payment-service
  script:
    - ./gradlew gatlingRun --no-daemon
  artifacts:
    paths:
      - build/reports/gatling/  # HTML reports and raw data
    when: always
    expire_in: 30 days
Performance Regression Detection
Regression detection compares current test results against a baseline (typically the previous successful run). If performance degrades beyond a threshold (commonly 10-20%), the pipeline fails, preventing the regression from reaching production.
Setting thresholds: Too strict thresholds (e.g., 5%) cause false failures from environmental noise. Too loose thresholds (e.g., 50%) allow significant regressions. Start with 10-15% and adjust based on your environment's variance. Multiple regression detections suggest a real problem, not noise.
Baseline management: The example uses the last successful run as baseline. Alternatively, use a "golden" baseline from a known-good commit. Golden baselines avoid drift where each regression becomes the new baseline.
# .gitlab-ci.yml
performance-regression:
  stage: test
  script:
    # Run current performance test and export the summary
    - k6 run --summary-export=current.json tests/performance/payment-load-test.js
    # Download baseline from previous successful run
    - curl -o baseline.json "$CI_API_V4_URL/projects/$CI_PROJECT_ID/jobs/artifacts/main/raw/results.json?job=performance-test"
    # Compare results
    - python scripts/compare-performance.py baseline.json current.json
  allow_failure: false
# scripts/compare-performance.py
import json
import sys

def compare_performance(baseline_file, current_file):
    with open(baseline_file) as f:
        baseline = json.load(f)
    with open(current_file) as f:
        current = json.load(f)

    baseline_p95 = baseline['metrics']['http_req_duration']['p(95)']
    current_p95 = current['metrics']['http_req_duration']['p(95)']

    # Fail if P95 increased by more than 10%
    regression_threshold = 1.10
    if current_p95 > baseline_p95 * regression_threshold:
        print("FAIL: Performance regression detected!")
        print(f"Baseline P95: {baseline_p95}ms")
        print(f"Current P95: {current_p95}ms")
        print(f"Increase: {((current_p95 / baseline_p95 - 1) * 100):.2f}%")
        sys.exit(1)
    else:
        print("PASS: Performance acceptable")
        print(f"Baseline P95: {baseline_p95}ms")
        print(f"Current P95: {current_p95}ms")
        sys.exit(0)

if __name__ == '__main__':
    compare_performance(sys.argv[1], sys.argv[2])
Database Performance Testing
Simulate Database Load
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.domain.PageRequest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.springframework.util.StopWatch;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

import java.time.LocalDate;
import java.util.List;
import java.util.stream.IntStream;

import static org.assertj.core.api.Assertions.assertThat;

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@Testcontainers
class PaymentPerformanceTest {

    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:16")
        .withInitScript("schema.sql");

    @DynamicPropertySource
    static void configureProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
    }

    @Autowired
    private PaymentRepository paymentRepository;

    // createPayment() and insertTestData() are test helpers, omitted here

    @Test
    void shouldHandleHighVolumePaymentCreation() {
        // Create 10,000 payments
        StopWatch watch = new StopWatch();
        watch.start();

        List<Payment> payments = IntStream.range(0, 10_000)
            .mapToObj(i -> createPayment())
            .toList();
        paymentRepository.saveAll(payments);

        watch.stop();

        // Assert performance
        assertThat(watch.getTotalTimeMillis())
            .isLessThan(5000); // < 5 seconds for 10k inserts
    }

    @Test
    void shouldHandleComplexQueryPerformance() {
        // Insert test data
        insertTestData(100_000);

        StopWatch watch = new StopWatch();
        watch.start();

        // Complex query
        List<Payment> payments = paymentRepository.findPaymentsByUserWithFilters(
            "USER-123",
            PaymentStatus.COMPLETED,
            LocalDate.now().minusDays(30),
            PageRequest.of(0, 100)
        );

        watch.stop();

        // Assert query performance
        assertThat(watch.getTotalTimeMillis())
            .isLessThan(100); // < 100ms for complex query on 100k rows
        assertThat(payments).hasSize(100);
    }
}
Monitoring Performance Tests
Prometheus Metrics During Tests
# docker-compose.yml for monitoring stack
version: '3'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

# prometheus.yml
scrape_configs:
  - job_name: 'payment-service'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['payment-service:8080']
Grafana Dashboard for Performance Tests
{
  "dashboard": {
    "title": "Performance Test Dashboard",
    "panels": [
      {
        "title": "Request Rate",
        "targets": [
          { "expr": "rate(http_server_requests_seconds_count[1m])" }
        ]
      },
      {
        "title": "P95 Latency",
        "targets": [
          { "expr": "histogram_quantile(0.95, rate(http_server_requests_seconds_bucket[1m]))" }
        ]
      },
      {
        "title": "Error Rate",
        "targets": [
          { "expr": "rate(http_server_requests_seconds_count{status=~\"5..\"}[1m])" }
        ]
      },
      {
        "title": "Database Connection Pool",
        "targets": [
          { "expr": "hikaricp_connections_active / hikaricp_connections_max * 100" }
        ]
      }
    ]
  }
}
Best Practices
Test Data Management
// Generate realistic test data
@Component
public class PaymentTestDataGenerator {

    private final Faker faker = new Faker();

    public Payment generateRealisticPayment() {
        return Payment.builder()
            .id(UUID.randomUUID().toString())
            .userId("USER-" + faker.number().numberBetween(1, 10000))
            .amount(BigDecimal.valueOf(faker.number().randomDouble(2, 1, 10000)))
            .currency(faker.options().option("USD", "EUR", "GBP"))
            .status(PaymentStatus.PENDING)
            .createdAt(Instant.now())
            .build();
    }

    public List<Payment> generatePayments(int count) {
        return IntStream.range(0, count)
            .mapToObj(i -> generateRealisticPayment())
            .toList();
    }
}
Performance Test Checklist
- Test with production-like data volumes
- Use realistic user behavior (think times, navigation patterns)
- Test from multiple geographic locations (latency)
- Include ramp-up and ramp-down periods
- Monitor system resources (CPU, memory, disk, network)
- Test with production-like infrastructure (same VM sizes, network)
- Validate database indexes are used
- Test with connection pooling configured
- Monitor garbage collection during tests
- Validate caching effectiveness
- Test auto-scaling behavior
- Run soak tests to detect memory leaks
- Compare results with performance budgets
- Document test scenarios and results
Further Reading
Internal Documentation
- Performance Optimization - Caching, JVM tuning
- Spring Boot Observability - Monitoring
- Testing Strategy - Overall testing approach
- CI/CD Pipelines - GitLab CI integration
Summary
Key Takeaways
- Establish performance budgets before testing - Define specific P95/P99 latency targets, throughput requirements, and error rate thresholds. Without budgets, you can't determine if tests pass or fail. Base budgets on user experience requirements and business SLAs.
- Test early and continuously - Start performance testing during development, not in final pre-production validation. Early testing catches architectural issues when they're cheap to fix. Automate tests in CI/CD to catch regressions within hours of problematic commits.
- Model realistic user behavior - Use think times, variable load patterns, and realistic data. Tests hammering endpoints constantly measure best-case performance that never occurs in production. Think times and realistic scenarios reveal actual performance characteristics.
- Choose appropriate test types - Load testing validates baseline performance, stress testing finds breaking points, spike testing validates handling of traffic surges, soak testing detects memory leaks, and scalability testing validates scaling strategies. Most applications need load, stress, and soak tests.
- Understand your tools - Gatling (Scala-based, async, excellent reporting) for JVM projects, k6 (JavaScript-based, modern CLI, cloud-ready) for JavaScript/TypeScript teams, JMeter (GUI-based, mature) for teams needing visual test design. Tool choice affects maintainability and team adoption.
- Automate regression detection - Compare test results against baselines (previous runs or golden baselines) to catch performance degradation. Fail pipelines when P95 latency increases beyond threshold (typically 10-15%). This prevents regressions from accumulating.
- Monitor system resources during tests - Track CPU, memory, garbage collection, database connection pools, and cache hit rates during performance tests. Resource metrics reveal bottlenecks that latency metrics don't - high CPU suggests compute bottleneck, high GC suggests memory pressure, connection pool exhaustion suggests database contention.
- Test environment fidelity matters - Performance test environments should mirror production infrastructure (instance sizes, network configuration, database capacity). Significant differences invalidate test results. Correlate test results with production metrics to validate test environment fidelity.
- Percentiles over averages - Always measure P95/P99 latency, not averages. Averages hide outliers that frustrate users and generate support tickets. P95 latency represents the experience of your slowest 5% of requests - the group most likely to complain.
- Iterate on bottlenecks - Performance testing identifies problems; optimization fixes them. The workflow is: test → profile to identify bottleneck → optimize → re-test → validate improvement. Continue until budgets are met. See Performance Optimization for optimization techniques.
Performance Testing Workflow
- Define performance budgets (P95 latency, throughput, error rate)
- Write performance tests modeling realistic scenarios
- Run tests in CI/CD on every main branch commit
- Compare results against baseline to detect regressions
- When regressions occur: profile, optimize, re-test, validate
- Correlate test results with production metrics to validate test fidelity
Tool Selection Guide
- Gatling: Best for JVM projects, teams comfortable with Scala, need for detailed HTML reports
- k6: Best for JavaScript/TypeScript teams, modern DevOps workflows, CLI-first approach
- JMeter: Best for teams needing GUI test builders, mature ecosystem, complex test scenarios
Next Steps: Review Performance Optimization for optimization techniques to eliminate bottlenecks identified through performance testing. Then see Spring Boot Observability for production monitoring to correlate test results with real-world performance.