Testing Strategy
A comprehensive testing strategy ensures code quality, prevents regressions, and enables confident deployments. This document covers WHEN and WHY to use each test type; see specialized guides for HOW to implement them.
Overview
Our testing strategy is built on the Testing Honeycomb model, which prioritizes integration tests while maintaining a balance across all test types. This approach recognizes that in modern applications, most bugs occur at integration boundaries - where your code interacts with databases, external services, or message queues - rather than in isolated business logic.
The honeycomb model differs fundamentally from the traditional test pyramid. While the pyramid advocates for a broad base of unit tests (often 70-80%), our experience shows this leads to brittle test suites that mock away real behavior. Instead, the honeycomb places integration tests at the core (50-60% of tests), using them to validate actual component interactions with real dependencies like databases via TestContainers.
The traditional test pyramid was designed for monolithic applications where testing database interactions was slow and expensive. Today, TestContainers provides lightweight, disposable Docker containers that make integration tests nearly as fast as unit tests. This eliminates the primary reason for heavy mocking and allows us to test with real databases, message queues, and caches. The result is tests that catch real bugs - like constraint violations, transaction boundary issues, and serialization errors - that mocks cannot detect.
Applies to: Spring Boot · Angular · React · React Native · Android · iOS
These testing principles apply across all platforms. See the framework-specific testing guides for implementation details.
Core Principles
- Integration-First: Prioritize tests that validate real component interactions with actual dependencies
- Mutation Testing: Verify test quality with mutation testing to catch weak test suites
- Fast Feedback: Tests should run quickly enough for frequent execution during development
- Realistic Data: Use representative test data that mirrors production scenarios
- Contract Testing: Validate API contracts between services to prevent integration failures
- Coverage Targets: Maintain meaningful code coverage while ensuring test quality
Testing Honeycomb Model
The honeycomb visualizes our testing strategy as interconnected test types, with integration tests forming the structural core.
Each test type serves a specific purpose in your quality strategy. Understanding WHEN to use each helps you build an effective test suite:
Integration Tests (Core - 50-60%)
When to use: For any code that interacts with external dependencies - databases, caches, message queues, external APIs. This covers most application code since modern applications are primarily coordination between components.
Why prioritize: Integration tests catch the bugs that actually reach production. Database constraint violations, transaction rollback issues, JSON serialization errors, and connection pool exhaustion only surface when testing with real dependencies. Mocks can't detect these problems because they don't replicate the actual behavior.
What they validate:
- Database queries execute correctly with real data
- Transaction boundaries commit/rollback properly
- Constraints (unique, foreign key) are enforced
- API endpoints integrate correctly with service and data layers
- Caching behavior works as expected
Trade-offs: Integration tests are slightly slower than unit tests (seconds versus milliseconds), require Docker for TestContainers, and involve more setup code. These costs are outweighed by the real bugs they catch.
See Integration Testing for implementation patterns with TestContainers, MockMvc, and database testing.
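To make the "mocks can't detect these problems" point concrete, here is a deliberately stripped-down sketch (all class names hypothetical): a stub repository happily "saves" a duplicate email, while anything that enforces the real unique constraint rejects it. In an actual integration test the constrained implementation would be a real database started by TestContainers; an in-memory stand-in keeps the sketch self-contained.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical repository port; in a real integration test the implementation
// under test would talk to an actual database started by TestContainers.
interface UserRepository {
    boolean save(String email);
}

// What an over-mocked test effectively runs against: a stub that always
// succeeds, so a duplicate-email bug can never surface.
class StubUserRepository implements UserRepository {
    public boolean save(String email) { return true; }
}

// Stand-in for the real database, which enforces a unique constraint on email.
class ConstrainedUserRepository implements UserRepository {
    private final Set<String> emails = new HashSet<>();
    public boolean save(String email) { return emails.add(email); }
}

class RegistrationService {
    private final UserRepository repo;
    public RegistrationService(UserRepository repo) { this.repo = repo; }
    public boolean register(String email) { return repo.save(email); }
}
```

A test wired to the stub accepts registering the same email twice; the same test against the constrained store fails on the second attempt, which is exactly the class of bug that only surfaces with a real dependency.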
Unit Tests (25-35%)
When to use: For complex business logic that needs extensive edge case testing - validation rules, calculation engines, algorithmic code, pure functions. Use unit tests when you need to verify many input combinations quickly without infrastructure overhead.
Why use them: Unit tests excel at exhaustive edge case coverage. Testing a payment calculation with 50 different input combinations runs in milliseconds with unit tests. They're also ideal for TDD workflows where you need immediate feedback while developing algorithms.
What they validate:
- Business rules execute correctly across edge cases
- Calculations produce correct results
- Validation logic catches invalid inputs
- Pure functions transform data correctly
When NOT to use: Don't unit test simple CRUD operations, configuration classes, DTOs, or code that just delegates to other components. Don't mock your entire architecture just to achieve isolation - use integration tests instead.
See Unit Testing for test structure, mocking patterns, and best practices.
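As a sketch of the edge-case coverage described above, consider a hypothetical tiered-discount rule (amounts in cents; the tiers are illustrative). In a real suite this would be a JUnit `@ParameterizedTest`; a plain table-driven loop shows the same idea without any framework:

```java
// Hypothetical pure business rule: tiered discount on an order total, in cents.
class DiscountCalculator {
    public static long discountCents(long totalCents) {
        if (totalCents < 0) throw new IllegalArgumentException("negative total");
        if (totalCents >= 100_00) return totalCents / 10; // 10% at $100 and above
        if (totalCents >= 50_00)  return totalCents / 20; // 5% at $50 and above
        return 0;
    }
}

class DiscountCalculatorTest {
    // Plain-Java stand-in for a JUnit @ParameterizedTest: run a table of
    // (input, expected) pairs through the pure function in one pass.
    public static int runCases() {
        long[][] cases = {
            {0, 0}, {49_99, 0},               // below the first tier
            {50_00, 2_50}, {99_99, 4_99},     // 5% tier, boundaries included
            {100_00, 10_00}, {250_00, 25_00}, // 10% tier
        };
        for (long[] c : cases) {
            long actual = DiscountCalculator.discountCents(c[0]);
            if (actual != c[1]) {
                throw new AssertionError(c[0] + " -> " + actual + ", expected " + c[1]);
            }
        }
        return cases.length; // number of cases verified
    }
}
```

Because the rule is pure logic, dozens of such cases run in milliseconds with no infrastructure, which is precisely where unit tests earn their place.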
Contract Tests (5-10%)
When to use: For any API consumed by another service or team. Essential in microservices architectures where breaking API changes cause cascading failures.
Why use them: Contract tests catch breaking changes before deployment. When the payment service changes its response format, consumer-driven contracts fail immediately, preventing a production incident. They enable independent team deployments by verifying compatibility.
What they validate:
- API responses match expected schemas
- Required fields are present
- Status codes and error formats are correct
- Backward compatibility is maintained
See Contract Testing for Pact consumer-driven contracts and OpenAPI validation.
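The core of a consumer-driven contract can be reduced to "the consumer pins down the fields it relies on, and the provider's response is validated against them." In practice Pact records these expectations from the consumer and replays them against the provider; the sketch below (field names illustrative) only shows the required-field half of that check:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal required-field contract check. Pact does this (plus type and
// status-code matching) for real; the payment fields here are illustrative.
class PaymentContract {
    // Fields a hypothetical mobile consumer depends on.
    static final List<String> REQUIRED_FIELDS =
            List.of("paymentId", "status", "amountCents");

    // Returns the required fields the provider's response failed to include.
    public static List<String> missingFields(Map<String, Object> response) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED_FIELDS) {
            if (!response.containsKey(field)) missing.add(field);
        }
        return missing;
    }
}
```

A provider that renames `paymentId` to `id` fails this check at build time rather than in production, which is the whole point of catching breaking changes before deployment.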
End-to-End Tests (3-5%)
When to use: Only for critical user journeys that justify the cost - payment processing, account creation, authentication flows. Limit to 3-5 scenarios per application.
Why limit them: E2E tests are slow (10-20 minutes), expensive (full infrastructure), and brittle (break on UI changes). They provide value for verifying the entire stack works together but should not be your primary testing strategy. Robust integration tests and contract tests reduce E2E dependency.
What they validate:
- Complete user flows work end-to-end
- UI components integrate with backend
- Critical business flows complete successfully
See E2E Testing for Cypress, Playwright, and Detox patterns.
Performance Tests (2-3%)
When to use: For establishing baseline metrics and catching performance regressions. Run on schedules (nightly) rather than every commit since they require sustained load.
Why use them: Performance problems discovered in production are expensive to fix. Performance tests catch regressions early - a new query that scans the entire table, a removed cache that increases latency 10x, a memory leak that causes OOM errors.
What they validate:
- Response times meet SLAs under load
- Throughput handles expected traffic
- No performance regressions from code changes
- Memory and resource usage stay within bounds
See Performance Testing for Gatling, k6, and Lighthouse configuration.
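The assertion at the heart of a performance test is usually "the p-th percentile latency stays under the SLA threshold." Gatling and k6 generate the load and compute this for you; the sketch below shows just the percentile-versus-threshold logic (nearest-rank percentile; all sample values and thresholds illustrative):

```java
import java.util.Arrays;

// Core of a performance assertion: collect per-request latencies, compute a
// percentile, compare against the SLA. Load generation is omitted; Gatling
// and k6 handle that part in a real test.
class LatencyCheck {
    public static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Nearest-rank percentile: the sample at the p-th rank.
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static boolean meetsSla(long[] latenciesMs, double p, long thresholdMs) {
        return percentile(latenciesMs, p) <= thresholdMs;
    }
}
```

Asserting on a high percentile (p95/p99) rather than the mean is the usual choice, since averages hide the slow tail that users actually experience.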
Mutation Tests (Quality Gate)
When to use: For all production code, especially critical business logic. Mutation testing validates test quality, not the code itself.
Why use them: Code coverage measures which lines execute, not whether tests verify behavior. You can have 100% coverage with zero assertions. Mutation testing creates deliberate bugs (mutants) and verifies your tests catch them. High mutation coverage (>80%) proves your tests actually detect bugs.
What they validate:
- Tests have meaningful assertions
- Edge cases are covered
- Both positive and negative cases are tested
- Tests catch boundary condition bugs
See Mutation Testing for PITest (Java) and Stryker (JS/TS) configuration.
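The mechanism can be shown by hand for a single mutant (PITest does this automatically at the bytecode level; the age-18 rule and names below are illustrative): the original code uses `>=`, the mutant flips it to `>`. A suite that never tests the exact boundary value cannot tell the two apart, so the mutant survives.

```java
import java.util.function.IntPredicate;

// One mutant, reproduced by hand: the original rule uses >=, the mutant
// flips it to >. Only a test that pins the exact boundary can tell them apart.
class VoteEligibility {
    public static boolean original(int age) { return age >= 18; }
    public static boolean mutant(int age)   { return age > 18; } // injected bug
}

class MutantHunt {
    // Weak suite: only checks values far from the boundary.
    public static boolean weakSuitePasses(IntPredicate eligible) {
        return eligible.test(30) && !eligible.test(5);
    }
    // Strong suite: checks the boundary itself.
    public static boolean strongSuitePasses(IntPredicate eligible) {
        return eligible.test(18) && !eligible.test(17);
    }
}
```

The weak suite passes against both versions (the mutant survives); the strong suite fails against the mutant (the mutant is killed). Mutation coverage is simply the fraction of mutants your suite kills.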
Test Distribution Target
| Test Type | Percentage | Purpose | Typical Speed |
|---|---|---|---|
| Integration Tests | 50-60% | Core validation with real dependencies | 2-5 minutes |
| Unit Tests | 25-35% | Business logic, algorithms, edge cases | 30-60 seconds |
| Contract Tests | 5-10% | API contract validation | 30-60 seconds |
| E2E Tests | 3-5% | Critical user journeys | 10-20 minutes |
| Performance Tests | 2-3% | Load testing, performance regression | 10-30 minutes |
Why these percentages? The 50-60% allocation to integration tests reflects that most application code involves coordinating between layers - controllers calling services, services using repositories, repositories interacting with databases. Testing these interactions with real dependencies catches the majority of bugs. The 25-35% for unit tests covers complex business logic that benefits from exhaustive edge case testing. E2E tests are limited to 3-5% because they're slow, brittle, and expensive; instead, we achieve confidence through robust integration tests. Contract tests (5-10%) ensure service boundaries remain stable, while performance tests (2-3%) catch regressions in response times and throughput.
This distribution assumes modern tooling like TestContainers that makes integration tests fast enough for frequent execution. Without TestContainers, you might need more unit tests with mocks, but you'd trade off test realism and bug detection capability.
Selecting the Right Test Type
Use the following scenarios as a decision guide when choosing a test type.
Common Scenarios
"I need to test my REST endpoint that saves to the database" → Integration test with TestContainers. Test the full flow: HTTP request → controller → service → repository → database. Verify the data persists correctly.
"I need to test my payment calculation with many edge cases" → Unit test with parameterized tests. Test dozens of input combinations quickly without database overhead. The calculation is pure logic that doesn't need infrastructure.
"I need to verify our API changes don't break the mobile app" → Contract test with Pact. The mobile team defines the responses it expects; the backend verifies that it satisfies those contracts before deploying.
"I need to test the complete user registration flow" → E2E test for this critical journey, but only this one test. Don't E2E test every registration variation - cover those in integration tests.
"I need to ensure our API response time stays under 200ms" → Performance test with Gatling. Establish a baseline and run nightly to catch regressions before they reach production.
Coverage Targets
Code Coverage
| Component | Target | Tool |
|---|---|---|
| Backend Services | >90% | JaCoCo |
| Frontend Components | >85% | Jest/Istanbul |
| Mobile Apps | >85% | JaCoCo (Android), XCTest (iOS) |
| Utility/Helper Code | >95% | Language-specific tools |
Code coverage measures which lines execute during tests. It identifies untested code paths but doesn't guarantee test quality. A test that calls every line but never asserts anything reports 100% coverage while catching zero bugs.
Code coverage is a necessary but not sufficient metric. Use mutation testing to validate that tests actually catch bugs. Target >80% mutation coverage (Java) or >75% (JS/TS).
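The "100% coverage with zero assertions" trap can be made concrete with a two-line example (names hypothetical): both tests below execute every line of the parser, so both report full line coverage, but only one of them can ever fail.

```java
// The coverage trap: both "tests" execute every line of parseCents(), so both
// report full line coverage, but only the second one verifies behavior.
class PriceParser {
    public static int parseCents(String value) {
        String[] parts = value.split("\\.");
        return Integer.parseInt(parts[0]) * 100 + Integer.parseInt(parts[1]);
    }
}

class CoverageDemo {
    // 100% line coverage, zero assertions: passes no matter what is returned.
    public static boolean assertionFreeTest() {
        PriceParser.parseCents("12.34");
        return true;
    }

    // Identical coverage, but an actual check on the result.
    public static boolean assertingTest() {
        return PriceParser.parseCents("12.34") == 1234;
    }
}
```

Mutation testing exposes the difference immediately: any mutant of `parseCents` survives the assertion-free test and is killed by the asserting one.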
Mutation Coverage
| Component | Target | Tool |
|---|---|---|
| Java/Spring Boot | >80% | PITest |
| JavaScript/TypeScript | >75% | Stryker |
| Kotlin | >80% | PITest (Kotlin support) |
Mutation coverage proves your tests detect bugs. When PITest changes > to >= in your code and all tests still pass, you're missing boundary condition tests. High mutation coverage means your tests are actually effective, not just executing code.
See Mutation Testing for configuration and interpretation of results.
Testing Anti-Patterns
These patterns undermine test suite value. Understanding WHY they're problematic helps you avoid them:
Testing Implementation Details
Problem: Tests that verify internal state or private methods break when you refactor, even if behavior stays the same. This discourages refactoring and creates maintenance burden.
Why it happens: Developers try to achieve high coverage by testing every method, including internal ones. This couples tests to implementation rather than behavior.
Solution: Test through the public API or user interface. If you're testing a React component, query by role and text, not internal state (see React Testing). If you're testing a service, call public methods and verify results, not internal calls (see Unit Testing).
Mocking Everything
Problem: Tests with mocks for every dependency verify method call order, not actual behavior. They pass when real code is broken because mocks don't replicate real dependency behavior.
Why it happens: The test pyramid tradition led to mocking everything for "isolation." But isolated tests of mocked code don't catch real integration bugs.
Solution: Use integration tests with real dependencies via TestContainers. Mock only external services you don't control. If you're mocking more than 1-2 dependencies, you probably need an integration test.
Brittle Selectors
Problem: Tests using CSS selectors or DOM structure break when styling changes, even when functionality works. This makes tests a maintenance burden.
Why it happens: Developers use whatever selector works (class names, nth-child) without considering maintenance implications.
Solution: Use semantic queries based on accessibility - getByRole, getByLabelText, getByText. These reflect how users interact with your application and are resilient to implementation changes.
See E2E Testing for selector strategies.
Ignoring Test Maintenance
Problem: Disabled or flaky tests accumulate, providing no value while eroding trust in the test suite. Developers stop paying attention to failures.
Why it happens: Flaky tests are hard to fix, so they get disabled "temporarily." New features get prioritized over test maintenance.
Solution: Treat flaky tests as production bugs. Fix or delete within one sprint. Never accumulate @Disabled tests. Address root causes (race conditions, timing issues) rather than adding retries.
See E2E Testing for handling async and timing issues.
Continuous Integration Testing
Tests execute in CI pipeline stages ordered by speed and failure probability. This fail-fast strategy provides quick feedback and minimizes wasted resources.
Why this order? Unit tests (30-60 seconds) run first to catch basic logic errors quickly. If unit tests pass, mutation tests verify test quality - no point proceeding if tests don't catch bugs. Integration tests follow, validating components work with real dependencies. Contract tests ensure API compatibility before expensive E2E tests. E2E tests run last since they're slowest (10-20 minutes). Performance tests run on schedules (nightly) rather than every commit.
Any failure blocks the merge request, preventing broken code from reaching the main branch.
See CI Testing for complete GitLab pipeline configuration, parallel execution strategies, and quality gates.
Quality Gates
Merge Request Requirements
All merge requests must pass these automated quality gates:
- Code coverage >85% (component-specific targets apply)
- Mutation coverage >80% (Java) / >75% (JS/TS)
- All unit tests passing
- All integration tests passing
- Contract tests passing (if API changes)
- E2E smoke tests passing
- No critical security vulnerabilities (SAST/DAST)
- Code review approved by 2+ reviewers
These gates are enforced in GitLab pipelines and block merge requests. Do not bypass without explicit tech lead approval. See CI Testing for configuration.
Further Reading
Implementation Guides
- Unit Testing - AAA pattern, mocking, parameterized tests
- Integration Testing - TestContainers, MockMvc, database testing
- Mutation Testing - PITest, Stryker, interpreting results
- Contract Testing - Pact, OpenAPI validation
- E2E Testing - Cypress, Playwright, Detox
- Performance Testing - Gatling, k6, Lighthouse
- Test Data Management - Builders, factories, fixtures
- CI Testing - Pipeline configuration, parallel execution
Framework-Specific Guides
Summary
Our testing strategy prioritizes integration tests (50-60%) using TestContainers to validate real component interactions with actual dependencies like databases and message queues. This approach catches the bugs that actually reach production - constraint violations, transaction issues, and serialization errors that mocks cannot detect.
Unit tests (25-35%) cover complex business logic requiring exhaustive edge case testing. Contract tests (5-10%) ensure API compatibility between services. E2E tests (3-5%) verify critical user journeys only. Performance tests (2-3%) catch response time and throughput regressions.
Use mutation testing (PITest for Java, Stryker for JS/TS) to validate that your tests actually catch bugs, not just execute code. Target >80% mutation coverage for Java, >75% for JavaScript/TypeScript. Code coverage measures which lines run; mutation coverage proves tests detect defects.
Avoid anti-patterns: don't mock everything (use real dependencies via TestContainers), don't test implementation details (test behavior through public APIs), don't use brittle selectors (prefer semantic queries), and don't ignore flaky tests (fix or delete them).