Testing Strategy
A comprehensive testing strategy ensures code quality, prevents regressions, and enables confident deployments. This document covers WHEN and WHY to use each test type; see specialized guides for HOW to implement them.
Overview
Our testing strategy is built on the Testing Honeycomb model, which prioritizes integration tests while maintaining a balance across all test types. This approach recognizes that in modern applications, most bugs occur at integration boundaries - where your code interacts with databases, external services, or message queues - rather than in isolated business logic.
The honeycomb model differs fundamentally from the traditional test pyramid. While the pyramid advocates for a broad base of unit tests (often 70-80%), our experience shows this leads to brittle test suites that mock away real behavior. Instead, the honeycomb places integration tests at the core (50-60% of tests), using them to validate actual component interactions with real dependencies like databases via TestContainers.
The traditional test pyramid was designed for monolithic applications where testing database interactions was slow and expensive. Today, TestContainers provides lightweight, disposable Docker containers that make integration tests nearly as fast as unit tests. This eliminates the primary reason for heavy mocking and allows us to test with real databases, message queues, and caches. The result is tests that catch real bugs - like constraint violations, transaction boundary issues, and serialization errors - that mocks cannot detect.
Applies to: Spring Boot · Angular · React · React Native · Android · iOS
These testing principles apply across all platforms. See the framework-specific testing guides for implementation details.
Core Principles
- Integration-First: Prioritize tests that validate real component interactions with actual dependencies
- Mutation Testing: Verify test quality with mutation testing to catch weak test suites
- Fast Feedback: Tests should run quickly enough for frequent execution during development
- Realistic Data: Use representative test data that mirrors production scenarios
- Contract Testing: Validate API contracts between services to prevent integration failures
- Coverage Targets: Maintain meaningful code coverage while ensuring test quality
Testing Honeycomb Model
The honeycomb visualizes our testing strategy as interconnected test types, with integration tests forming the structural core.
Each test type serves a specific purpose in your quality strategy. Understanding WHEN to use each helps you build an effective test suite:
Integration Tests (Core - 50-60%)
When to use: For any code that interacts with external dependencies - databases, caches, message queues, external APIs. This covers most application code since modern applications are primarily coordination between components.
Why prioritize: Integration tests catch the bugs that actually reach production. Database constraint violations, transaction rollback issues, JSON serialization errors, and connection pool exhaustion only surface when testing with real dependencies. Mocks can't detect these problems because they don't replicate the actual behavior.
What they validate:
- Database queries execute correctly with real data
- Transaction boundaries commit/rollback properly
- Constraints (unique, foreign key) are enforced
- API endpoints integrate correctly with service and data layers
- Caching behavior works as expected
Trade-offs: Integration tests are slightly slower than unit tests (seconds versus milliseconds), require Docker for TestContainers, and involve more setup code. These costs are outweighed by the real bugs they catch.
See Integration Testing for implementation patterns with TestContainers, MockMvc, and database testing.
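To make the "mocks can't detect these problems" point concrete, here is a deliberately stripped-down sketch (all class names hypothetical): a stub repository happily "saves" a duplicate email, while anything that enforces the real unique constraint rejects it. In an actual integration test the constrained implementation would be a real database started by TestContainers; an in-memory stand-in keeps the sketch self-contained.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical repository port; in a real integration test the implementation
// under test would talk to an actual database started by TestContainers.
interface UserRepository {
    boolean save(String email);
}

// What an over-mocked test effectively runs against: a stub that always
// succeeds, so a duplicate-email bug can never surface.
class StubUserRepository implements UserRepository {
    public boolean save(String email) { return true; }
}

// Stand-in for the real database, which enforces a unique constraint on email.
class ConstrainedUserRepository implements UserRepository {
    private final Set<String> emails = new HashSet<>();
    public boolean save(String email) { return emails.add(email); }
}

class RegistrationService {
    private final UserRepository repo;
    public RegistrationService(UserRepository repo) { this.repo = repo; }
    public boolean register(String email) { return repo.save(email); }
}
```

A test wired to the stub accepts registering the same email twice; the same test against the constrained store fails on the second attempt, which is exactly the class of bug that only surfaces with a real dependency.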
Unit Tests (25-35%)
When to use: For complex business logic that needs extensive edge case testing - validation rules, calculation engines, algorithmic code, pure functions. Use unit tests when you need to verify many input combinations quickly without infrastructure overhead.
Why use them: Unit tests excel at exhaustive edge case coverage. Testing a payment calculation with 50 different input combinations runs in milliseconds with unit tests. They're also ideal for TDD workflows where you need immediate feedback while developing algorithms.
What they validate:
- Business rules execute correctly across edge cases
- Calculations produce correct results
- Validation logic catches invalid inputs
- Pure functions transform data correctly
When NOT to use: Don't unit test simple CRUD operations, configuration classes, DTOs, or code that just delegates to other components. Don't mock your entire architecture just to achieve isolation - use integration tests instead.
See Unit Testing for test structure, mocking patterns, and best practices.
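As a sketch of the edge-case coverage described above, consider a hypothetical tiered-discount rule (amounts in cents; the tiers are illustrative). In a real suite this would be a JUnit `@ParameterizedTest`; a plain table-driven loop shows the same idea without any framework:

```java
// Hypothetical pure business rule: tiered discount on an order total, in cents.
class DiscountCalculator {
    public static long discountCents(long totalCents) {
        if (totalCents < 0) throw new IllegalArgumentException("negative total");
        if (totalCents >= 100_00) return totalCents / 10; // 10% at $100 and above
        if (totalCents >= 50_00)  return totalCents / 20; // 5% at $50 and above
        return 0;
    }
}

class DiscountCalculatorTest {
    // Plain-Java stand-in for a JUnit @ParameterizedTest: run a table of
    // (input, expected) pairs through the pure function in one pass.
    public static int runCases() {
        long[][] cases = {
            {0, 0}, {49_99, 0},               // below the first tier
            {50_00, 2_50}, {99_99, 4_99},     // 5% tier, boundaries included
            {100_00, 10_00}, {250_00, 25_00}, // 10% tier
        };
        for (long[] c : cases) {
            long actual = DiscountCalculator.discountCents(c[0]);
            if (actual != c[1]) {
                throw new AssertionError(c[0] + " -> " + actual + ", expected " + c[1]);
            }
        }
        return cases.length; // number of cases verified
    }
}
```

Because the rule is pure logic, dozens of such cases run in milliseconds with no infrastructure, which is precisely where unit tests earn their place.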
Contract Tests (5-10%)
When to use: For any API consumed by another service or team. Essential in microservices architectures where breaking API changes cause cascading failures.
Why use them: Contract tests catch breaking changes before deployment. When the payment service changes its response format, consumer-driven contracts fail immediately, preventing a production incident. They enable independent team deployments by verifying compatibility.
What they validate:
- API responses match expected schemas
- Required fields are present
- Status codes and error formats are correct
- Backward compatibility is maintained
See Contract Testing for Pact consumer-driven contracts and OpenAPI validation.
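The core of a consumer-driven contract can be reduced to "the consumer pins down the fields it relies on, and the provider's response is validated against them." In practice Pact records these expectations from the consumer and replays them against the provider; the sketch below (field names illustrative) only shows the required-field half of that check:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal required-field contract check. Pact does this (plus type and
// status-code matching) for real; the payment fields here are illustrative.
class PaymentContract {
    // Fields a hypothetical mobile consumer depends on.
    static final List<String> REQUIRED_FIELDS =
            List.of("paymentId", "status", "amountCents");

    // Returns the required fields the provider's response failed to include.
    public static List<String> missingFields(Map<String, Object> response) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED_FIELDS) {
            if (!response.containsKey(field)) missing.add(field);
        }
        return missing;
    }
}
```

A provider that renames `paymentId` to `id` fails this check at build time rather than in production, which is the whole point of catching breaking changes before deployment.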
End-to-End Tests (3-5%)
When to use: Only for critical user journeys that justify the cost - payment processing, account creation, authentication flows. Limit to 3-5 scenarios per application.
Why limit them: E2E tests are slow (10-20 minutes), expensive (full infrastructure), and brittle (break on UI changes). They provide value for verifying the entire stack works together but should not be your primary testing strategy. Robust integration tests and contract tests reduce E2E dependency.
What they validate:
- Complete user flows work end-to-end
- UI components integrate with backend
- Critical business flows complete successfully
See E2E Testing for Cypress, Playwright, and Detox patterns.
Performance Tests (2-3%)
When to use: For establishing baseline metrics and catching performance regressions. Run on schedules (nightly) rather than every commit since they require sustained load.
Why use them: Performance problems discovered in production are expensive to fix. Performance tests catch regressions early - a new query that scans the entire table, a removed cache that increases latency 10x, a memory leak that causes OOM errors.
What they validate:
- Response times meet SLAs under load
- Throughput handles expected traffic
- No performance regressions from code changes
- Memory and resource usage stay within bounds
See Performance Testing for Gatling, k6, and Lighthouse configuration.
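The assertion at the heart of a performance test is usually "the p-th percentile latency stays under the SLA threshold." Gatling and k6 generate the load and compute this for you; the sketch below shows just the percentile-versus-threshold logic (nearest-rank percentile; all sample values and thresholds illustrative):

```java
import java.util.Arrays;

// Core of a performance assertion: collect per-request latencies, compute a
// percentile, compare against the SLA. Load generation is omitted; Gatling
// and k6 handle that part in a real test.
class LatencyCheck {
    public static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Nearest-rank percentile: the sample at the p-th rank.
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static boolean meetsSla(long[] latenciesMs, double p, long thresholdMs) {
        return percentile(latenciesMs, p) <= thresholdMs;
    }
}
```

Asserting on a high percentile (p95/p99) rather than the mean is the usual choice, since averages hide the slow tail that users actually experience.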
Mutation Tests (Quality Gate)
When to use: For all production code, especially critical business logic. Mutation testing validates test quality, not the code itself.
Why use them: Code coverage measures which lines execute, not whether tests verify behavior. You can have 100% coverage with zero assertions. Mutation testing creates deliberate bugs (mutants) and verifies your tests catch them. High mutation coverage (>80%) proves your tests actually detect bugs.
What they validate:
- Tests have meaningful assertions
- Edge cases are covered
- Both positive and negative cases are tested
- Tests catch boundary condition bugs
See Mutation Testing for PITest (Java) and Stryker (JS/TS) configuration.
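The mechanism can be shown by hand for a single mutant (PITest does this automatically at the bytecode level; the age-18 rule and names below are illustrative): the original code uses `>=`, the mutant flips it to `>`. A suite that never tests the exact boundary value cannot tell the two apart, so the mutant survives.

```java
import java.util.function.IntPredicate;

// One mutant, reproduced by hand: the original rule uses >=, the mutant
// flips it to >. Only a test that pins the exact boundary can tell them apart.
class VoteEligibility {
    public static boolean original(int age) { return age >= 18; }
    public static boolean mutant(int age)   { return age > 18; } // injected bug
}

class MutantHunt {
    // Weak suite: only checks values far from the boundary.
    public static boolean weakSuitePasses(IntPredicate eligible) {
        return eligible.test(30) && !eligible.test(5);
    }
    // Strong suite: checks the boundary itself.
    public static boolean strongSuitePasses(IntPredicate eligible) {
        return eligible.test(18) && !eligible.test(17);
    }
}
```

The weak suite passes against both versions (the mutant survives); the strong suite fails against the mutant (the mutant is killed). Mutation coverage is simply the fraction of mutants your suite kills.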
Test Distribution Target
| Test Type | Percentage | Purpose | Typical Speed |
|---|---|---|---|
| Integration Tests | 50-60% | Core validation with real dependencies | 2-5 minutes |
| Unit Tests | 25-35% | Business logic, algorithms, edge cases | 30-60 seconds |
| Contract Tests | 5-10% | API contract validation | 30-60 seconds |
| E2E Tests | 3-5% | Critical user journeys | 10-20 minutes |
| Performance Tests | 2-3% | Load testing, performance regression | 10-30 minutes |
Why these percentages? The 50-60% allocation to integration tests reflects that most application code involves coordinating between layers - controllers calling services, services using repositories, repositories interacting with databases. Testing these interactions with real dependencies catches the majority of bugs. The 25-35% for unit tests covers complex business logic that benefits from exhaustive edge case testing. E2E tests are limited to 3-5% because they're slow, brittle, and expensive; instead, we achieve confidence through robust integration tests. Contract tests (5-10%) ensure service boundaries remain stable, while performance tests (2-3%) catch regressions in response times and throughput.
This distribution assumes modern tooling like TestContainers that makes integration tests fast enough for frequent execution. Without TestContainers, you might need more unit tests with mocks, but you'd trade off test realism and bug detection capability.
Selecting the Right Test Type
Use the following scenarios as a decision guide when choosing a test type.
Common Scenarios
"I need to test my REST endpoint that saves to the database" → Integration test with TestContainers. Test the full flow: HTTP request → controller → service → repository → database. Verify the data persists correctly.
"I need to test my payment calculation with many edge cases" → Unit test with parameterized tests. Test dozens of input combinations quickly without database overhead. The calculation is pure logic that doesn't need infrastructure.
"I need to verify our API changes don't break the mobile app" → Contract test with Pact. The mobile team defines the responses it expects; the backend verifies that it satisfies those contracts before deploying.
"I need to test the complete user registration flow" → E2E test for this critical journey, but only this one test. Don't E2E test every registration variation - cover those in integration tests.
"I need to ensure our API response time stays under 200ms" → Performance test with Gatling. Establish a baseline and run nightly to catch regressions before they reach production.
Coverage Targets
Code Coverage
| Component | Target | Tool |
|---|---|---|
| Backend Services | >90% | JaCoCo |
| Frontend Components | >85% | Jest/Istanbul |
| Mobile Apps | >85% | JaCoCo (Android), XCTest (iOS) |
| Utility/Helper Code | >95% | Language-specific tools |
Code coverage measures which lines execute during tests. It identifies untested code paths but doesn't guarantee test quality. A test that calls every line but never asserts anything reports 100% coverage while catching zero bugs.
Code coverage is a necessary but not sufficient metric. Use mutation testing to validate that tests actually catch bugs. Target >80% mutation coverage (Java) or >75% (JS/TS).
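The "100% coverage with zero assertions" trap can be made concrete with a two-line example (names hypothetical): both tests below execute every line of the parser, so both report full line coverage, but only one of them can ever fail.

```java
// The coverage trap: both "tests" execute every line of parseCents(), so both
// report full line coverage, but only the second one verifies behavior.
class PriceParser {
    public static int parseCents(String value) {
        String[] parts = value.split("\\.");
        return Integer.parseInt(parts[0]) * 100 + Integer.parseInt(parts[1]);
    }
}

class CoverageDemo {
    // 100% line coverage, zero assertions: passes no matter what is returned.
    public static boolean assertionFreeTest() {
        PriceParser.parseCents("12.34");
        return true;
    }

    // Identical coverage, but an actual check on the result.
    public static boolean assertingTest() {
        return PriceParser.parseCents("12.34") == 1234;
    }
}
```

Mutation testing exposes the difference immediately: any mutant of `parseCents` survives the assertion-free test and is killed by the asserting one.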
Mutation Coverage
| Component | Target | Tool |
|---|---|---|
| Java/Spring Boot | >80% | PITest |
| JavaScript/TypeScript | >75% | Stryker |
| Kotlin | >80% | PITest (Kotlin support) |
Mutation coverage proves your tests detect bugs. When PITest changes > to >= in your code and all tests still pass, you're missing boundary condition tests. High mutation coverage means your tests are actually effective, not just executing code.
See Mutation Testing for configuration and interpretation of results.
Testing Anti-Patterns
These patterns undermine test suite value. Understanding WHY they're problematic helps you avoid them:
Testing Implementation Details
Problem: Tests that verify internal state or private methods break when you refactor, even if behavior stays the same. This discourages refactoring and creates maintenance burden.
Why it happens: Developers try to achieve high coverage by testing every method, including internal ones. This couples tests to implementation rather than behavior.
Solution: Test through the public API or user interface. If you're testing a React component, query by role and text, not internal state (see React Testing). If you're testing a service, call public methods and verify results, not internal calls (see Unit Testing).
Mocking Everything
Problem: Tests with mocks for every dependency verify method call order, not actual behavior. They pass when real code is broken because mocks don't replicate real dependency behavior.
Why it happens: The test pyramid tradition led to mocking everything for "isolation." But isolated tests of mocked code don't catch real integration bugs.
Solution: Use integration tests with real dependencies via TestContainers. Mock only external services you don't control. If you're mocking more than 1-2 dependencies, you probably need an integration test.
Brittle Selectors
Problem: Tests using CSS selectors or DOM structure break when styling changes, even when functionality works. This makes tests a maintenance burden.
Why it happens: Developers use whatever selector works (class names, nth-child) without considering maintenance implications.
Solution: Use semantic queries based on accessibility - getByRole, getByLabelText, getByText. These reflect how users interact with your application and are resilient to implementation changes.
See E2E Testing for selector strategies.
Ignoring Test Maintenance
Problem: Disabled or flaky tests accumulate, providing no value while eroding trust in the test suite. Developers stop paying attention to failures.
Why it happens: Flaky tests are hard to fix, so they get disabled "temporarily." New features get prioritized over test maintenance.
Solution: Treat flaky tests as production bugs. Fix or delete within one sprint. Never accumulate @Disabled tests. Address root causes (race conditions, timing issues) rather than adding retries.
See E2E Testing for handling async and timing issues.
Continuous Integration Testing
Tests execute in CI pipeline stages ordered by speed and failure probability. This fail-fast strategy provides quick feedback and minimizes wasted resources.
Why this order? Unit tests (30-60 seconds) run first to catch basic logic errors quickly. If unit tests pass, mutation tests verify test quality - no point proceeding if tests don't catch bugs. Integration tests follow, validating components work with real dependencies. Contract tests ensure API compatibility before expensive E2E tests. E2E tests run last since they're slowest (10-20 minutes). Performance tests run on schedules (nightly) rather than every commit.
Any failure blocks the merge request, preventing broken code from reaching the main branch.
See CI Testing for complete GitLab pipeline configuration, parallel execution strategies, and quality gates.
Quality Gates
Merge Request Requirements
All merge requests must pass these automated quality gates:
- Code coverage >85% (component-specific targets apply)
- Mutation coverage >80% (Java) / >75% (JS/TS)
- All unit tests passing
- All integration tests passing
- Contract tests passing (if API changes)
- E2E smoke tests passing
- No critical security vulnerabilities (SAST/DAST)
- Code review approved by 2+ reviewers
These gates are enforced in GitLab pipelines and block merge requests. Do not bypass without explicit tech lead approval. See CI Testing for configuration.
Further Reading
Implementation Guides
- Unit Testing - AAA pattern, mocking, parameterized tests
- Integration Testing - TestContainers, MockMvc, database testing
- Mutation Testing - PITest, Stryker, interpreting results
- Contract Testing - Pact, OpenAPI validation
- E2E Testing - Cypress, Playwright, Detox
- Performance Testing - Gatling, k6, Lighthouse
- Test Data Management - Builders, factories, fixtures
- CI Testing - Pipeline configuration, parallel execution
Framework-Specific Guides
Summary
Our testing strategy prioritizes integration tests (50-60%) using TestContainers to validate real component interactions with actual dependencies like databases and message queues. This approach catches the bugs that actually reach production - constraint violations, transaction issues, and serialization errors that mocks cannot detect.
Unit tests (25-35%) cover complex business logic requiring exhaustive edge case testing. Contract tests (5-10%) ensure API compatibility between services. E2E tests (3-5%) verify critical user journeys only. Performance tests (2-3%) catch response time and throughput regressions.
Use mutation testing (PITest for Java, Stryker for JS/TS) to validate that your tests actually catch bugs, not just execute code. Target >80% mutation coverage for Java, >75% for JavaScript/TypeScript. Code coverage measures which lines run; mutation coverage proves tests detect defects.
Avoid anti-patterns: don't mock everything (use real dependencies via TestContainers), don't test implementation details (test behavior through public APIs), don't use brittle selectors (prefer semantic queries), and don't ignore flaky tests (fix or delete them).