Spring Boot Data Access

Efficient and correct data access using Spring Data JPA, Hibernate, and PostgreSQL.

Overview

Data access is where many application performance and correctness issues originate. Slow queries, N+1 problems, transaction bugs, and schema migration failures cause production incidents. This guide covers proven patterns for reliable data access with Spring Data JPA.

For database schema design principles and normalization, see Data: Database Design.

Focus areas: Repository design, query optimization, transaction boundaries, schema migrations with Flyway, and avoiding common pitfalls.

Core Principles

PostgreSQL in tests: Use TestContainers with real PostgreSQL, never H2
Explicit transactions: Define transaction boundaries at service layer
Avoid N+1 queries: Use JOIN FETCH or @EntityGraph
DTOs for APIs: Never expose JPA entities directly
Schema migrations: Version control schema changes with Flyway

Why PostgreSQL Over H2 in Tests

H2 is an in-memory database often used for testing. Don't use it.

The Problem: H2 and PostgreSQL have different SQL dialects:

-- Works in PostgreSQL, fails in H2
SELECT * FROM payments WHERE data->>'status' = 'pending';  -- JSON operator

-- Works in H2, fails in PostgreSQL
SELECT * FROM payments WHERE status = 'PENDING';  -- H2 case-insensitive by default

What happens:

Tests pass with H2 (green checkmark)
Deploy to production with PostgreSQL
SQL syntax errors, constraint violations, or wrong results
Production incident

Solution: Use TestContainers with real PostgreSQL:

dependencies {
    testImplementation platform('org.testcontainers:testcontainers-bom:1.20.4')
    testImplementation 'org.testcontainers:postgresql'
}

TestContainers spins up a Docker container with PostgreSQL for your tests. Container starts in 2-3 seconds, gets cleaned up automatically. Your tests now validate against the same database as production.

Dependencies

build.gradle:

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
    implementation 'org.postgresql:postgresql'
    implementation 'org.flywaydb:flyway-core'
    implementation 'org.flywaydb:flyway-database-postgresql'
}

Configuration

application.yml:

spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/payments
    username: ${DB_USERNAME}  # From environment, not hardcoded
    password: ${DB_PASSWORD}
    hikari:
      maximum-pool-size: 10        # Max database connections
      minimum-idle: 5              # Always keep 5 connections ready
      connection-timeout: 30000    # Wait 30s for connection from pool
      idle-timeout: 600000         # Close idle connections after 10min
      max-lifetime: 1800000        # Recycle connections after 30min

  jpa:
    hibernate:
      ddl-auto: validate  # NEVER 'update' or 'create' in production
    show-sql: false       # Use logging instead (below)
    properties:
      hibernate:
        # dialect: auto-detected from JDBC URL in Hibernate 6+ (Spring Boot 3.x)
        jdbc:
          batch_size: 20           # Batch inserts/updates
          fetch_size: 50           # How many rows to fetch at once
        order_inserts: true        # Batch inserts for same entity
        order_updates: true        # Batch updates for same entity
        format_sql: true           # Pretty-print SQL in logs
    open-in-view: false   # CRITICAL: Disable to prevent lazy loading issues

  flyway:
    enabled: true
    baseline-on-migrate: true
    locations: classpath:db/migration

logging:
  level:
    org.hibernate.SQL: DEBUG                 # Log SQL statements
    org.hibernate.orm.jdbc.bind: TRACE       # Log query parameters

Key configurations explained:

ddl-auto: validate: Hibernate checks schema matches entities but doesn't modify database. Prevents accidental schema changes. Use Flyway for schema changes.
open-in-view: false: By default, Spring Boot keeps Hibernate session open during HTTP request processing. This hides lazy loading issues (queries in controller/view). Disabling forces you to fetch all data in service layer where transactions belong.
HikariCP pool settings: Connection pooling reuses database connections instead of creating new ones (expensive). maximum-pool-size: 10 means max 10 concurrent database operations. Tune based on load.
Batch settings: batch_size: 20 groups multiple inserts/updates into single database round trip. If you save 100 entities, Hibernate sends 5 batches of 20 instead of 100 individual statements.

Entity Design

Basic Entity

@Entity
@Table(name = "payments")
@NoArgsConstructor(access = AccessLevel.PROTECTED)  // JPA requires no-arg constructor
@AllArgsConstructor
@Builder
@Getter
@Setter
public class Payment {

    @Id
    @Column(length = 50)
    private String id;  // Use business-friendly IDs, not auto-increment

    @Column(name = "customer_id", nullable = false, length = 50)
    private String customerId;

    @Column(nullable = false, precision = 19, scale = 4)
    private BigDecimal amount;  // NEVER use double/float for money

    @Column(nullable = false, length = 3)
    private String currency;

    @Enumerated(EnumType.STRING)  // Store as "PENDING", not 0
    @Column(nullable = false, length = 20)
    private PaymentStatus status;

    @Column(name = "created_at", nullable = false, updatable = false)
    private Instant createdAt;

    @Column(name = "updated_at", nullable = false)
    private Instant updatedAt;

    @Version
    private Long version;  // Optimistic locking for concurrent updates

    @PrePersist
    protected void onCreate() {
        if (id == null) {
            id = "payment-" + UUID.randomUUID();
        }
        var now = Instant.now();
        createdAt = now;
        updatedAt = now;
    }

    @PreUpdate
    protected void onUpdate() {
        updatedAt = Instant.now();
    }
}

Design decisions explained:

String ID instead of Long: UUIDs prevent ID enumeration attacks (user can't guess other IDs). Easier to merge data from different databases. Readable in logs.
BigDecimal for money: double and float have rounding errors due to binary representation. 100.00 - 99.99 = 0.010000000000005116 with double. BigDecimal uses exact decimal arithmetic.
EnumType.STRING: Stores "PENDING" instead of ordinal (0). If you reorder enum values, ordinal breaks. String is readable in database.
@Version for optimistic locking: Prevents lost updates in concurrent scenarios. If two transactions modify same entity, second one fails with OptimisticLockingFailureException. Better than pessimistic locking (SELECT FOR UPDATE) which blocks other transactions.
@PrePersist/@PreUpdate: Lifecycle callbacks automatically set timestamps. Better than manual setCreatedAt() calls everywhere (easy to forget).

Relationships

@Entity
@Table(name = "customers")
public class Customer {

    @Id
    private String id;

    @Column(nullable = false)
    private String name;

    // One customer has many payments
    @OneToMany(
        mappedBy = "customer",        // Payment.customer field owns relationship
        fetch = FetchType.LAZY,       // Don't load payments when loading customer
        cascade = CascadeType.ALL,    // Save/delete payments with customer
        orphanRemoval = true          // Delete payments when removed from collection
    )
    private List<Payment> payments = new ArrayList<>();
}

@Entity
@Table(name = "payments")
public class Payment {

    @Id
    private String id;

    // Many payments belong to one customer
    @ManyToOne(fetch = FetchType.LAZY)  // Don't load customer when loading payment
    @JoinColumn(name = "customer_id", nullable = false)
    private Customer customer;
}

Lazy vs Eager loading:

LAZY: Related entities loaded only when accessed. Default for @OneToMany, @ManyToMany. Use this.
EAGER: Related entities loaded immediately with parent. Causes N+1 queries. Avoid.

Why LAZY is better:

// EAGER loading
Customer customer = customerRepository.findById("cust-1");
// Hibernate executes:
//   SELECT * FROM customers WHERE id = 'cust-1'
//   SELECT * FROM payments WHERE customer_id = 'cust-1'  ← Even if you don't need payments!

// LAZY loading
Customer customer = customerRepository.findById("cust-1");
// Hibernate executes:
//   SELECT * FROM customers WHERE id = 'cust-1'
// Only this. Payments loaded when you call customer.getPayments()

Cascade types:

CascadeType.ALL: All operations (save, update, delete) cascade to children
CascadeType.PERSIST: Only save cascades
CascadeType.REMOVE: Only delete cascades
orphanRemoval = true: Delete child when removed from parent's collection

Spring Data Repositories

Basic Repository

@Repository
public interface PaymentRepository extends JpaRepository<Payment, String> {

    // Spring Data generates query from method name
    List<Payment> findByCustomerId(String customerId);

    // Multiple conditions
    List<Payment> findByCustomerIdAndStatus(String customerId, PaymentStatus status);

    // With pagination
    Page<Payment> findByStatus(PaymentStatus status, Pageable pageable);

    // Counting
    long countByStatus(PaymentStatus status);

    // Existence check
    boolean existsByCustomerId(String customerId);
}

How method name queries work:

findBy + field name = SELECT * FROM payments WHERE field = ?
And combines conditions: WHERE customer_id = ? AND status = ?
countBy returns count instead of entities
Spring Data parses method name and generates SQL automatically

The N+1 Query Problem

The Most Common Performance Bug

// Fetch all customers
List<Customer> customers = customerRepository.findAll();

// For each customer, print their payments
for (Customer customer : customers) {
    System.out.println(customer.getPayments().size());  // LAZY load triggers query
}

What Hibernate executes:

SELECT * FROM customers;                    -- 1 query
SELECT * FROM payments WHERE customer_id = 'cust-1';  -- For customer 1
SELECT * FROM payments WHERE customer_id = 'cust-2';  -- For customer 2
SELECT * FROM payments WHERE customer_id = 'cust-3';  -- For customer 3
-- ... 1 query per customer

If you have 100 customers, this executes 101 queries (1 for customers + 100 for payments). This is called "N+1" (1 query + N queries for related data).

Why this kills performance:

Each query has network overhead (database round trip)
100 queries = 100 network round trips
At 5ms per query, 100 queries = 500ms
All that data could have been fetched in 1 query taking 10ms

Solution 1: JOIN FETCH

@Repository
public interface CustomerRepository extends JpaRepository<Customer, String> {

    @Query("SELECT c FROM Customer c JOIN FETCH c.payments WHERE c.id IN :ids")
    List<Customer> findByIdsWithPayments(@Param("ids") List<String> ids);
}

What this does:

SELECT c.*, p.*
FROM customers c
LEFT JOIN payments p ON c.id = p.customer_id
WHERE c.id IN ('cust-1', 'cust-2', 'cust-3');

Single query fetches customers and all their payments. Hibernate assembles objects from result set.

Usage:

List<Customer> customers = customerRepository.findByIdsWithPayments(ids);
for (Customer customer : customers) {
    System.out.println(customer.getPayments().size());  // No extra queries!
}

Solution 2: @EntityGraph

@Repository
public interface CustomerRepository extends JpaRepository<Customer, String> {

    @EntityGraph(attributePaths = {"payments"})  // Eagerly fetch payments
    List<Customer> findAll();

    @EntityGraph(attributePaths = {"payments", "payments.items"})  // Nested paths
    Customer findById(String id);
}

How it works: @EntityGraph tells Hibernate which associations to eagerly load for this specific query. Less verbose than JOIN FETCH but achieves same result.

Custom Queries

JPQL (Java Persistence Query Language)

@Repository
public interface PaymentRepository extends JpaRepository<Payment, String> {

    @Query("""
        SELECT p FROM Payment p
        WHERE p.status = :status
        AND p.createdAt BETWEEN :startDate AND :endDate
        ORDER BY p.createdAt DESC
        """)
    List<Payment> findPaymentsInDateRange(
        @Param("status") PaymentStatus status,
        @Param("startDate") Instant startDate,
        @Param("endDate") Instant endDate
    );
}

JPQL vs SQL:

JPQL queries entities: SELECT p FROM Payment p (capital P = entity name)
SQL queries tables: SELECT * FROM payments (lowercase = table name)
JPQL is database-agnostic (works with PostgreSQL, MySQL, Oracle)
Hibernate translates JPQL to database-specific SQL

Projections (DTO queries)

When you only need a few fields, don't load entire entity:

// DTO record
public record PaymentSummary(
    String id,
    BigDecimal amount,
    PaymentStatus status
) {}

// Repository
@Repository
public interface PaymentRepository extends JpaRepository<Payment, String> {

    @Query("""
        SELECT new com.bank.payments.dto.PaymentSummary(
            p.id,
            p.amount,
            p.status
        )
        FROM Payment p
        WHERE p.customerId = :customerId
        """)
    List<PaymentSummary> findSummariesByCustomer(@Param("customerId") String customerId);
}

Why projections matter:

Fetches only needed columns from database (less data over network)
Doesn't load unnecessary fields in memory
Doesn't trigger lazy loading (DTOs aren't managed by Hibernate)
For a list of 1000 payments, this can be 10x faster than loading full entities

Native SQL (Use Sparingly)

@Repository
public interface PaymentRepository extends JpaRepository<Payment, String> {

    @Query(
        value = """
            SELECT * FROM payments
            WHERE customer_id = :customerId
            AND created_at >= NOW() - INTERVAL '30 days'
            ORDER BY created_at DESC
            """,
        nativeQuery = true
    )
    List<Payment> findRecentPayments(@Param("customerId") String customerId);
}

When to use native SQL:

Database-specific features (PostgreSQL JSON operators, window functions)
Performance-critical queries that need manual tuning
Complex queries that JPQL can't express

Downsides:

Tied to specific database (breaks if you switch PostgreSQL → MySQL)
No compile-time validation (typos caught at runtime)
Harder to test (requires real database)

Transactions

Service-Layer Transactions

ALWAYS define transaction boundaries at the service layer, not repository:

@Service
@Transactional(readOnly = true)  // Default for all methods
@RequiredArgsConstructor
public class PaymentService {

    private final PaymentRepository paymentRepository;
    private final AuditRepository auditRepository;

    // Read-only transaction (inherits from class level)
    public Payment getPayment(String id) {
        return paymentRepository.findById(id)
            .orElseThrow(() -> new PaymentNotFoundException(id));
    }

    // Write transaction
    @Transactional  // Overrides readOnly = false
    public Payment createPayment(PaymentRequest request) {
        var payment = buildPayment(request);
        payment = paymentRepository.save(payment);

        // This runs in same transaction - if audit fails, payment rolls back
        auditRepository.save(new AuditLog("payment.created", payment.getId()));

        return payment;
    }
}

Why @Transactional(readOnly = true) at class level:

Tells database this transaction won't modify data
Database can optimize (skip locks, use read replicas)
Prevents accidental data modification in read methods
Override with @Transactional on write methods

What happens in createPayment():

Transaction begins
Save payment to database (INSERT)
Save audit log to database (INSERT)
If audit fails, Hibernate rolls back payment insert
Transaction commits (both INSERTs permanent) or rolls back (both discarded)

Transaction Propagation

@Service
@RequiredArgsConstructor
public class PaymentService {

    private final PaymentRepository paymentRepository;
    private final NotificationService notificationService;

    @Transactional
    public Payment createPayment(PaymentRequest request) {
        var payment = paymentRepository.save(buildPayment(request));

        // REQUIRED (default): Runs in same transaction
        notificationService.sendEmail(payment);  // If this fails, payment rolls back

        return payment;
    }
}

@Service
public class NotificationService {

    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void sendEmail(Payment payment) {
        // Runs in NEW transaction
        // If this fails, payment still committed
        emailClient.send(payment.getCustomerId(), "Payment received");
    }
}

Propagation types:

REQUIRED (default): Use existing transaction or create new one
REQUIRES_NEW: Always create new transaction (suspend current)
NESTED: Create nested transaction (savepoint)
SUPPORTS: Use transaction if exists, otherwise non-transactional

When to use REQUIRES_NEW:

Logging/auditing that must succeed even if main operation fails
Side effects that shouldn't be rolled back (sending email, writing to external API)

Optimistic Locking

Prevents lost updates when multiple transactions modify same entity:

@Entity
public class Payment {
    @Id
    private String id;

    @Version
    private Long version;  // Hibernate manages this field

    private BigDecimal amount;
}

How it works:

Transaction 1 loads payment: {id: "pay-1", amount: 100, version: 1}
Transaction 2 loads same payment: {id: "pay-1", amount: 100, version: 1}

Transaction 1 updates amount to 150, Hibernate executes:

UPDATE payments
SET amount = 150, version = 2
WHERE id = 'pay-1' AND version = 1

Success! Version bumped to 2.

Transaction 2 tries to update amount to 200:

UPDATE payments
SET amount = 200, version = 2
WHERE id = 'pay-1' AND version = 1  ← Still expects version 1

No rows updated! Hibernate throws OptimisticLockingFailureException.

Handling the exception:

@Service
@RequiredArgsConstructor
public class PaymentService {

    private final PaymentRepository repository;

    @Transactional
    public Payment updateAmount(String id, BigDecimal newAmount) {
        var payment = repository.findById(id)
            .orElseThrow(() -> new PaymentNotFoundException(id));

        payment.setAmount(newAmount);

        try {
            return repository.save(payment);
        } catch (OptimisticLockingFailureException ex) {
            // Someone else modified this payment
            throw new ConcurrentUpdateException(
                "Payment was modified by another user. Please refresh and try again."
            );
        }
    }
}

Database Migrations with Flyway

Why Version Control Schema Changes

Manual schema changes cause problems:

Developer forgets to run migration locally, code breaks
Production deployment fails because migration not applied
Can't reproduce production schema in test environment

Flyway tracks which migrations have run and applies pending ones automatically.

Migration File Structure

src/main/resources/db/migration/
├── V1__create_payments_table.sql
├── V2__add_transaction_id_column.sql
├── V3__create_index_on_customer_id.sql
└── V4__add_status_column.sql

Naming convention: V{version}__{description}.sql

Version must be unique and increasing
Double underscore separates version from description
Description uses underscores, not spaces

Example Migrations

V1__create_payments_table.sql:

CREATE TABLE payments (
    id VARCHAR(50) PRIMARY KEY,
    customer_id VARCHAR(50) NOT NULL,
    amount DECIMAL(19, 4) NOT NULL,
    currency VARCHAR(3) NOT NULL,
    status VARCHAR(20) NOT NULL,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    version BIGINT NOT NULL DEFAULT 0
);

CREATE INDEX idx_payments_customer_id ON payments(customer_id);
CREATE INDEX idx_payments_status ON payments(status);
CREATE INDEX idx_payments_created_at ON payments(created_at);

V2__add_transaction_id_column.sql:

ALTER TABLE payments
ADD COLUMN transaction_id VARCHAR(100);

-- Index for lookups by transaction ID
CREATE INDEX idx_payments_transaction_id ON payments(transaction_id)
WHERE transaction_id IS NOT NULL;  -- Partial index (PostgreSQL feature)

What happens on startup:

Flyway checks flyway_schema_history table (tracks applied migrations)
Sees V1 applied, V2 not applied
Executes V2__add_transaction_id_column.sql
Records V2 as applied in flyway_schema_history

Migration Best Practices

1. Never modify applied migrations

Once a migration runs in any environment, it's immutable. Flyway checksums each file. If you change it, Flyway fails with "migration checksum mismatch".

Bad:

-- V1__create_payments.sql (ALREADY APPLIED IN PROD)
CREATE TABLE payments (
    id VARCHAR(50) PRIMARY KEY,
    amount DECIMAL(10, 2) NOT NULL  -- Oops, too small
);

-- Developer edits same file to fix
BAD: ALTER TABLE payments ALTER COLUMN amount TYPE DECIMAL(19, 4);  --  WRONG

Good:

-- V1__create_payments.sql (don't touch)
CREATE TABLE payments (
    id VARCHAR(50) PRIMARY KEY,
    amount DECIMAL(10, 2) NOT NULL
);

-- V2__fix_amount_precision.sql (new migration)
GOOD: ALTER TABLE payments ALTER COLUMN amount TYPE DECIMAL(19, 4);  --  CORRECT

2. Test migrations before committing

# Drop your local database
docker-compose down -v

# Start fresh database
docker-compose up -d

# Run application - Flyway applies migrations
./gradlew bootRun

# Verify schema matches expectations

3. Make migrations reversible (where possible)

Include undo scripts for rollback:

V3__add_status_column.sql:

ALTER TABLE payments ADD COLUMN status VARCHAR(20) DEFAULT 'PENDING';

U3__add_status_column.sql (undo - Flyway Teams only):

ALTER TABLE payments DROP COLUMN status;

Summary

Data Access Checklist:

TestContainers with PostgreSQL: Never use H2 for integration tests
Service-layer transactions: @Transactional on service methods, not repositories
readOnly = true: Default for read methods, enables database optimizations
Avoid N+1 queries: Use JOIN FETCH or @EntityGraph
DTOs for APIs: Never expose JPA entities in REST responses
Optimistic locking: Use @Version for concurrent updates
Flyway migrations: Version control all schema changes
open-in-view: false: Disable to catch lazy loading bugs early
BigDecimal for money: Never use double/float
String IDs: Use UUIDs instead of auto-increment

Common Mistakes to Avoid:

Using H2 for tests (SQL dialect differences from PostgreSQL)
EAGER fetching (causes N+1 queries)
Missing transaction boundaries (each repository call is separate transaction)
Exposing entities in REST APIs (lazy loading exceptions, unwanted fields)
Manual timestamp management (use @PrePersist/@PreUpdate)
Modifying applied Flyway migrations (breaks checksum validation)

Cross-References:

See Spring Boot Testing for TestContainers setup and integration testing
See Database Design for schema design patterns and normalization
See Database ORM for ORM patterns and best practices
See Database Migrations for Flyway migration strategies
See Database Code Review for data access review checklist
See Performance Optimization for query tuning and indexing
See Caching for caching strategies to reduce database load

Overview​

Core Principles​

Why PostgreSQL Over H2 in Tests​

Dependencies​

Configuration​

Entity Design​

Basic Entity​

Relationships​

Spring Data Repositories​

Basic Repository​

The N+1 Query Problem​

Solution 1: JOIN FETCH​

Solution 2: @EntityGraph​

Custom Queries​

JPQL (Java Persistence Query Language)​

Projections (DTO queries)​

Native SQL (Use Sparingly)​

Transactions​

Service-Layer Transactions​

Transaction Propagation​

Optimistic Locking​

Database Migrations with Flyway​

Why Version Control Schema Changes​

Migration File Structure​

Example Migrations​

Migration Best Practices​

Summary​