Spring Boot Data Access
Efficient and correct data access using Spring Data JPA, Hibernate, and PostgreSQL.
Overview
Data access is where many application performance and correctness issues originate. Slow queries, N+1 problems, transaction bugs, and schema migration failures cause production incidents. This guide covers proven patterns for reliable data access with Spring Data JPA.
For database schema design principles and normalization, see Data: Database Design.
Focus areas: Repository design, query optimization, transaction boundaries, schema migrations with Flyway, and avoiding common pitfalls.
Core Principles
- PostgreSQL in tests: Use TestContainers with real PostgreSQL, never H2
- Explicit transactions: Define transaction boundaries at service layer
- Avoid N+1 queries: Use JOIN FETCH or @EntityGraph
- DTOs for APIs: Never expose JPA entities directly
- Schema migrations: Version control schema changes with Flyway
Why PostgreSQL Over H2 in Tests
H2 is an in-memory database often used for testing. Don't use it.
The Problem: H2 and PostgreSQL have different SQL dialects:
-- Works in PostgreSQL, fails in H2
SELECT * FROM payments WHERE data->>'status' = 'pending'; -- JSON operator
-- Works in H2, fails in PostgreSQL
SELECT * FROM payments WHERE status = 'PENDING'; -- H2 case-insensitive by default
What happens:
- Tests pass with H2 (green checkmark)
- Deploy to production with PostgreSQL
- SQL syntax errors, constraint violations, or wrong results
- Production incident
Solution: Use TestContainers with real PostgreSQL:
dependencies {
testImplementation platform('org.testcontainers:testcontainers-bom:1.20.4')
testImplementation 'org.testcontainers:postgresql'
}
TestContainers spins up a Docker container with PostgreSQL for your tests. Container starts in 2-3 seconds, gets cleaned up automatically. Your tests now validate against the same database as production.
Dependencies
build.gradle:
dependencies {
implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
implementation 'org.postgresql:postgresql'
implementation 'org.flywaydb:flyway-core'
implementation 'org.flywaydb:flyway-database-postgresql'
}
Configuration
application.yml:
spring:
datasource:
url: jdbc:postgresql://localhost:5432/payments
username: ${DB_USERNAME} # From environment, not hardcoded
password: ${DB_PASSWORD}
hikari:
maximum-pool-size: 10 # Max database connections
minimum-idle: 5 # Always keep 5 connections ready
connection-timeout: 30000 # Wait 30s for connection from pool
idle-timeout: 600000 # Close idle connections after 10min
max-lifetime: 1800000 # Recycle connections after 30min
jpa:
hibernate:
ddl-auto: validate # NEVER 'update' or 'create' in production
show-sql: false # Use logging instead (below)
properties:
hibernate:
# dialect: auto-detected from JDBC URL in Hibernate 6+ (Spring Boot 3.x)
jdbc:
batch_size: 20 # Batch inserts/updates
fetch_size: 50 # How many rows to fetch at once
order_inserts: true # Batch inserts for same entity
order_updates: true # Batch updates for same entity
format_sql: true # Pretty-print SQL in logs
open-in-view: false # CRITICAL: Disable to prevent lazy loading issues
flyway:
enabled: true
baseline-on-migrate: true
locations: classpath:db/migration
logging:
level:
org.hibernate.SQL: DEBUG # Log SQL statements
org.hibernate.orm.jdbc.bind: TRACE # Log query parameters
Key configurations explained:
-
ddl-auto: validate: Hibernate checks schema matches entities but doesn't modify database. Prevents accidental schema changes. Use Flyway for schema changes. -
open-in-view: false: By default, Spring Boot keeps Hibernate session open during HTTP request processing. This hides lazy loading issues (queries in controller/view). Disabling forces you to fetch all data in service layer where transactions belong. -
HikariCP pool settings: Connection pooling reuses database connections instead of creating new ones (expensive).
maximum-pool-size: 10means max 10 concurrent database operations. Tune based on load. -
Batch settings:
batch_size: 20groups multiple inserts/updates into single database round trip. If you save 100 entities, Hibernate sends 5 batches of 20 instead of 100 individual statements.
Entity Design
Basic Entity
@Entity
@Table(name = "payments")
@NoArgsConstructor(access = AccessLevel.PROTECTED) // JPA requires no-arg constructor
@AllArgsConstructor
@Builder
@Getter
@Setter
public class Payment {
@Id
@Column(length = 50)
private String id; // Use business-friendly IDs, not auto-increment
@Column(name = "customer_id", nullable = false, length = 50)
private String customerId;
@Column(nullable = false, precision = 19, scale = 4)
private BigDecimal amount; // NEVER use double/float for money
@Column(nullable = false, length = 3)
private String currency;
@Enumerated(EnumType.STRING) // Store as "PENDING", not 0
@Column(nullable = false, length = 20)
private PaymentStatus status;
@Column(name = "created_at", nullable = false, updatable = false)
private Instant createdAt;
@Column(name = "updated_at", nullable = false)
private Instant updatedAt;
@Version
private Long version; // Optimistic locking for concurrent updates
@PrePersist
protected void onCreate() {
if (id == null) {
id = "payment-" + UUID.randomUUID();
}
var now = Instant.now();
createdAt = now;
updatedAt = now;
}
@PreUpdate
protected void onUpdate() {
updatedAt = Instant.now();
}
}
Design decisions explained:
-
String ID instead of Long: UUIDs prevent ID enumeration attacks (user can't guess other IDs). Easier to merge data from different databases. Readable in logs.
-
BigDecimal for money:
doubleandfloathave rounding errors due to binary representation.100.00 - 99.99 = 0.010000000000005116with double. BigDecimal uses exact decimal arithmetic. -
EnumType.STRING: Stores "PENDING" instead of ordinal (0). If you reorder enum values, ordinal breaks. String is readable in database.
-
@Version for optimistic locking: Prevents lost updates in concurrent scenarios. If two transactions modify same entity, second one fails with
OptimisticLockingFailureException. Better than pessimistic locking (SELECT FOR UPDATE) which blocks other transactions. -
@PrePersist/@PreUpdate: Lifecycle callbacks automatically set timestamps. Better than manual
setCreatedAt()calls everywhere (easy to forget).
Relationships
@Entity
@Table(name = "customers")
public class Customer {
@Id
private String id;
@Column(nullable = false)
private String name;
// One customer has many payments
@OneToMany(
mappedBy = "customer", // Payment.customer field owns relationship
fetch = FetchType.LAZY, // Don't load payments when loading customer
cascade = CascadeType.ALL, // Save/delete payments with customer
orphanRemoval = true // Delete payments when removed from collection
)
private List<Payment> payments = new ArrayList<>();
}
@Entity
@Table(name = "payments")
public class Payment {
@Id
private String id;
// Many payments belong to one customer
@ManyToOne(fetch = FetchType.LAZY) // Don't load customer when loading payment
@JoinColumn(name = "customer_id", nullable = false)
private Customer customer;
}
Lazy vs Eager loading:
- LAZY: Related entities loaded only when accessed. Default for @OneToMany, @ManyToMany. Use this.
- EAGER: Related entities loaded immediately with parent. Causes N+1 queries. Avoid.
Why LAZY is better:
// EAGER loading
Customer customer = customerRepository.findById("cust-1");
// Hibernate executes:
// SELECT * FROM customers WHERE id = 'cust-1'
// SELECT * FROM payments WHERE customer_id = 'cust-1' ← Even if you don't need payments!
// LAZY loading
Customer customer = customerRepository.findById("cust-1");
// Hibernate executes:
// SELECT * FROM customers WHERE id = 'cust-1'
// Only this. Payments loaded when you call customer.getPayments()
Cascade types:
CascadeType.ALL: All operations (save, update, delete) cascade to childrenCascadeType.PERSIST: Only save cascadesCascadeType.REMOVE: Only delete cascadesorphanRemoval = true: Delete child when removed from parent's collection
Spring Data Repositories
Basic Repository
@Repository
public interface PaymentRepository extends JpaRepository<Payment, String> {
// Spring Data generates query from method name
List<Payment> findByCustomerId(String customerId);
// Multiple conditions
List<Payment> findByCustomerIdAndStatus(String customerId, PaymentStatus status);
// With pagination
Page<Payment> findByStatus(PaymentStatus status, Pageable pageable);
// Counting
long countByStatus(PaymentStatus status);
// Existence check
boolean existsByCustomerId(String customerId);
}
How method name queries work:
findBy+ field name =SELECT * FROM payments WHERE field = ?Andcombines conditions:WHERE customer_id = ? AND status = ?countByreturns count instead of entities- Spring Data parses method name and generates SQL automatically
The N+1 Query Problem
The Most Common Performance Bug
// Fetch all customers
List<Customer> customers = customerRepository.findAll();
// For each customer, print their payments
for (Customer customer : customers) {
System.out.println(customer.getPayments().size()); // LAZY load triggers query
}
What Hibernate executes:
SELECT * FROM customers; -- 1 query
SELECT * FROM payments WHERE customer_id = 'cust-1'; -- For customer 1
SELECT * FROM payments WHERE customer_id = 'cust-2'; -- For customer 2
SELECT * FROM payments WHERE customer_id = 'cust-3'; -- For customer 3
-- ... 1 query per customer
If you have 100 customers, this executes 101 queries (1 for customers + 100 for payments). This is called "N+1" (1 query + N queries for related data).
Why this kills performance:
- Each query has network overhead (database round trip)
- 100 queries = 100 network round trips
- At 5ms per query, 100 queries = 500ms
- All that data could have been fetched in 1 query taking 10ms
Solution 1: JOIN FETCH
@Repository
public interface CustomerRepository extends JpaRepository<Customer, String> {
@Query("SELECT c FROM Customer c JOIN FETCH c.payments WHERE c.id IN :ids")
List<Customer> findByIdsWithPayments(@Param("ids") List<String> ids);
}
What this does:
SELECT c.*, p.*
FROM customers c
LEFT JOIN payments p ON c.id = p.customer_id
WHERE c.id IN ('cust-1', 'cust-2', 'cust-3');
Single query fetches customers and all their payments. Hibernate assembles objects from result set.
Usage:
List<Customer> customers = customerRepository.findByIdsWithPayments(ids);
for (Customer customer : customers) {
System.out.println(customer.getPayments().size()); // No extra queries!
}
Solution 2: @EntityGraph
@Repository
public interface CustomerRepository extends JpaRepository<Customer, String> {
@EntityGraph(attributePaths = {"payments"}) // Eagerly fetch payments
List<Customer> findAll();
@EntityGraph(attributePaths = {"payments", "payments.items"}) // Nested paths
Customer findById(String id);
}
How it works: @EntityGraph tells Hibernate which associations to eagerly load for this specific query. Less verbose than JOIN FETCH but achieves same result.
Custom Queries
JPQL (Java Persistence Query Language)
@Repository
public interface PaymentRepository extends JpaRepository<Payment, String> {
@Query("""
SELECT p FROM Payment p
WHERE p.status = :status
AND p.createdAt BETWEEN :startDate AND :endDate
ORDER BY p.createdAt DESC
""")
List<Payment> findPaymentsInDateRange(
@Param("status") PaymentStatus status,
@Param("startDate") Instant startDate,
@Param("endDate") Instant endDate
);
}
JPQL vs SQL:
- JPQL queries entities:
SELECT p FROM Payment p(capital P = entity name) - SQL queries tables:
SELECT * FROM payments(lowercase = table name) - JPQL is database-agnostic (works with PostgreSQL, MySQL, Oracle)
- Hibernate translates JPQL to database-specific SQL
Projections (DTO queries)
When you only need a few fields, don't load entire entity:
// DTO record
public record PaymentSummary(
String id,
BigDecimal amount,
PaymentStatus status
) {}
// Repository
@Repository
public interface PaymentRepository extends JpaRepository<Payment, String> {
@Query("""
SELECT new com.bank.payments.dto.PaymentSummary(
p.id,
p.amount,
p.status
)
FROM Payment p
WHERE p.customerId = :customerId
""")
List<PaymentSummary> findSummariesByCustomer(@Param("customerId") String customerId);
}
Why projections matter:
- Fetches only needed columns from database (less data over network)
- Doesn't load unnecessary fields in memory
- Doesn't trigger lazy loading (DTOs aren't managed by Hibernate)
- For a list of 1000 payments, this can be 10x faster than loading full entities
Native SQL (Use Sparingly)
@Repository
public interface PaymentRepository extends JpaRepository<Payment, String> {
@Query(
value = """
SELECT * FROM payments
WHERE customer_id = :customerId
AND created_at >= NOW() - INTERVAL '30 days'
ORDER BY created_at DESC
""",
nativeQuery = true
)
List<Payment> findRecentPayments(@Param("customerId") String customerId);
}
When to use native SQL:
- Database-specific features (PostgreSQL JSON operators, window functions)
- Performance-critical queries that need manual tuning
- Complex queries that JPQL can't express
Downsides:
- Tied to specific database (breaks if you switch PostgreSQL → MySQL)
- No compile-time validation (typos caught at runtime)
- Harder to test (requires real database)
Transactions
Service-Layer Transactions
ALWAYS define transaction boundaries at the service layer, not repository:
@Service
@Transactional(readOnly = true) // Default for all methods
@RequiredArgsConstructor
public class PaymentService {
private final PaymentRepository paymentRepository;
private final AuditRepository auditRepository;
// Read-only transaction (inherits from class level)
public Payment getPayment(String id) {
return paymentRepository.findById(id)
.orElseThrow(() -> new PaymentNotFoundException(id));
}
// Write transaction
@Transactional // Overrides readOnly = false
public Payment createPayment(PaymentRequest request) {
var payment = buildPayment(request);
payment = paymentRepository.save(payment);
// This runs in same transaction - if audit fails, payment rolls back
auditRepository.save(new AuditLog("payment.created", payment.getId()));
return payment;
}
}
Why @Transactional(readOnly = true) at class level:
- Tells database this transaction won't modify data
- Database can optimize (skip locks, use read replicas)
- Prevents accidental data modification in read methods
- Override with
@Transactionalon write methods
What happens in createPayment():
- Transaction begins
- Save payment to database (INSERT)
- Save audit log to database (INSERT)
- If audit fails, Hibernate rolls back payment insert
- Transaction commits (both INSERTs permanent) or rolls back (both discarded)
Transaction Propagation
@Service
@RequiredArgsConstructor
public class PaymentService {
private final PaymentRepository paymentRepository;
private final NotificationService notificationService;
@Transactional
public Payment createPayment(PaymentRequest request) {
var payment = paymentRepository.save(buildPayment(request));
// REQUIRED (default): Runs in same transaction
notificationService.sendEmail(payment); // If this fails, payment rolls back
return payment;
}
}
@Service
public class NotificationService {
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void sendEmail(Payment payment) {
// Runs in NEW transaction
// If this fails, payment still committed
emailClient.send(payment.getCustomerId(), "Payment received");
}
}
Propagation types:
REQUIRED(default): Use existing transaction or create new oneREQUIRES_NEW: Always create new transaction (suspend current)NESTED: Create nested transaction (savepoint)SUPPORTS: Use transaction if exists, otherwise non-transactional
When to use REQUIRES_NEW:
- Logging/auditing that must succeed even if main operation fails
- Side effects that shouldn't be rolled back (sending email, writing to external API)
Optimistic Locking
Prevents lost updates when multiple transactions modify same entity:
@Entity
public class Payment {
@Id
private String id;
@Version
private Long version; // Hibernate manages this field
private BigDecimal amount;
}
How it works:
-
Transaction 1 loads payment:
{id: "pay-1", amount: 100, version: 1} -
Transaction 2 loads same payment:
{id: "pay-1", amount: 100, version: 1} -
Transaction 1 updates amount to 150, Hibernate executes:
UPDATE payments
SET amount = 150, version = 2
WHERE id = 'pay-1' AND version = 1Success! Version bumped to 2.
-
Transaction 2 tries to update amount to 200:
UPDATE payments
SET amount = 200, version = 2
WHERE id = 'pay-1' AND version = 1 ← Still expects version 1No rows updated! Hibernate throws
OptimisticLockingFailureException.
Handling the exception:
@Service
@RequiredArgsConstructor
public class PaymentService {
private final PaymentRepository repository;
@Transactional
public Payment updateAmount(String id, BigDecimal newAmount) {
var payment = repository.findById(id)
.orElseThrow(() -> new PaymentNotFoundException(id));
payment.setAmount(newAmount);
try {
return repository.save(payment);
} catch (OptimisticLockingFailureException ex) {
// Someone else modified this payment
throw new ConcurrentUpdateException(
"Payment was modified by another user. Please refresh and try again."
);
}
}
}
Database Migrations with Flyway
Why Version Control Schema Changes
Manual schema changes cause problems:
- Developer forgets to run migration locally, code breaks
- Production deployment fails because migration not applied
- Can't reproduce production schema in test environment
Flyway tracks which migrations have run and applies pending ones automatically.
Migration File Structure
src/main/resources/db/migration/
├── V1__create_payments_table.sql
├── V2__add_transaction_id_column.sql
├── V3__create_index_on_customer_id.sql
└── V4__add_status_column.sql
Naming convention: V{version}__{description}.sql
- Version must be unique and increasing
- Double underscore separates version from description
- Description uses underscores, not spaces
Example Migrations
V1__create_payments_table.sql:
CREATE TABLE payments (
id VARCHAR(50) PRIMARY KEY,
customer_id VARCHAR(50) NOT NULL,
amount DECIMAL(19, 4) NOT NULL,
currency VARCHAR(3) NOT NULL,
status VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
version BIGINT NOT NULL DEFAULT 0
);
CREATE INDEX idx_payments_customer_id ON payments(customer_id);
CREATE INDEX idx_payments_status ON payments(status);
CREATE INDEX idx_payments_created_at ON payments(created_at);
V2__add_transaction_id_column.sql:
ALTER TABLE payments
ADD COLUMN transaction_id VARCHAR(100);
-- Index for lookups by transaction ID
CREATE INDEX idx_payments_transaction_id ON payments(transaction_id)
WHERE transaction_id IS NOT NULL; -- Partial index (PostgreSQL feature)
What happens on startup:
- Flyway checks
flyway_schema_historytable (tracks applied migrations) - Sees V1 applied, V2 not applied
- Executes V2__add_transaction_id_column.sql
- Records V2 as applied in
flyway_schema_history
Migration Best Practices
1. Never modify applied migrations
Once a migration runs in any environment, it's immutable. Flyway checksums each file. If you change it, Flyway fails with "migration checksum mismatch".
Bad:
-- V1__create_payments.sql (ALREADY APPLIED IN PROD)
CREATE TABLE payments (
id VARCHAR(50) PRIMARY KEY,
amount DECIMAL(10, 2) NOT NULL -- Oops, too small
);
-- Developer edits same file to fix
BAD: ALTER TABLE payments ALTER COLUMN amount TYPE DECIMAL(19, 4); -- WRONG
Good:
-- V1__create_payments.sql (don't touch)
CREATE TABLE payments (
id VARCHAR(50) PRIMARY KEY,
amount DECIMAL(10, 2) NOT NULL
);
-- V2__fix_amount_precision.sql (new migration)
GOOD: ALTER TABLE payments ALTER COLUMN amount TYPE DECIMAL(19, 4); -- CORRECT
2. Test migrations before committing
# Drop your local database
docker-compose down -v
# Start fresh database
docker-compose up -d
# Run application - Flyway applies migrations
./gradlew bootRun
# Verify schema matches expectations
3. Make migrations reversible (where possible)
Include undo scripts for rollback:
V3__add_status_column.sql:
ALTER TABLE payments ADD COLUMN status VARCHAR(20) DEFAULT 'PENDING';
U3__add_status_column.sql (undo - Flyway Teams only):
ALTER TABLE payments DROP COLUMN status;
Summary
Data Access Checklist:
- TestContainers with PostgreSQL: Never use H2 for integration tests
- Service-layer transactions:
@Transactionalon service methods, not repositories - readOnly = true: Default for read methods, enables database optimizations
- Avoid N+1 queries: Use JOIN FETCH or @EntityGraph
- DTOs for APIs: Never expose JPA entities in REST responses
- Optimistic locking: Use
@Versionfor concurrent updates - Flyway migrations: Version control all schema changes
- open-in-view: false: Disable to catch lazy loading bugs early
- BigDecimal for money: Never use double/float
- String IDs: Use UUIDs instead of auto-increment
Common Mistakes to Avoid:
- Using H2 for tests (SQL dialect differences from PostgreSQL)
- EAGER fetching (causes N+1 queries)
- Missing transaction boundaries (each repository call is separate transaction)
- Exposing entities in REST APIs (lazy loading exceptions, unwanted fields)
- Manual timestamp management (use @PrePersist/@PreUpdate)
- Modifying applied Flyway migrations (breaks checksum validation)
Cross-References:
- See Spring Boot Testing for TestContainers setup and integration testing
- See Database Design for schema design patterns and normalization
- See Database ORM for ORM patterns and best practices
- See Database Migrations for Flyway migration strategies
- See Database Code Review for data access review checklist
- See Performance Optimization for query tuning and indexing
- See Caching for caching strategies to reduce database load