Database Schema Design

Guidelines for designing robust, performant, and maintainable relational database schemas.

Overview

Database schema design is the foundation of data integrity and application performance. A well-designed schema enforces business rules through constraints, supports efficient queries through appropriate indexing, and scales as data volume grows. Poor schema design leads to data anomalies, slow queries, and expensive refactoring efforts.

This guide focuses on PostgreSQL, a powerful open-source relational database with full ACID compliance, but most principles apply to any relational database. For ORM-specific patterns, see Database ORM. For schema evolution, see Database Migrations.

Core Principles

Normalize to Eliminate Redundancy: Use normal forms to prevent update anomalies and data inconsistencies
Denormalize for Performance: Strategically duplicate data when read performance is critical
Enforce Constraints at Database Level: Use primary keys, foreign keys, check constraints, and unique constraints
Index for Query Patterns: Create indexes based on actual query access patterns, not assumptions
Choose Appropriate Data Types: Use the most specific type that fits your data (avoid generic TEXT for everything)
Plan for Audit Requirements: Include created/updated timestamps and user tracking from the start
Version Control Schema: Treat schema as code with migrations in version control

PostgreSQL: ACID Guarantees

PostgreSQL provides robust ACID guarantees essential for applications requiring data integrity:

ACID Explained:

Atomicity: Transactions are all-or-nothing operations. If any part of a transaction fails, the entire transaction rolls back, leaving the database unchanged. This prevents partial updates that would leave data in an inconsistent state.
Consistency: Transactions move the database from one valid state to another. All constraints (primary keys, foreign keys, check constraints) are enforced before a transaction commits. Invalid data cannot be persisted.
Isolation: Concurrent transactions execute as if they were serialized. Different isolation levels balance between performance and consistency. PostgreSQL supports Read Uncommitted, Read Committed (default), Repeatable Read, and Serializable isolation levels.
Durability: Once a transaction commits, the changes are permanent even if the system crashes immediately after. PostgreSQL uses Write-Ahead Logging (WAL) to ensure committed transactions are written to disk before acknowledging success.

For transaction management in Spring Boot applications, see Spring Boot Data Access.

Table Design

Basic Table Structure

Every table should follow these conventions:

CREATE TABLE orders (
    -- Primary key (UUID for distributed systems, BIGSERIAL for single instance)
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

    -- Business fields
    customer_id UUID NOT NULL,
    total_amount DECIMAL(19,4) NOT NULL CHECK (total_amount >= 0),
    status VARCHAR(20) NOT NULL,
    notes TEXT,

    -- Audit fields (essential for tracking changes)
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by VARCHAR(100) NOT NULL,
    updated_by VARCHAR(100) NOT NULL,

    -- Optimistic locking
    version BIGINT NOT NULL DEFAULT 0
);

Key Design Decisions:

Primary Key Strategy: UUIDs prevent ID collisions in distributed systems but are larger (16 bytes vs 8 bytes for BIGINT). Use BIGSERIAL for single-database systems for better performance and smaller indexes.
Audit Fields: Always include created_at, updated_at, created_by, and updated_by. These fields are crucial for debugging, compliance, and understanding data history. They should be set automatically by triggers or ORM frameworks (see Database ORM).
Version Column: Enables optimistic locking to prevent lost updates when multiple users edit the same record concurrently. The application checks the version hasn't changed before committing updates.

Data Type Selection

Choose the most specific data type for your domain:

--  GOOD: Appropriate types
CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    sku VARCHAR(50) NOT NULL,              -- Fixed max length for SKU
    name VARCHAR(200) NOT NULL,            -- Reasonable limit for product names
    description TEXT,                      -- Unlimited text for descriptions
    price DECIMAL(19,4) NOT NULL,          -- Precise decimal for money
    quantity INTEGER NOT NULL,             -- Whole numbers for quantities
    is_active BOOLEAN NOT NULL DEFAULT true,
    metadata JSONB,                        -- Structured data with indexing support
    tags TEXT[],                           -- Array for multiple values
    published_at TIMESTAMP WITH TIME ZONE  -- Store timezone for global apps
);

--  BAD: Generic types everywhere
CREATE TABLE products (
    id TEXT PRIMARY KEY,        -- Inefficient, no type safety
    sku TEXT,                   -- No length constraint
    name TEXT,                  -- No length constraint
    price FLOAT,                -- NEVER use FLOAT for money (rounding errors)
    quantity TEXT,              -- Should be INTEGER
    is_active TEXT,             -- Should be BOOLEAN
    metadata TEXT,              -- JSON as text loses indexing capability
    published_at TIMESTAMP      -- Missing timezone information
);

Decimal vs Float for Money: Floating-point numbers (FLOAT, DOUBLE PRECISION) cannot accurately represent decimal values like 0.1 in binary, leading to rounding errors. Always use DECIMAL or NUMERIC for monetary values where precision matters.

JSONB vs JSON: PostgreSQL offers both JSON (stores exact text) and JSONB (stores binary format). Use JSONB for better query performance and indexing capabilities. JSONB supports GIN indexes for efficient queries on nested fields.

Normalization

Normalization organizes data to eliminate redundancy and prevent update anomalies. Understanding normal forms helps you design schemas that maintain data integrity.

First Normal Form (1NF)

Rule: Each column contains atomic (indivisible) values, and each row is unique.

BAD: --  VIOLATES 1NF: Multiple values in single column
CREATE TABLE orders_bad (
    id UUID PRIMARY KEY,
    customer_name VARCHAR(200),
    product_ids TEXT  -- "prod-1,prod-2,prod-3" - comma-separated!
);

GOOD: --  SATISFIES 1NF: Atomic values, separate table for many-to-many
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    customer_id UUID NOT NULL
);

CREATE TABLE order_items (
    id UUID PRIMARY KEY,
    order_id UUID NOT NULL REFERENCES orders(id),
    product_id UUID NOT NULL REFERENCES products(id),
    quantity INTEGER NOT NULL
);

Second Normal Form (2NF)

Rule: Satisfies 1NF, and no non-key column depends on only part of the primary key (applies to composite keys).

BAD: --  VIOLATES 2NF: product_name depends only on product_id, not full key
CREATE TABLE order_items_bad (
    order_id UUID,
    product_id UUID,
    product_name VARCHAR(200),  -- Depends only on product_id
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

GOOD: --  SATISFIES 2NF: product_name moved to products table
CREATE TABLE order_items (
    order_id UUID,
    product_id UUID REFERENCES products(id),
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

CREATE TABLE products (
    id UUID PRIMARY KEY,
    name VARCHAR(200) NOT NULL
);

Third Normal Form (3NF)

Rule: Satisfies 2NF, and no non-key column depends on another non-key column (no transitive dependencies).

BAD: --  VIOLATES 3NF: country_name depends on country_code, not on customer_id
CREATE TABLE customers_bad (
    id UUID PRIMARY KEY,
    name VARCHAR(200),
    country_code CHAR(2),
    country_name VARCHAR(100)  -- Transitive dependency: id -> country_code -> country_name
);

GOOD: --  SATISFIES 3NF: country_name in separate table
CREATE TABLE customers (
    id UUID PRIMARY KEY,
    name VARCHAR(200),
    country_code CHAR(2) REFERENCES countries(code)
);

CREATE TABLE countries (
    code CHAR(2) PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

When to Denormalize: While normalization prevents data anomalies, sometimes controlled denormalization improves read performance. Common scenarios:

Materialized views: Pre-compute expensive joins for reporting
Caching computed values: Store aggregations (e.g., order total) to avoid recalculating
Read-heavy workloads: Duplicate data when reads vastly outnumber writes

Always denormalize intentionally with a clear understanding of the trade-offs (data consistency vs performance).

Constraints

Constraints enforce data integrity at the database level, preventing invalid data from being stored. They are your first line of defense against data corruption.

Primary Keys

-- UUID primary key (distributed systems)
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid()
);

-- BIGSERIAL primary key (single instance, better performance)
CREATE TABLE customers (
    id BIGSERIAL PRIMARY KEY
);

-- Composite primary key (many-to-many relationships)
CREATE TABLE order_items (
    order_id UUID NOT NULL,
    product_id UUID NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

Foreign Keys

Foreign keys maintain referential integrity between tables. The ON DELETE and ON UPDATE clauses control cascading behavior:

CREATE TABLE order_items (
    id UUID PRIMARY KEY,
    order_id UUID NOT NULL,
    product_id UUID NOT NULL,

    -- RESTRICT: Prevent deletion if referenced (safe default)
    CONSTRAINT fk_order FOREIGN KEY (order_id)
        REFERENCES orders(id) ON DELETE RESTRICT,

    -- CASCADE: Delete child records when parent is deleted
    CONSTRAINT fk_product FOREIGN KEY (product_id)
        REFERENCES products(id) ON DELETE CASCADE,

    -- SET NULL: Set to null when parent is deleted
    -- SET DEFAULT: Set to default value when parent is deleted
    -- NO ACTION: Check constraint at end of transaction
);

Cascade Rules Explained:

RESTRICT: Prevents deletion of parent if children exist (safest option, prevents accidental data loss)
CASCADE: Automatically deletes children when parent is deleted (use carefully, can delete large amounts of data)
SET NULL: Sets foreign key to NULL when parent is deleted (requires column to be nullable)
SET DEFAULT: Sets foreign key to default value when parent is deleted
NO ACTION: Similar to RESTRICT but checks constraint at transaction end, not immediately

Check Constraints

CREATE TABLE products (
    id UUID PRIMARY KEY,
    price DECIMAL(19,4) NOT NULL CHECK (price >= 0),
    discount_percent INTEGER CHECK (discount_percent BETWEEN 0 AND 100),
    status VARCHAR(20) NOT NULL CHECK (status IN ('DRAFT', 'ACTIVE', 'ARCHIVED')),
    start_date DATE,
    end_date DATE,
    CONSTRAINT chk_date_range CHECK (end_date IS NULL OR end_date >= start_date)
);

Check constraints enforce business rules at the database level. Even if application code has bugs, invalid data cannot be inserted.

Unique Constraints

CREATE TABLE users (
    id UUID PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    username VARCHAR(50) NOT NULL,

    -- Single column unique
    CONSTRAINT uk_email UNIQUE (email),

    -- Composite unique (username is unique within account)
    account_id UUID NOT NULL,
    CONSTRAINT uk_account_username UNIQUE (account_id, username)
);

-- Partial unique index (conditional uniqueness)
CREATE UNIQUE INDEX uk_active_email ON users (email)
    WHERE deleted_at IS NULL;

Indexing Strategy

Indexes dramatically improve query performance by allowing the database to quickly locate rows without scanning the entire table. However, indexes have costs: they consume disk space and slow down writes (INSERT, UPDATE, DELETE) because indexes must be maintained.

When to Create Indexes

Create indexes for:

Primary keys: Automatically indexed
Foreign keys: Essential for join performance (not automatically indexed in PostgreSQL)
WHERE clause columns: Columns frequently used for filtering
ORDER BY columns: Columns used for sorting results
JOIN columns: Columns used in join conditions
Unique constraints: Enforced through unique indexes

Don't over-index:

Each index adds overhead to writes
Indexes consume disk space
PostgreSQL query planner may ignore indexes if table is small

Index Types

-- B-Tree (default, most common - supports equality and range queries)
CREATE INDEX idx_orders_created_at ON orders(created_at);
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Partial index (smaller, faster for specific conditions)
CREATE INDEX idx_active_orders ON orders(customer_id)
    WHERE status = 'PENDING';

-- Composite index (order matters! Leftmost prefix rule)
CREATE INDEX idx_customer_status_date ON orders(customer_id, status, created_at DESC);
-- This index can be used for queries filtering on:
--   - customer_id alone
--   - customer_id + status
--   - customer_id + status + created_at
-- But NOT for status alone or created_at alone

-- GIN index (for full-text search and JSONB)
CREATE INDEX idx_product_metadata ON products USING GIN (metadata);

-- Hash index (equality only, smaller than B-tree)
CREATE INDEX idx_user_email_hash ON users USING HASH (email);

B-Tree Index: The default index type. B-tree (balanced tree) indexes support equality (=), comparison (<, >, <=, >=), range (BETWEEN), and pattern matching (LIKE 'prefix%'). They are sorted, making them efficient for ORDER BY queries.

GIN (Generalized Inverted Index): Designed for indexing composite values like arrays, JSONB documents, and full-text search. GIN indexes are larger but essential for queries on nested JSON fields or array contains operations.

Partial Index: Indexes only rows matching a WHERE condition. Smaller and faster than full indexes when you consistently query a subset of data (e.g., only active records).

Composite Index Leftmost Prefix Rule: PostgreSQL can use a composite index (a, b, c) for queries filtering on a, a + b, or a + b + c, but not for queries filtering only on b or c. The order of columns in a composite index matters.

Analyzing Index Usage

-- Find missing indexes (queries doing sequential scans)
SELECT schemaname, tablename, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;

-- Find unused indexes (candidates for removal)
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;

-- Explain query execution plan
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 'some-uuid'
AND status = 'PENDING';

Use EXPLAIN ANALYZE to understand whether PostgreSQL is using your indexes. Look for "Index Scan" (good) vs "Seq Scan" (potentially slow for large tables). For query optimization patterns, see Spring Boot Data Access.

Enum Types and Status Fields

PostgreSQL Enum Types

-- Create enum type
CREATE TYPE order_status AS ENUM ('DRAFT', 'PENDING', 'PROCESSING', 'COMPLETED', 'CANCELLED');

CREATE TABLE orders (
    id UUID PRIMARY KEY,
    status order_status NOT NULL DEFAULT 'DRAFT'
);

-- Query using enum
SELECT * FROM orders WHERE status = 'PENDING';

Enum Trade-offs:

[GOOD] Type safety: Invalid values rejected at database level
[GOOD] Storage efficient: Stored as integers internally
[BAD] Difficult to modify: Adding/removing values requires schema changes
[BAD] Ordering matters: Enums are compared based on definition order

Alternative: Check Constraints

CREATE TABLE orders (
    id UUID PRIMARY KEY,
    status VARCHAR(20) NOT NULL CHECK (status IN ('DRAFT', 'PENDING', 'PROCESSING', 'COMPLETED', 'CANCELLED'))
);

Check Constraint Trade-offs:

[GOOD] Easier to modify: Change constraint without data migration
[GOOD] More flexible: Can easily add/remove values
[BAD] Less type safety: No custom type in queries
[BAD] Less efficient: Stored as strings

For ORM enum mapping patterns (JPA @Enumerated), see Database ORM.

Triggers and Generated Columns

Auto-Update Timestamp Trigger

CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ language 'plpgsql';

CREATE TRIGGER update_orders_updated_at
    BEFORE UPDATE ON orders
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

This trigger automatically sets updated_at whenever a row is modified, preventing stale timestamps from application bugs.

Generated Columns (PostgreSQL 12+)

CREATE TABLE products (
    id UUID PRIMARY KEY,
    price DECIMAL(19,4) NOT NULL,
    discount_percent INTEGER NOT NULL DEFAULT 0,

    -- Generated column (computed on read, not stored)
    discounted_price DECIMAL(19,4) GENERATED ALWAYS AS (
        price * (1 - discount_percent::DECIMAL / 100)
    ) STORED
);

Generated vs Stored Computed Values:

STORED: Computed on write, stored on disk, can be indexed
VIRTUAL (not supported by PostgreSQL): Computed on read, not stored

Summary

Key Takeaways:

ACID guarantees prevent data corruption and ensure consistency in concurrent environments
Normalize to 3NF to eliminate redundancy, denormalize intentionally for performance
Use constraints (primary keys, foreign keys, check constraints) to enforce data integrity at the database level
Index strategically based on actual query patterns, not assumptions
Choose appropriate data types: DECIMAL for money, JSONB for structured data, UUID or BIGSERIAL for primary keys
Always include audit fields: created_at, updated_at, created_by, updated_by, version
Foreign key cascade rules control what happens when parent records are deleted
Composite index order matters: Leftmost prefix rule determines which queries can use the index
Analyze index usage with EXPLAIN ANALYZE to ensure queries are efficient
Triggers automate common tasks like updating timestamps, but add complexity

Related Guides:

Database ORM - JPA/Hibernate entity mapping, repositories, transactions
Database Migrations - Flyway versioning and zero-downtime migrations
Spring Boot Data Access - Integration patterns
Spring Boot Testing - Database testing with TestContainers

Overview​

Core Principles​

PostgreSQL: ACID Guarantees​

Table Design​

Basic Table Structure​

Data Type Selection​

Normalization​

First Normal Form (1NF)​

Second Normal Form (2NF)​

Third Normal Form (3NF)​

Constraints​

Primary Keys​

Foreign Keys​

Check Constraints​

Unique Constraints​

Indexing Strategy​

When to Create Indexes​

Index Types​

Analyzing Index Usage​

Enum Types and Status Fields​

PostgreSQL Enum Types​

Alternative: Check Constraints​

Triggers and Generated Columns​

Auto-Update Timestamp Trigger​

Generated Columns (PostgreSQL 12+)​

Summary​