
Logging Best Practices

Overview

Logging is one of the three pillars of observability (alongside metrics and distributed tracing). While metrics tell you what is happening in aggregate and tracing shows how requests flow through your system, logs capture detailed context about specific events, errors, and state changes.

This guide covers logging best practices including structured logging, log levels, correlation IDs, context propagation, and security considerations. We'll explore both Java (SLF4J/Logback) and TypeScript (Winston) implementations with practical examples.


Why Logging Matters

The Role of Logs in Production Systems

Logs serve multiple critical purposes:

  • Debugging: Investigate specific error conditions and unexpected behavior
  • Audit trails: Record who did what and when for compliance and security
  • Operational insight: Understand system behavior in production
  • Incident response: Diagnose and resolve production issues quickly
  • Correlation: Link related events across distributed systems using correlation IDs

Unlike metrics (which aggregate data) and traces (which track request flows), logs provide the detailed narrative of what happened at a specific point in time.


Logging Architecture

Before diving into specific practices, it's important to understand how logging works in modern applications:

Key components:

  • Logging Facade (SLF4J, Winston): Provides the API your application code uses. This abstraction allows you to change underlying implementations without modifying application code.
  • Logging Implementation (Logback, Log4j2): Handles the actual work of formatting, filtering, and routing log messages.
  • Appenders/Transports: Destination handlers that send logs to console, files, or centralized systems.
  • MDC (Mapped Diagnostic Context): Thread-local storage for contextual data (correlation IDs, user IDs) that automatically enriches all log entries.
  • Centralized Logging: Aggregation systems like ELK Stack (Elasticsearch, Logstash, Kibana) or Splunk that collect, index, and analyze logs from all services.

Core Principles

  1. Structured Logging: Use JSON format for machine parsing rather than plain text
  2. Correlation IDs: Propagate trace identifiers across service boundaries for request tracking
  3. Appropriate Log Levels: Choose the right severity level (ERROR, WARN, INFO, DEBUG, TRACE)
  4. Never Log Sensitive Data: Protect PII, credentials, tokens, and financial data
  5. Contextual Information: Include relevant identifiers (user ID, request ID, transaction ID)
  6. Centralized Logging: Aggregate logs from all services for unified search and analysis
  7. Performance Awareness: Use lazy evaluation to avoid expensive operations when logging is disabled

Log Levels

Choosing the correct log level is critical for operational effectiveness. Too much INFO logging creates noise, while insufficient ERROR logging obscures problems. Understanding when to use each level requires thinking about who will act on the information and when.

Level Guidelines

| Level | Usage | When to Use | Who Acts on It |
| --- | --- | --- | --- |
| ERROR | System failures requiring immediate attention | Unrecoverable errors, exceptions that prevent core functionality | On-call engineers, automated alerts |
| WARN | Potential issues that don't stop operation | Degraded performance, deprecated API usage, retry attempts, approaching limits | Engineers during incident investigation |
| INFO | Important business events | Successful operations, state changes, configuration changes | Business stakeholders, auditors, engineers |
| DEBUG | Detailed troubleshooting information | Variable values, execution paths, SQL queries | Engineers actively debugging |
| TRACE | Very detailed debugging | Method entry/exit, loop iterations, detailed state | Engineers investigating complex issues |

Production log level recommendations:

  • Production: INFO (DEBUG/TRACE disabled for performance)
  • Staging/UAT: DEBUG (enables detailed investigation)
  • Development: DEBUG or TRACE (full visibility)

Log Level in Production

Leaving DEBUG or TRACE enabled in production can:

  • Generate massive log volumes (storage costs, performance impact)
  • Inadvertently log sensitive data that was added during development
  • Make it harder to find important messages in the noise

Always configure production environments with INFO level, and use dynamic log level adjustment (e.g., Spring Boot Actuator) to temporarily enable DEBUG for specific loggers when troubleshooting.
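For example, with Spring Boot Actuator's loggers endpoint (a sketch; it assumes the endpoint has been exposed via `management.endpoints.web.exposure.include=loggers` and that the service runs locally on port 8080):

```shell
# Temporarily enable DEBUG for a single logger at runtime (no restart needed)
curl -X POST http://localhost:8080/actuator/loggers/com.bank.payment.PaymentService \
  -H "Content-Type: application/json" \
  -d '{"configuredLevel": "DEBUG"}'

# Reset the logger to its configured default when finished
curl -X POST http://localhost:8080/actuator/loggers/com.bank.payment.PaymentService \
  -H "Content-Type: application/json" \
  -d '{"configuredLevel": null}'
```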

Java (SLF4J) Examples

SLF4J (Simple Logging Facade for Java) is the standard logging API in Java applications. It provides an abstraction layer over actual logging implementations like Logback or Log4j2, allowing you to switch implementations without changing application code.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Service
public class PaymentService {

    // Logger is static final to avoid creating multiple instances.
    // LoggerFactory.getLogger(Class) automatically sets the logger name to the fully-qualified class name
    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    public PaymentResult processPayment(Payment payment) {
        // INFO level: business event that stakeholders care about.
        // Use parameterized logging {} placeholders to avoid string concatenation;
        // SLF4J only evaluates parameters if INFO level is enabled
        log.info("Processing payment id={} amount={} currency={}",
                payment.getId(), payment.getAmount(), payment.getCurrency());

        try {
            PaymentResult result = executePayment(payment);

            // INFO: successful completion of an important business operation.
            // Include correlation IDs (transaction ID) for tracing
            log.info("Payment processed successfully id={} transactionId={}",
                    payment.getId(), result.getTransactionId());

            return result;

        } catch (InsufficientBalanceException e) {
            // WARN level: expected business exception that doesn't indicate system failure.
            // Include relevant business context (amounts) to aid investigation.
            // Don't include a stack trace - this is expected behavior
            log.warn("Payment failed due to insufficient balance id={} required={} available={}",
                    payment.getId(), e.getRequiredAmount(), e.getAvailableAmount());
            throw e;

        } catch (Exception e) {
            // ERROR level: unexpected system error requiring investigation.
            // Include the exception as the last parameter for the full stack trace.
            // This would trigger alerts in production monitoring
            log.error("Payment processing failed id={}", payment.getId(), e);
            throw new PaymentProcessingException("Failed to process payment", e);
        }
    }
}

Key points demonstrated:

  • Logger declaration: Static final logger created once per class
  • Parameterized logging: Use {} placeholders instead of string concatenation for performance
  • Level selection: INFO for business events, WARN for expected issues, ERROR for system failures
  • Exception logging: Include exception as final parameter to capture stack traces
  • Context: Always include relevant identifiers (payment ID, transaction ID) for correlation

TypeScript (Winston) Examples

Winston is the most popular logging library for Node.js/TypeScript applications. It provides flexible configuration, multiple transport options, and structured logging support.

import winston from 'winston';

// Configure Winston logger with JSON format for structured logging
const logger = winston.createLogger({
  // Log level can be controlled via environment variable
  level: process.env.LOG_LEVEL || 'info',

  // JSON format enables machine parsing and indexing
  format: winston.format.json(),

  // Default metadata included in all log entries
  defaultMeta: { service: 'payment-service' },

  // Transports define where logs are sent
  transports: [
    // Separate file for errors makes them easy to monitor
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    // Combined log contains all levels
    new winston.transports.File({ filename: 'combined.log' })
  ]
});

// Add console transport in non-production environments for visibility
if (process.env.NODE_ENV !== 'production') {
  logger.add(new winston.transports.Console({
    format: winston.format.simple()
  }));
}

export class PaymentService {
  async processPayment(payment: Payment): Promise<PaymentResult> {
    // Structured logging: pass message + object with context.
    // Winston automatically merges this with defaultMeta
    logger.info('Processing payment', {
      paymentId: payment.id,
      amount: payment.amount,
      currency: payment.currency
    });

    try {
      const result = await this.executePayment(payment);

      logger.info('Payment processed successfully', {
        paymentId: payment.id,
        transactionId: result.transactionId
      });

      return result;

    } catch (error) {
      // Winston doesn't automatically extract stack traces.
      // Explicitly handle Error objects to capture full context
      logger.error('Payment processing failed', {
        paymentId: payment.id,
        error: error instanceof Error ? error.message : 'Unknown error',
        stack: error instanceof Error ? error.stack : undefined
      });
      throw error;
    }
  }
}

Key differences from SLF4J:

  • Configuration: Winston requires explicit logger configuration (transports, formats)
  • Structured logging: Pass objects rather than string templates
  • Error handling: Must explicitly extract error messages and stack traces
  • Transports: Flexible routing to multiple destinations (files, console, cloud services)

Structured Logging

Structured logging means emitting logs in a consistent, machine-readable format (typically JSON) rather than free-form text. This transformation is fundamental to modern observability.

Why structured logging matters:

  1. Machine Parsing: JSON logs can be automatically parsed, indexed, and searched in tools like Elasticsearch or Splunk
  2. Consistent Fields: Standardized field names (timestamp, level, message) enable queries across all services
  3. Rich Context: Complex objects can be included without string formatting
  4. Aggregation: Centralized logging systems can aggregate and analyze structured data
  5. Performance: Avoids expensive string concatenation and formatting in application code

Comparison:

# Plain text log (hard to parse)
2025-01-28 10:15:30 INFO PaymentService - Payment PAY-101 for $100.00 USD processed successfully in 245ms

# Structured log (easy to parse and query)
{"timestamp":"2025-01-28T10:15:30.123Z","level":"INFO","message":"Payment processed successfully","paymentId":"PAY-101","amount":100.00,"currency":"USD","duration":245}

With structured logs, you can easily query: "Find all payments over $10,000 that took longer than 1 second" - something nearly impossible with plain text logs.
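As a sketch, that query against logs indexed in Elasticsearch might look like the following (field names taken from the structured example above; `duration` is assumed to be in milliseconds):

```json
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "amount":   { "gt": 10000 } } },
        { "range": { "duration": { "gt": 1000 } } }
      ]
    }
  }
}
```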

JSON Format Configuration

Logback (the most common logging implementation for SLF4J) can be configured to output JSON using the Logstash encoder:

<!-- logback-spring.xml -->
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
    <!-- LogstashEncoder converts log events to JSON format -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <!-- Include specific MDC keys in every log entry -->
      <!-- MDC (Mapped Diagnostic Context) holds thread-local context data -->
      <includeMdcKeyName>correlationId</includeMdcKeyName>
      <includeMdcKeyName>userId</includeMdcKeyName>
      <includeMdcKeyName>transactionId</includeMdcKeyName>

      <!-- Optionally customize field names for compatibility with log aggregators -->
      <fieldNames>
        <timestamp>timestamp</timestamp>
        <message>message</message>
        <logger>logger</logger>
        <level>level</level>
      </fieldNames>
    </encoder>
  </appender>

  <!-- Configure root logger level and appenders -->
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>

Configuration notes:

  • The LogstashEncoder dependency must be added to your project: net.logstash.logback:logstash-logback-encoder
  • MDC keys automatically appear in every log entry when present on the thread
  • Console appender works well in containerized environments (Docker/Kubernetes) where logs are captured from stdout

Example JSON Log Output

{
  "timestamp": "2025-01-28T10:15:30.123Z",
  "level": "INFO",
  "logger": "com.bank.payment.PaymentService",
  "message": "Payment processed successfully",
  "correlationId": "abc-123-xyz",
  "userId": "USER-456",
  "transactionId": "TXN-789",
  "paymentId": "PAY-101",
  "amount": 100.00,
  "currency": "USD",
  "duration": 245,
  "service": "payment-service",
  "environment": "production",
  "thread": "http-nio-8080-exec-1"
}

Key fields explained:

  • timestamp: ISO 8601 format for precise time ordering and timezone handling
  • logger: Fully-qualified class name for filtering logs by component
  • correlationId: Links all log entries for a single request (see Correlation IDs below)
  • userId / transactionId: Business identifiers for tracing specific operations
  • service / environment: Deployment context for multi-service logging
  • thread: Java thread name, useful for diagnosing concurrency issues

Correlation IDs

In distributed systems, a single user request often spans multiple services. A correlation ID (also called trace ID or request ID) is a unique identifier that flows through all services involved in processing that request. This allows you to find all log entries related to a specific user action, even across service boundaries.

Why correlation IDs are essential:

Without correlation IDs, tracing a failed payment request requires searching logs in each service individually using timestamps and hoping they align. With correlation IDs, you search for "abc-123" across all services and get the complete story.

Correlation ID lifecycle:

  1. Generate: Create a new UUID when a request enters the system (or accept from client)
  2. Propagate: Pass the ID through all service calls via HTTP headers
  3. Log: Include the ID in every log entry using MDC (Java) or context (Node.js)
  4. Response: Return the ID to clients for support tickets ("Please provide request ID abc-123")

Understanding MDC (Mapped Diagnostic Context)

MDC is a thread-local map that holds contextual data (like correlation IDs) automatically available to all log statements on that thread. Think of it as a "magic backpack" that follows your code execution and enriches logs without explicitly passing data everywhere.

How MDC works:

  • Data stored in MDC exists only for the current thread
  • When you put a value in MDC, all subsequent log statements on that thread automatically include it
  • You must clean up MDC when the thread finishes to avoid leaking data to thread pool reuse
  • Web frameworks typically use filters to manage MDC lifecycle automatically
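The mechanics can be illustrated with a toy thread-local map (a simplified stand-in for the real org.slf4j.MDC, shown here only to make the thread-local behavior concrete):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of how an MDC works internally: a thread-local map
// that "log statements" read implicitly, without being passed the data.
public class MiniMdc {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static void clear() { CTX.remove(); }  // critical in thread-pooled environments

    // A "log statement" that is automatically enriched with context
    static void log(String message) {
        System.out.println("[correlationId=" + get("correlationId") + "] " + message);
    }

    public static void main(String[] args) {
        put("correlationId", "abc-123");
        log("Processing payment");  // prints: [correlationId=abc-123] Processing payment
        clear();                    // without this, the next task on this thread inherits the ID
    }
}
```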

Spring Boot Implementation

Here's a complete implementation using a Servlet Filter to manage correlation IDs with MDC:

import org.slf4j.MDC;
import org.springframework.core.Ordered;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;

import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.UUID;

// CorrelationIdFilter.java
@Component
// Run this filter first to ensure the correlation ID is available to all subsequent filters
@Order(Ordered.HIGHEST_PRECEDENCE)
public class CorrelationIdFilter implements Filter {

    private static final String CORRELATION_ID_HEADER = "X-Correlation-ID";
    private static final String CORRELATION_ID_MDC_KEY = "correlationId";

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {

        HttpServletRequest httpRequest = (HttpServletRequest) request;
        HttpServletResponse httpResponse = (HttpServletResponse) response;

        // Check if the client provided a correlation ID (for request tracing across clients)
        String correlationId = httpRequest.getHeader(CORRELATION_ID_HEADER);

        // If no ID was provided, generate a new one (UUID ensures uniqueness)
        if (correlationId == null || correlationId.isEmpty()) {
            correlationId = UUID.randomUUID().toString();
        }

        // Store in MDC - all log statements on this thread will now include it.
        // The Logstash encoder (configured in logback-spring.xml) automatically
        // includes MDC values in JSON output
        MDC.put(CORRELATION_ID_MDC_KEY, correlationId);

        // Return the correlation ID in response headers.
        // Clients can use this for support tickets ("Request abc-123 failed")
        httpResponse.setHeader(CORRELATION_ID_HEADER, correlationId);

        try {
            // Continue the filter chain - all downstream filters and controllers
            // will have access to the correlation ID via MDC
            chain.doFilter(request, response);
        } finally {
            // CRITICAL: clean up MDC to prevent leaks.
            // Servlet containers use thread pools - without cleanup, the next
            // request on this thread would have the wrong correlation ID
            MDC.remove(CORRELATION_ID_MDC_KEY);
        }
    }
}

Implementation notes:

  • @Order(Ordered.HIGHEST_PRECEDENCE) ensures this filter runs before all others
  • The try/finally block guarantees MDC cleanup even if exceptions occur
  • UUID generation is fast enough (~1µs) that performance impact is negligible
  • Header name X-Correlation-ID is a common convention but can be customized

Using Correlation IDs in Service Code

Once the filter has placed the correlation ID in MDC, all logging automatically includes it:

@Service
public class PaymentService {
    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    public PaymentResult processPayment(Payment payment) {
        // The correlation ID is automatically included from MDC in every log entry -
        // no need to pass it as a parameter or include it in the message
        log.info("Processing payment id={}", payment.getId());

        // When calling another service, propagate the correlation ID via an HTTP header;
        // RestTemplate or WebClient interceptors can add the header automatically
        PaymentResult result = externalService.process(payment);

        log.info("Payment processed id={} transactionId={}",
                payment.getId(), result.getTransactionId());

        return result;
    }
}

Propagating correlation IDs to downstream services:

When your service calls other services, you need to explicitly propagate the correlation ID via HTTP headers. Here's how to configure RestTemplate to do this automatically:

@Configuration
public class RestTemplateConfig {

    @Bean
    public RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder
                .interceptors((request, body, execution) -> {
                    // Retrieve the correlation ID from MDC
                    String correlationId = MDC.get("correlationId");

                    // Add it to the outgoing request headers if present
                    if (correlationId != null) {
                        request.getHeaders().add("X-Correlation-ID", correlationId);
                    }

                    return execution.execute(request, body);
                })
                .build();
    }
}

Now every HTTP call made with this RestTemplate will automatically include the correlation ID, allowing end-to-end tracing across services.


Audit Logging

Audit logs serve a different purpose than application logs. While application logs help engineers debug and monitor systems, audit logs provide a compliance trail showing who did what, when, and from where. Audit logs are often subject to regulatory requirements (SOX, GDPR, PCI-DSS, HIPAA) and must be retained for extended periods (often 7+ years).

Key differences between application and audit logs:

| Aspect | Application Logs | Audit Logs |
| --- | --- | --- |
| Purpose | Debugging, monitoring, operations | Compliance, security, forensics |
| Audience | Engineers, DevOps | Auditors, security teams, legal |
| Retention | Days to months | Years (often 7+) |
| Immutability | Can be rotated/deleted | Must be tamper-proof |
| Content | Technical details | Business events, actor identity |
| Volume | High (can be verbose) | Lower (only significant events) |

What to audit:

  • Authentication events: Login, logout, failed attempts, password changes
  • Authorization events: Permission changes, role assignments, access denials
  • Data access: Viewing sensitive records (customer data, financial records)
  • Data modifications: Create, update, delete operations on important entities
  • Configuration changes: System settings, feature flags, admin actions
  • Financial transactions: Payments, transfers, account operations

Dedicated Audit Logger

Audit logs should be separated from application logs with their own logger, format, and retention policy:

@Component
public class AuditLogger {

    private static final Logger auditLog = LoggerFactory.getLogger("AUDIT");

    public void logPaymentProcessed(Payment payment, PaymentResult result) {
        auditLog.info("event=PAYMENT_PROCESSED "
                        + "userId={} paymentId={} transactionId={} amount={} currency={} timestamp={}",
                payment.getUserId(),
                payment.getId(),
                result.getTransactionId(),
                payment.getAmount(),
                payment.getCurrency(),
                Instant.now());
    }

    public void logAccountAccess(String userId, String accountId, String action) {
        auditLog.info("event=ACCOUNT_ACCESS "
                        + "userId={} accountId={} action={} timestamp={} ip={}",
                userId,
                accountId,
                action,
                Instant.now(),
                getClientIp());
    }

    public void logAuthenticationFailure(String username, String reason) {
        auditLog.warn("event=AUTH_FAILURE "
                        + "username={} reason={} timestamp={} ip={}",
                username,
                reason,
                Instant.now(),
                getClientIp());
    }

    private String getClientIp() {
        // Placeholder - resolve the client IP from the request context in a real implementation
        return "0.0.0.0";
    }
}

Audit Log Configuration

<!-- Separate audit log file -->
<configuration>
  <appender name="AUDIT_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/audit.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>logs/audit-%d{yyyy-MM-dd}.log</fileNamePattern>
      <maxHistory>365</maxHistory> <!-- Keep for 1 year -->
    </rollingPolicy>
    <encoder>
      <pattern>%d{ISO8601} %msg%n</pattern>
    </encoder>
  </appender>

  <logger name="AUDIT" level="INFO" additivity="false">
    <appender-ref ref="AUDIT_FILE"/>
  </logger>
</configuration>

Security & Compliance

Never Log Sensitive Data

// BAD: Logging sensitive data
log.info("User login: username={} password={}", username, password);
log.info("Processing payment: cardNumber={} cvv={}", cardNumber, cvv);
log.info("Token: {}", jwtToken);

// GOOD: Sanitize or omit sensitive data
log.info("User login: username={}", username); // No password
log.info("Processing payment: cardLast4={}", cardNumber.substring(cardNumber.length() - 4));
log.info("Token received (length={})", jwtToken.length()); // No actual token

Data Masking

public class SensitiveDataMasker {

    public static String maskCardNumber(String cardNumber) {
        if (cardNumber == null || cardNumber.length() < 4) {
            return "****";
        }
        return "****-****-****-" + cardNumber.substring(cardNumber.length() - 4);
    }

    public static String maskEmail(String email) {
        if (email == null || !email.contains("@")) {
            return "***@***.com";
        }
        String[] parts = email.split("@");
        return parts[0].charAt(0) + "***@" + parts[1];
    }

    public static String maskAccountNumber(String accountNumber) {
        if (accountNumber == null || accountNumber.length() < 4) {
            return "****";
        }
        return "***" + accountNumber.substring(accountNumber.length() - 4);
    }
}

// Usage
log.info("Payment processed for card {}",
        SensitiveDataMasker.maskCardNumber(payment.getCardNumber()));
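To make the masking behavior concrete, here is a small self-contained check (the masking logic is reproduced from SensitiveDataMasker above so it compiles standalone; the sample values are illustrative):

```java
public class SensitiveDataMaskerDemo {

    // Same logic as SensitiveDataMasker.maskCardNumber above
    static String maskCardNumber(String cardNumber) {
        if (cardNumber == null || cardNumber.length() < 4) {
            return "****";
        }
        return "****-****-****-" + cardNumber.substring(cardNumber.length() - 4);
    }

    // Same logic as SensitiveDataMasker.maskEmail above
    static String maskEmail(String email) {
        if (email == null || !email.contains("@")) {
            return "***@***.com";
        }
        String[] parts = email.split("@");
        return parts[0].charAt(0) + "***@" + parts[1];
    }

    public static void main(String[] args) {
        System.out.println(maskCardNumber("4111111111111111")); // ****-****-****-1111
        System.out.println(maskEmail("alice@example.com"));     // a***@example.com
        System.out.println(maskCardNumber(null));               // ****
    }
}
```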

Performance Considerations

Logging can significantly impact application performance if not handled carefully. The main performance concerns are:

  1. String operations: Concatenation and formatting are expensive
  2. I/O operations: Writing to disk or network has latency
  3. Object serialization: Converting objects to strings (toString()) can be costly
  4. Volume: Excessive logging fills disks and overwhelms log aggregation systems

Lazy Logging and Parameterized Messages

SLF4J's parameterized logging delays string operations until after checking if the log level is enabled:

// BAD: string concatenation happens regardless of log level.
// Even if DEBUG is disabled, the concatenation and toString() calls execute
log.debug("Payment details: " + payment.toString()
        + " with metadata: " + metadata.toString());
// Cost: O(n) string operations, always executed

// GOOD: SLF4J parameterized logging (no concatenation if DEBUG is disabled).
// SLF4J checks whether DEBUG is enabled before evaluating parameters
log.debug("Payment details: {} with metadata: {}", payment, metadata);
// Cost: O(1) if DEBUG disabled, O(n) if enabled

// GOOD: guard very expensive operations with explicit checks.
// Use for operations more expensive than simple toString() calls
if (log.isDebugEnabled()) {
    // This expensive method only runs if DEBUG logging is enabled
    String diagnosticInfo = performExpensiveDiagnostics(payment);
    log.debug("Diagnostic result: {}", diagnosticInfo);
}

Performance comparison:

  • String concatenation with + always executes (even when logging disabled): ~1000ns
  • Parameterized logging with {} when disabled: ~50ns (20x faster)
  • Parameterized logging with {} when enabled: ~1200ns (equivalent after evaluation)
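The effect of lazy evaluation can be demonstrated with the JDK's built-in java.util.logging, which offers Supplier-based laziness (a self-contained sketch; SLF4J's {} mechanism behaves analogously):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLoggingDemo {
    private static final Logger LOG = Logger.getLogger(LazyLoggingDemo.class.getName());
    private static final AtomicInteger calls = new AtomicInteger();

    // Stand-in for an expensive diagnostic computation
    static String expensiveDiagnostics() {
        calls.incrementAndGet();
        return "diagnostic-data";
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE (debug-level) is disabled, as in production

        // Eager: the argument string is built even though FINE is disabled
        LOG.fine("diagnostics: " + expensiveDiagnostics());

        // Lazy: the Supplier is only invoked if FINE is enabled, so it never runs here
        LOG.fine(() -> "diagnostics: " + expensiveDiagnostics());

        System.out.println("expensive calls: " + calls.get()); // prints: expensive calls: 1
    }
}
```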

Async Appenders

For high-throughput applications, consider using async appenders to move I/O off the request thread:

<!-- logback-spring.xml -->
<configuration>
  <!-- Async appender wraps the actual appender -->
  <appender name="ASYNC_FILE" class="ch.qos.logback.classic.AsyncAppender">
    <!-- Queue size - increase for high throughput -->
    <queueSize>512</queueSize>

    <!-- Never block application threads - drop logs if the queue is full -->
    <neverBlock>true</neverBlock>

    <!-- 0 disables discarding; with the default threshold (20% of queueSize),
         TRACE/DEBUG/INFO events are dropped when the queue is 80% full -->
    <discardingThreshold>0</discardingThreshold>

    <!-- The actual file appender doing the I/O -->
    <appender-ref ref="FILE"/>
  </appender>

  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/application.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>logs/application-%d{yyyy-MM-dd}.log</fileNamePattern>
      <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder>
      <pattern>%d{ISO8601} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <root level="INFO">
    <appender-ref ref="ASYNC_FILE"/>
  </root>
</configuration>

Async appender trade-offs:

  • Pros: Request threads don't block on I/O, better throughput
  • Cons: Logs may be lost if the app crashes before the queue drains, added complexity

Centralized Logging

In microservices architectures, logs scattered across dozens of services become unmanageable. Centralized logging aggregates logs from all services into a single searchable system.

Why centralized logging is essential:

Without centralized logging, troubleshooting a single user request requires:

  1. SSH into each service's container/VM
  2. Search logs individually using grep
  3. Manually correlate timestamps across services
  4. Miss context when services have scaled down

With centralized logging:

  1. Search for correlation ID in Kibana
  2. See complete request flow across all services instantly
  3. Filter, aggregate, and visualize patterns
  4. Retain historical logs even after services scale down

ELK Stack Overview

The ELK Stack (Elasticsearch, Logstash, Kibana) is the most popular open-source centralized logging solution:

Components:

  • Elasticsearch: Distributed search and analytics engine that stores and indexes logs
  • Logstash: Data processing pipeline that ingests logs from multiple sources, transforms them, and sends to Elasticsearch
  • Kibana: Web interface for searching logs, creating visualizations, and building dashboards

Modern alternative: EFK Stack:

  • Replaces Logstash with Filebeat (lightweight log shipper) or Fluent Bit
  • Better performance for containerized environments (Kubernetes)
  • Lower resource consumption

ELK Stack Integration

Here's a basic ELK stack configuration suitable for development/testing:

# docker-compose.yml
version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      # Single-node mode for development (use cluster mode in production)
      - discovery.type=single-node
      # Disable security for local development (enable in production)
      - xpack.security.enabled=false
    ports:
      - "9200:9200"   # REST API
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:8.11.0
    volumes:
      # Mount the Logstash configuration pipeline
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    ports:
      - "5000:5000"   # TCP input for log shipping
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    ports:
      - "5601:5601"   # Web UI
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch

volumes:
  elasticsearch-data:

Logstash configuration (logstash.conf):

input {
  # Accept JSON logs over TCP from applications
  tcp {
    port => 5000
    codec => json_lines
  }
}

filter {
  # Parse JSON if not already parsed
  if [message] =~ /^\{.*\}$/ {
    json {
      source => "message"
    }
  }

  # Add processing timestamp
  mutate {
    add_field => { "[@metadata][processed_at]" => "%{@timestamp}" }
  }
}

output {
  # Send to Elasticsearch
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    # Index naming pattern: logs-YYYY.MM.dd
    index => "logs-%{+YYYY.MM.dd}"
  }

  # Also output to stdout for debugging
  stdout {
    codec => rubydebug
  }
}

Application configuration to ship logs:

Update logback-spring.xml to send logs to Logstash:

<appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
  <destination>localhost:5000</destination>
  <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
</appender>

<root level="INFO">
  <appender-ref ref="LOGSTASH"/>
</root>



Summary

Effective logging requires understanding not just the mechanics of logging frameworks, but the strategic role logs play in observability. Logs provide detailed narrative context that complements metrics (aggregate statistics) and tracing (request flow visualization).

Key Takeaways

  1. Structured logging (JSON): Enables machine parsing, indexing, and powerful querying in centralized systems. Transform logs from human-readable text to machine-analyzable data.

  2. Log levels serve different audiences: ERROR alerts on-call engineers about failures, INFO records business events for stakeholders, DEBUG aids engineer troubleshooting. Choose levels based on who acts on the information.

  3. Correlation IDs enable distributed tracing: A single UUID flowing through all services turns scattered logs into a coherent request narrative. Implement via MDC (Java) or context (Node.js).

  4. MDC is thread-local magic: Mapped Diagnostic Context automatically enriches all log entries with contextual data (correlation ID, user ID) without passing values explicitly. Remember to clean it up in thread-pooled environments.

  5. Separate audit from application logs: Audit logs serve compliance (long retention, immutability) while application logs serve operations (shorter retention, verbosity). Different purposes require different configurations.

  6. Performance through lazy evaluation: SLF4J's parameterized logging ({} placeholders) delays expensive string operations until after checking log level. Use explicit guards (if (log.isDebugEnabled())) for very expensive operations.

  7. Never log sensitive data: Credentials, tokens, financial data, and PII must never appear in logs. Implement data masking for necessary identifiers (card numbers, account numbers).

  8. Centralized logging is essential for microservices: ELK Stack (or alternatives like Splunk, Datadog) aggregates logs from all services, enabling cross-service queries and retained history after services scale down.

  9. Async appenders for high throughput: Move I/O operations off request threads to improve latency, accepting trade-off of potential log loss on crashes.

  10. Logging framework abstractions matter: SLF4J (Java) and Winston (Node.js) provide facades that decouple application code from logging implementations, enabling flexibility without refactoring.

Relationship to Other Observability Pillars

Logs work best alongside the other pillars of observability:

  • Metrics: Use metrics to detect anomalies (error rate spike), then use logs to investigate specific errors
  • Tracing: Use trace IDs as correlation IDs in logs to link detailed log context with trace visualization
  • Combined power: Metrics alert you to problems, traces show you where in the request flow problems occur, and logs tell you why
