
Logging Best Practices

Overview

Logging is one of the three pillars of observability (alongside metrics and distributed tracing). While metrics tell you what is happening in aggregate and tracing shows how requests flow through your system, logs capture detailed context about specific events, errors, and state changes.

This guide covers logging best practices including structured logging, log levels, correlation IDs, context propagation, and security considerations. We'll explore both Java (SLF4J/Logback) and TypeScript (Winston) implementations with practical examples.


Why Logging Matters

The Role of Logs in Production Systems

Logs serve multiple critical purposes:

  • Debugging: Investigate specific error conditions and unexpected behavior
  • Audit trails: Record who did what and when for compliance and security
  • Operational insight: Understand system behavior in production
  • Incident response: Diagnose and resolve production issues quickly
  • Correlation: Link related events across distributed systems using correlation IDs

Unlike metrics (which aggregate data) and traces (which track request flows), logs provide the detailed narrative of what happened at a specific point in time.


Logging Architecture

Before diving into specific practices, it's important to understand how logging works in modern applications:

Key components:

  • Logging Facade (SLF4J, Winston): Provides the API your application code uses. This abstraction allows you to change underlying implementations without modifying application code.
  • Logging Implementation (Logback, Log4j2): Handles the actual work of formatting, filtering, and routing log messages.
  • Appenders/Transports: Destination handlers that send logs to console, files, or centralized systems.
  • MDC (Mapped Diagnostic Context): Thread-local storage for contextual data (correlation IDs, user IDs) that automatically enriches all log entries.
  • Centralized Logging: Aggregation systems like ELK Stack (Elasticsearch, Logstash, Kibana) or Splunk that collect, index, and analyze logs from all services.

Core Principles

  1. Structured Logging: Use JSON format for machine parsing rather than plain text
  2. Correlation IDs: Propagate trace identifiers across service boundaries for request tracking
  3. Appropriate Log Levels: Choose the right severity level (ERROR, WARN, INFO, DEBUG, TRACE)
  4. Never Log Sensitive Data: Protect PII, credentials, tokens, and financial data
  5. Contextual Information: Include relevant identifiers (user ID, request ID, transaction ID)
  6. Centralized Logging: Aggregate logs from all services for unified search and analysis
  7. Performance Awareness: Use lazy evaluation to avoid expensive operations when logging is disabled

Log Levels

Choosing the correct log level is critical for operational effectiveness. Too much INFO logging creates noise, while insufficient ERROR logging obscures problems. Understanding when to use each level requires thinking about who will act on the information and when.

Level Guidelines

| Level | Usage | When to Use | Who Acts on It |
| --- | --- | --- | --- |
| ERROR | System failures requiring immediate attention | Unrecoverable errors, exceptions that prevent core functionality | On-call engineers, automated alerts |
| WARN | Potential issues that don't stop operation | Degraded performance, deprecated API usage, retry attempts, approaching limits | Engineers during incident investigation |
| INFO | Important business events | Successful operations, state changes, configuration changes | Business stakeholders, auditors, engineers |
| DEBUG | Detailed troubleshooting information | Variable values, execution paths, SQL queries | Engineers actively debugging |
| TRACE | Very detailed debugging | Method entry/exit, loop iterations, detailed state | Engineers investigating complex issues |

Production log level recommendations:

  • Production: INFO (DEBUG/TRACE disabled for performance)
  • Staging/UAT: DEBUG (enables detailed investigation)
  • Development: DEBUG or TRACE (full visibility)

Log Level in Production

Leaving DEBUG or TRACE enabled in production can:

  • Generate massive log volumes (storage costs, performance impact)
  • Inadvertently log sensitive data that was added during development
  • Make it harder to find important messages in the noise

Always configure production environments with INFO level, and use dynamic log level adjustment (e.g., Spring Boot Actuator) to temporarily enable DEBUG for specific loggers when troubleshooting.
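For example, with Spring Boot Actuator's loggers endpoint (a sketch; it assumes the endpoint has been exposed via `management.endpoints.web.exposure.include=loggers` and that the service runs locally on port 8080):

```shell
# Temporarily enable DEBUG for a single logger at runtime (no restart needed)
curl -X POST http://localhost:8080/actuator/loggers/com.bank.payment.PaymentService \
  -H "Content-Type: application/json" \
  -d '{"configuredLevel": "DEBUG"}'

# Reset the logger to its configured default when finished
curl -X POST http://localhost:8080/actuator/loggers/com.bank.payment.PaymentService \
  -H "Content-Type: application/json" \
  -d '{"configuredLevel": null}'
```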

Java (SLF4J) Examples

SLF4J (Simple Logging Facade for Java) is the standard logging API in Java applications. It provides an abstraction layer over actual logging implementations like Logback or Log4j2, allowing you to switch implementations without changing application code.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Service
public class PaymentService {

    // Logger is static final to avoid creating multiple instances.
    // LoggerFactory.getLogger(Class) automatically sets the logger name to the fully-qualified class name
    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    public PaymentResult processPayment(Payment payment) {
        // INFO level: business event that stakeholders care about.
        // Use parameterized logging {} placeholders to avoid string concatenation;
        // SLF4J only evaluates parameters if INFO level is enabled
        log.info("Processing payment id={} amount={} currency={}",
                payment.getId(), payment.getAmount(), payment.getCurrency());

        try {
            PaymentResult result = executePayment(payment);

            // INFO: successful completion of an important business operation.
            // Include correlation IDs (transaction ID) for tracing
            log.info("Payment processed successfully id={} transactionId={}",
                    payment.getId(), result.getTransactionId());

            return result;

        } catch (InsufficientBalanceException e) {
            // WARN level: expected business exception that doesn't indicate system failure.
            // Include relevant business context (amounts) to aid investigation.
            // Don't include a stack trace - this is expected behavior
            log.warn("Payment failed due to insufficient balance id={} required={} available={}",
                    payment.getId(), e.getRequiredAmount(), e.getAvailableAmount());
            throw e;

        } catch (Exception e) {
            // ERROR level: unexpected system error requiring investigation.
            // Include the exception as the last parameter for the full stack trace.
            // This would trigger alerts in production monitoring
            log.error("Payment processing failed id={}", payment.getId(), e);
            throw new PaymentProcessingException("Failed to process payment", e);
        }
    }
}

Key points demonstrated:

  • Logger declaration: Static final logger created once per class
  • Parameterized logging: Use {} placeholders instead of string concatenation for performance
  • Level selection: INFO for business events, WARN for expected issues, ERROR for system failures
  • Exception logging: Include exception as final parameter to capture stack traces
  • Context: Always include relevant identifiers (payment ID, transaction ID) for correlation

TypeScript (Winston) Examples

Winston is the most popular logging library for Node.js/TypeScript applications. It provides flexible configuration, multiple transport options, and structured logging support.

import winston from 'winston';

// Configure Winston logger with JSON format for structured logging
const logger = winston.createLogger({
  // Log level can be controlled via environment variable
  level: process.env.LOG_LEVEL || 'info',

  // JSON format enables machine parsing and indexing
  format: winston.format.json(),

  // Default metadata included in all log entries
  defaultMeta: { service: 'payment-service' },

  // Transports define where logs are sent
  transports: [
    // Separate file for errors makes them easy to monitor
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    // Combined log contains all levels
    new winston.transports.File({ filename: 'combined.log' })
  ]
});

// Add console transport in non-production environments for visibility
if (process.env.NODE_ENV !== 'production') {
  logger.add(new winston.transports.Console({
    format: winston.format.simple()
  }));
}

export class PaymentService {
  async processPayment(payment: Payment): Promise<PaymentResult> {
    // Structured logging: pass message + object with context.
    // Winston automatically merges this with defaultMeta
    logger.info('Processing payment', {
      paymentId: payment.id,
      amount: payment.amount,
      currency: payment.currency
    });

    try {
      const result = await this.executePayment(payment);

      logger.info('Payment processed successfully', {
        paymentId: payment.id,
        transactionId: result.transactionId
      });

      return result;

    } catch (error) {
      // Winston doesn't automatically extract stack traces.
      // Explicitly handle Error objects to capture full context
      logger.error('Payment processing failed', {
        paymentId: payment.id,
        error: error instanceof Error ? error.message : 'Unknown error',
        stack: error instanceof Error ? error.stack : undefined
      });
      throw error;
    }
  }
}

Key differences from SLF4J:

  • Configuration: Winston requires explicit logger configuration (transports, formats)
  • Structured logging: Pass objects rather than string templates
  • Error handling: Must explicitly extract error messages and stack traces
  • Transports: Flexible routing to multiple destinations (files, console, cloud services)

Structured Logging

Structured logging means emitting logs in a consistent, machine-readable format (typically JSON) rather than free-form text. This transformation is fundamental to modern observability.

Why structured logging matters:

  1. Machine Parsing: JSON logs can be automatically parsed, indexed, and searched in tools like Elasticsearch or Splunk
  2. Consistent Fields: Standardized field names (timestamp, level, message) enable queries across all services
  3. Rich Context: Complex objects can be included without string formatting
  4. Aggregation: Centralized logging systems can aggregate and analyze structured data
  5. Performance: Avoids expensive string concatenation and formatting in application code

Comparison:

# Plain text log (hard to parse)
2025-01-28 10:15:30 INFO PaymentService - Payment PAY-101 for $100.00 USD processed successfully in 245ms

# Structured log (easy to parse and query)
{"timestamp":"2025-01-28T10:15:30.123Z","level":"INFO","message":"Payment processed successfully","paymentId":"PAY-101","amount":100.00,"currency":"USD","duration":245}

With structured logs, you can easily query: "Find all payments over $10,000 that took longer than 1 second" - something nearly impossible with plain text logs.
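As a sketch, that query against logs indexed in Elasticsearch might look like the following (field names taken from the structured example above; `duration` is assumed to be in milliseconds):

```json
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "amount":   { "gt": 10000 } } },
        { "range": { "duration": { "gt": 1000 } } }
      ]
    }
  }
}
```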

JSON Format Configuration

Logback (the most common logging implementation for SLF4J) can be configured to output JSON using the Logstash encoder:

<!-- logback-spring.xml -->
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
    <!-- LogstashEncoder converts log events to JSON format -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <!-- Include specific MDC keys in every log entry -->
      <!-- MDC (Mapped Diagnostic Context) holds thread-local context data -->
      <includeMdcKeyName>correlationId</includeMdcKeyName>
      <includeMdcKeyName>userId</includeMdcKeyName>
      <includeMdcKeyName>transactionId</includeMdcKeyName>

      <!-- Optionally customize field names for compatibility with log aggregators -->
      <fieldNames>
        <timestamp>timestamp</timestamp>
        <message>message</message>
        <logger>logger</logger>
        <level>level</level>
      </fieldNames>
    </encoder>
  </appender>

  <!-- Configure root logger level and appenders -->
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>

Configuration notes:

  • The LogstashEncoder dependency must be added to your project: net.logstash.logback:logstash-logback-encoder
  • MDC keys automatically appear in every log entry when present on the thread
  • Console appender works well in containerized environments (Docker/Kubernetes) where logs are captured from stdout

Example JSON Log Output

{
  "timestamp": "2025-01-28T10:15:30.123Z",
  "level": "INFO",
  "logger": "com.bank.payment.PaymentService",
  "message": "Payment processed successfully",
  "correlationId": "abc-123-xyz",
  "userId": "USER-456",
  "transactionId": "TXN-789",
  "paymentId": "PAY-101",
  "amount": 100.00,
  "currency": "USD",
  "duration": 245,
  "service": "payment-service",
  "environment": "production",
  "thread": "http-nio-8080-exec-1"
}

Key fields explained:

  • timestamp: ISO 8601 format for precise time ordering and timezone handling
  • logger: Fully-qualified class name for filtering logs by component
  • correlationId: Links all log entries for a single request (see Correlation IDs below)
  • userId / transactionId: Business identifiers for tracing specific operations
  • service / environment: Deployment context for multi-service logging
  • thread: Java thread name, useful for diagnosing concurrency issues

Correlation IDs

In distributed systems, a single user request often spans multiple services. A correlation ID (also called trace ID or request ID) is a unique identifier that flows through all services involved in processing that request. This allows you to find all log entries related to a specific user action, even across service boundaries.

Why correlation IDs are essential:

Without correlation IDs, tracing a failed payment request requires searching logs in each service individually using timestamps and hoping they align. With correlation IDs, you search for "abc-123" across all services and get the complete story.

Correlation ID lifecycle:

  1. Generate: Create a new UUID when a request enters the system (or accept from client)
  2. Propagate: Pass the ID through all service calls via HTTP headers
  3. Log: Include the ID in every log entry using MDC (Java) or context (Node.js)
  4. Response: Return the ID to clients for support tickets ("Please provide request ID abc-123")

Understanding MDC (Mapped Diagnostic Context)

MDC is a thread-local map that holds contextual data (like correlation IDs) automatically available to all log statements on that thread. Think of it as a "magic backpack" that follows your code execution and enriches logs without explicitly passing data everywhere.

How MDC works:

  • Data stored in MDC exists only for the current thread
  • When you put a value in MDC, all subsequent log statements on that thread automatically include it
  • You must clean up MDC when the thread finishes to avoid leaking data to thread pool reuse
  • Web frameworks typically use filters to manage MDC lifecycle automatically
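The mechanics can be illustrated with a toy thread-local map (a simplified stand-in for the real org.slf4j.MDC, shown here only to make the thread-local behavior concrete):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of how an MDC works internally: a thread-local map
// that "log statements" read implicitly, without being passed the data.
public class MiniMdc {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static void clear() { CTX.remove(); }  // critical in thread-pooled environments

    // A "log statement" that is automatically enriched with context
    static void log(String message) {
        System.out.println("[correlationId=" + get("correlationId") + "] " + message);
    }

    public static void main(String[] args) {
        put("correlationId", "abc-123");
        log("Processing payment");  // prints: [correlationId=abc-123] Processing payment
        clear();                    // without this, the next task on this thread inherits the ID
    }
}
```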

Spring Boot Implementation

Here's a complete implementation using a Servlet Filter to manage correlation IDs with MDC:

import org.slf4j.MDC;
import org.springframework.core.Ordered;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;

import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.UUID;

// CorrelationIdFilter.java
@Component
// Run this filter first to ensure the correlation ID is available to all subsequent filters
@Order(Ordered.HIGHEST_PRECEDENCE)
public class CorrelationIdFilter implements Filter {

    private static final String CORRELATION_ID_HEADER = "X-Correlation-ID";
    private static final String CORRELATION_ID_MDC_KEY = "correlationId";

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {

        HttpServletRequest httpRequest = (HttpServletRequest) request;
        HttpServletResponse httpResponse = (HttpServletResponse) response;

        // Check if the client provided a correlation ID (for request tracing across clients)
        String correlationId = httpRequest.getHeader(CORRELATION_ID_HEADER);

        // If no ID was provided, generate a new one (UUID ensures uniqueness)
        if (correlationId == null || correlationId.isEmpty()) {
            correlationId = UUID.randomUUID().toString();
        }

        // Store in MDC - all log statements on this thread will now include it.
        // The Logstash encoder (configured in logback-spring.xml) automatically
        // includes MDC values in JSON output
        MDC.put(CORRELATION_ID_MDC_KEY, correlationId);

        // Return the correlation ID in response headers.
        // Clients can use this for support tickets ("Request abc-123 failed")
        httpResponse.setHeader(CORRELATION_ID_HEADER, correlationId);

        try {
            // Continue the filter chain - all downstream filters and controllers
            // will have access to the correlation ID via MDC
            chain.doFilter(request, response);
        } finally {
            // CRITICAL: clean up MDC to prevent leaks.
            // Servlet containers use thread pools - without cleanup, the next
            // request on this thread would have the wrong correlation ID
            MDC.remove(CORRELATION_ID_MDC_KEY);
        }
    }
}

Implementation notes:

  • @Order(Ordered.HIGHEST_PRECEDENCE) ensures this filter runs before all others
  • The try/finally block guarantees MDC cleanup even if exceptions occur
  • UUID generation is fast enough (~1µs) that performance impact is negligible
  • Header name X-Correlation-ID is a common convention but can be customized

Using Correlation IDs in Service Code

Once the filter has placed the correlation ID in MDC, all logging automatically includes it:

@Service
public class PaymentService {
    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    public PaymentResult processPayment(Payment payment) {
        // The correlation ID is automatically included from MDC in every log entry -
        // no need to pass it as a parameter or include it in the message
        log.info("Processing payment id={}", payment.getId());

        // When calling another service, propagate the correlation ID via an HTTP header;
        // RestTemplate or WebClient interceptors can add the header automatically
        PaymentResult result = externalService.process(payment);

        log.info("Payment processed id={} transactionId={}",
                payment.getId(), result.getTransactionId());

        return result;
    }
}

Propagating correlation IDs to downstream services:

When your service calls other services, you need to explicitly propagate the correlation ID via HTTP headers. Here's how to configure RestTemplate to do this automatically:

@Configuration
public class RestTemplateConfig {

    @Bean
    public RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder
                .interceptors((request, body, execution) -> {
                    // Retrieve the correlation ID from MDC
                    String correlationId = MDC.get("correlationId");

                    // Add it to the outgoing request headers if present
                    if (correlationId != null) {
                        request.getHeaders().add("X-Correlation-ID", correlationId);
                    }

                    return execution.execute(request, body);
                })
                .build();
    }
}

Now every HTTP call made with this RestTemplate will automatically include the correlation ID, allowing end-to-end tracing across services.


Audit Logging

Audit logs serve a different purpose than application logs. While application logs help engineers debug and monitor systems, audit logs provide a compliance trail showing who did what, when, and from where. Audit logs are often subject to regulatory requirements (SOX, GDPR, PCI-DSS, HIPAA) and must be retained for extended periods (often 7+ years).

Key differences between application and audit logs:

| Aspect | Application Logs | Audit Logs |
| --- | --- | --- |
| Purpose | Debugging, monitoring, operations | Compliance, security, forensics |
| Audience | Engineers, DevOps | Auditors, security teams, legal |
| Retention | Days to months | Years (often 7+) |
| Immutability | Can be rotated/deleted | Must be tamper-proof |
| Content | Technical details | Business events, actor identity |
| Volume | High (can be verbose) | Lower (only significant events) |

What to audit:

  • Authentication events: Login, logout, failed attempts, password changes
  • Authorization events: Permission changes, role assignments, access denials
  • Data access: Viewing sensitive records (customer data, financial records)
  • Data modifications: Create, update, delete operations on important entities
  • Configuration changes: System settings, feature flags, admin actions
  • Financial transactions: Payments, transfers, account operations

Dedicated Audit Logger

Audit logs should be separated from application logs with their own logger, format, and retention policy:

@Component
public class AuditLogger {

    private static final Logger auditLog = LoggerFactory.getLogger("AUDIT");

    public void logPaymentProcessed(Payment payment, PaymentResult result) {
        auditLog.info("event=PAYMENT_PROCESSED "
                        + "userId={} paymentId={} transactionId={} amount={} currency={} timestamp={}",
                payment.getUserId(),
                payment.getId(),
                result.getTransactionId(),
                payment.getAmount(),
                payment.getCurrency(),
                Instant.now());
    }

    public void logAccountAccess(String userId, String accountId, String action) {
        auditLog.info("event=ACCOUNT_ACCESS "
                        + "userId={} accountId={} action={} timestamp={} ip={}",
                userId,
                accountId,
                action,
                Instant.now(),
                getClientIp());
    }

    public void logAuthenticationFailure(String username, String reason) {
        auditLog.warn("event=AUTH_FAILURE "
                        + "username={} reason={} timestamp={} ip={}",
                username,
                reason,
                Instant.now(),
                getClientIp());
    }

    private String getClientIp() {
        // Placeholder - resolve the client IP from the request context in a real implementation
        return "0.0.0.0";
    }
}

Audit Log Configuration

<!-- Separate audit log file -->
<configuration>
  <appender name="AUDIT_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/audit.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>logs/audit-%d{yyyy-MM-dd}.log</fileNamePattern>
      <maxHistory>365</maxHistory> <!-- Keep for 1 year -->
    </rollingPolicy>
    <encoder>
      <pattern>%d{ISO8601} %msg%n</pattern>
    </encoder>
  </appender>

  <logger name="AUDIT" level="INFO" additivity="false">
    <appender-ref ref="AUDIT_FILE"/>
  </logger>
</configuration>

Security & Compliance

Never Log Sensitive Data

// BAD: Logging sensitive data
log.info("User login: username={} password={}", username, password);
log.info("Processing payment: cardNumber={} cvv={}", cardNumber, cvv);
log.info("Token: {}", jwtToken);

// GOOD: Sanitize or omit sensitive data
log.info("User login: username={}", username); // No password
log.info("Processing payment: cardLast4={}", cardNumber.substring(cardNumber.length() - 4));
log.info("Token received (length={})", jwtToken.length()); // No actual token

Data Masking

public class SensitiveDataMasker {

    public static String maskCardNumber(String cardNumber) {
        if (cardNumber == null || cardNumber.length() < 4) {
            return "****";
        }
        return "****-****-****-" + cardNumber.substring(cardNumber.length() - 4);
    }

    public static String maskEmail(String email) {
        if (email == null || !email.contains("@")) {
            return "***@***.com";
        }
        String[] parts = email.split("@");
        return parts[0].charAt(0) + "***@" + parts[1];
    }

    public static String maskAccountNumber(String accountNumber) {
        if (accountNumber == null || accountNumber.length() < 4) {
            return "****";
        }
        return "***" + accountNumber.substring(accountNumber.length() - 4);
    }
}

// Usage
log.info("Payment processed for card {}",
        SensitiveDataMasker.maskCardNumber(payment.getCardNumber()));
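To make the masking behavior concrete, here is a small self-contained check (the masking logic is reproduced from SensitiveDataMasker above so it compiles standalone; the sample values are illustrative):

```java
public class SensitiveDataMaskerDemo {

    // Same logic as SensitiveDataMasker.maskCardNumber above
    static String maskCardNumber(String cardNumber) {
        if (cardNumber == null || cardNumber.length() < 4) {
            return "****";
        }
        return "****-****-****-" + cardNumber.substring(cardNumber.length() - 4);
    }

    // Same logic as SensitiveDataMasker.maskEmail above
    static String maskEmail(String email) {
        if (email == null || !email.contains("@")) {
            return "***@***.com";
        }
        String[] parts = email.split("@");
        return parts[0].charAt(0) + "***@" + parts[1];
    }

    public static void main(String[] args) {
        System.out.println(maskCardNumber("4111111111111111")); // ****-****-****-1111
        System.out.println(maskEmail("alice@example.com"));     // a***@example.com
        System.out.println(maskCardNumber(null));               // ****
    }
}
```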

Performance Considerations

Logging can significantly impact application performance if not handled carefully. The main performance concerns are:

  1. String operations: Concatenation and formatting are expensive
  2. I/O operations: Writing to disk or network has latency
  3. Object serialization: Converting objects to strings (toString()) can be costly
  4. Volume: Excessive logging fills disks and overwhelms log aggregation systems

Lazy Logging and Parameterized Messages

SLF4J's parameterized logging delays string operations until after checking if the log level is enabled:

// BAD: string concatenation happens regardless of log level.
// Even if DEBUG is disabled, the concatenation and toString() calls execute
log.debug("Payment details: " + payment.toString()
        + " with metadata: " + metadata.toString());
// Cost: O(n) string operations, always executed

// GOOD: SLF4J parameterized logging (no concatenation if DEBUG is disabled).
// SLF4J checks whether DEBUG is enabled before evaluating parameters
log.debug("Payment details: {} with metadata: {}", payment, metadata);
// Cost: O(1) if DEBUG disabled, O(n) if enabled

// GOOD: guard very expensive operations with explicit checks.
// Use for operations more expensive than simple toString() calls
if (log.isDebugEnabled()) {
    // This expensive method only runs if DEBUG logging is enabled
    String diagnosticInfo = performExpensiveDiagnostics(payment);
    log.debug("Diagnostic result: {}", diagnosticInfo);
}

Performance comparison:

  • String concatenation with + always executes (even when logging disabled): ~1000ns
  • Parameterized logging with {} when disabled: ~50ns (20x faster)
  • Parameterized logging with {} when enabled: ~1200ns (equivalent after evaluation)
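The effect of lazy evaluation can be demonstrated with the JDK's built-in java.util.logging, which offers Supplier-based laziness (a self-contained sketch; SLF4J's {} mechanism behaves analogously):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLoggingDemo {
    private static final Logger LOG = Logger.getLogger(LazyLoggingDemo.class.getName());
    private static final AtomicInteger calls = new AtomicInteger();

    // Stand-in for an expensive diagnostic computation
    static String expensiveDiagnostics() {
        calls.incrementAndGet();
        return "diagnostic-data";
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE (debug-level) is disabled, as in production

        // Eager: the argument string is built even though FINE is disabled
        LOG.fine("diagnostics: " + expensiveDiagnostics());

        // Lazy: the Supplier is only invoked if FINE is enabled, so it never runs here
        LOG.fine(() -> "diagnostics: " + expensiveDiagnostics());

        System.out.println("expensive calls: " + calls.get()); // prints: expensive calls: 1
    }
}
```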

Async Appenders

For high-throughput applications, consider using async appenders to move I/O off the request thread:

<!-- logback-spring.xml -->
<configuration>
  <!-- Async appender wraps the actual appender -->
  <appender name="ASYNC_FILE" class="ch.qos.logback.classic.AsyncAppender">
    <!-- Queue size - increase for high throughput -->
    <queueSize>512</queueSize>

    <!-- Never block application threads - drop logs if the queue is full -->
    <neverBlock>true</neverBlock>

    <!-- 0 disables discarding; with the default threshold (20% of queueSize),
         TRACE/DEBUG/INFO events are dropped when the queue is 80% full -->
    <discardingThreshold>0</discardingThreshold>

    <!-- The actual file appender doing the I/O -->
    <appender-ref ref="FILE"/>
  </appender>

  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/application.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>logs/application-%d{yyyy-MM-dd}.log</fileNamePattern>
      <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder>
      <pattern>%d{ISO8601} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <root level="INFO">
    <appender-ref ref="ASYNC_FILE"/>
  </root>
</configuration>

Async appender trade-offs:

  • Pros: Request threads don't block on I/O, better throughput
  • Cons: Logs may be lost if the app crashes before the queue drains, added complexity

Centralized Logging

In microservices architectures, logs scattered across dozens of services become unmanageable. Centralized logging aggregates logs from all services into a single searchable system.

Why centralized logging is essential:

Without centralized logging, troubleshooting a single user request requires:

  1. SSH into each service's container/VM
  2. Search logs individually using grep
  3. Manually correlate timestamps across services
  4. Miss context when services have scaled down

With centralized logging:

  1. Search for correlation ID in Kibana
  2. See complete request flow across all services instantly
  3. Filter, aggregate, and visualize patterns
  4. Retain historical logs even after services scale down

ELK Stack Overview

The ELK Stack (Elasticsearch, Logstash, Kibana) is the most popular open-source centralized logging solution:

Components:

  • Elasticsearch: Distributed search and analytics engine that stores and indexes logs
  • Logstash: Data processing pipeline that ingests logs from multiple sources, transforms them, and sends to Elasticsearch
  • Kibana: Web interface for searching logs, creating visualizations, and building dashboards

Modern alternative: EFK Stack:

  • Replaces Logstash with Filebeat (lightweight log shipper) or Fluent Bit
  • Better performance for containerized environments (Kubernetes)
  • Lower resource consumption

ELK Stack Integration

Here's a basic ELK stack configuration suitable for development/testing:

# docker-compose.yml
version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      # Single-node mode for development (use cluster mode in production)
      - discovery.type=single-node
      # Disable security for local development (enable in production)
      - xpack.security.enabled=false
    ports:
      - "9200:9200"   # REST API
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:8.11.0
    volumes:
      # Mount the Logstash configuration pipeline
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    ports:
      - "5000:5000"   # TCP input for log shipping
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    ports:
      - "5601:5601"   # Web UI
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch

volumes:
  elasticsearch-data:

Logstash configuration (logstash.conf):

input {
  # Accept JSON logs over TCP from applications
  tcp {
    port => 5000
    codec => json_lines
  }
}

filter {
  # Parse JSON if not already parsed
  if [message] =~ /^\{.*\}$/ {
    json {
      source => "message"
    }
  }

  # Add processing timestamp
  mutate {
    add_field => { "[@metadata][processed_at]" => "%{@timestamp}" }
  }
}

output {
  # Send to Elasticsearch
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    # Index naming pattern: logs-YYYY.MM.dd
    index => "logs-%{+YYYY.MM.dd}"
  }

  # Also output to stdout for debugging
  stdout {
    codec => rubydebug
  }
}

Application configuration to ship logs:

Update logback-spring.xml to send logs to Logstash:

<appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
  <destination>localhost:5000</destination>
  <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
</appender>

<root level="INFO">
  <appender-ref ref="LOGSTASH"/>
</root>



Summary

Effective logging requires understanding not just the mechanics of logging frameworks, but the strategic role logs play in observability. Logs provide detailed narrative context that complements metrics (aggregate statistics) and tracing (request flow visualization).

Key Takeaways

  1. Structured logging (JSON): Enables machine parsing, indexing, and powerful querying in centralized systems. Transform logs from human-readable text to machine-analyzable data.

  2. Log levels serve different audiences: ERROR alerts on-call engineers about failures, INFO records business events for stakeholders, DEBUG aids engineer troubleshooting. Choose levels based on who acts on the information.

  3. Correlation IDs enable distributed tracing: A single UUID flowing through all services turns scattered logs into a coherent request narrative. Implement via MDC (Java) or context (Node.js).

  4. MDC is thread-local magic: Mapped Diagnostic Context automatically enriches all log entries with contextual data (correlation ID, user ID) without passing values explicitly. Remember to clean it up in thread-pooled environments.

  5. Separate audit from application logs: Audit logs serve compliance (long retention, immutability) while application logs serve operations (shorter retention, verbosity). Different purposes require different configurations.

  6. Performance through lazy evaluation: SLF4J's parameterized logging ({} placeholders) delays expensive string operations until after checking log level. Use explicit guards (if (log.isDebugEnabled())) for very expensive operations.

  7. Never log sensitive data: Credentials, tokens, financial data, and PII must never appear in logs. Implement data masking for necessary identifiers (card numbers, account numbers).

  8. Centralized logging is essential for microservices: ELK Stack (or alternatives like Splunk, Datadog) aggregates logs from all services, enabling cross-service queries and retained history after services scale down.

  9. Async appenders for high throughput: Move I/O operations off request threads to improve latency, accepting trade-off of potential log loss on crashes.

  10. Logging framework abstractions matter: SLF4J (Java) and Winston (Node.js) provide facades that decouple application code from logging implementations, enabling flexibility without refactoring.

Relationship to Other Observability Pillars

Logs work best alongside the other pillars of observability:

  • Metrics: Use metrics to detect anomalies (error rate spike), then use logs to investigate specific errors
  • Tracing: Use trace IDs as correlation IDs in logs to link detailed log context with trace visualization
  • Combined power: Metrics alert you to problems, traces show you where in the request flow problems occur, and logs tell you why
