Debugging Strategies
Overview
Debugging is the systematic process of identifying, isolating, and fixing defects in software. While writing code is creative, debugging is analytical - requiring methodical investigation, hypothesis testing, and logical reasoning. Effective debugging separates experienced engineers from novices: experts debug faster not because they write fewer bugs, but because they've developed systematic approaches to finding and fixing them.
Debugging occurs across different contexts. During development, you debug with full control: restart the application, modify code, add logging freely. In production, constraints multiply: limited access, no restarts without approval, incomplete information. The core techniques remain similar, but production debugging demands greater care and different tools.
The debugging process is iterative. You form a hypothesis about the bug's cause, test it (add logging, set breakpoints, inspect state), gather evidence, refine your hypothesis, and repeat until the root cause is identified. Jumping to solutions without understanding the problem often makes things worse - the "fix" addresses symptoms rather than causes.
Understanding the full execution path - from user action through frontend, API layers, business logic, database queries, and external service calls - is critical for complex issues. Modern distributed systems complicate debugging because a single user action might traverse multiple services, each with independent logs, state, and failure modes. Correlation IDs tie these distributed operations together, allowing you to trace a request's full journey.
Core Principles
- Reproduce Reliably: Without reliable reproduction, you're guessing
- Isolate the Problem: Narrow the scope to specific components or code paths
- Form Hypotheses: Develop testable theories about the cause
- Test Systematically: Validate hypotheses with evidence, not assumptions
- Fix the Root Cause: Address underlying problems, not just symptoms
- Prevent Recurrence: Add tests, improve logging, document findings
Systematic Debugging Process
A systematic approach prevents wasted effort chasing red herrings and ensures you actually solve the problem rather than applying band-aid fixes.
1. Reproduce the Bug
Reliable reproduction is the foundation of debugging. If you can't reproduce the bug consistently, you can't verify whether your fix works. Reproduction demonstrates you understand the conditions that trigger the bug.
Gather detailed information about the failure:
- What happened: Exact error message, unexpected behavior, user observation
- Expected behavior: What should have happened instead
- Steps to reproduce: Precise sequence of actions leading to the failure
- Environment: Browser/OS version, data state, user permissions, time of day
- Frequency: Always, intermittent (50% of attempts), rare (once per day)
For intermittent bugs, identify patterns. Does it only fail for certain users? After specific actions? Under load? Patterns reveal the underlying conditions.
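One way to turn an intermittent failure into data is to drive the suspect operation in a loop and record which inputs fail, then look for a pattern in the failing set. A minimal sketch, with the flaky operation simulated rather than a real call:

```java
import java.util.ArrayList;
import java.util.List;

public class FlakyHarness {
    // Simulated flaky operation: fails for every 7th input, standing in for
    // a real call whose failure pattern you are trying to characterize.
    static void operation(int i) {
        if (i % 7 == 0) throw new IllegalStateException("failed for input " + i);
    }

    // Run many iterations and collect the failing inputs.
    static List<Integer> findFailures(int iterations) {
        List<Integer> failures = new ArrayList<>();
        for (int i = 1; i <= iterations; i++) {
            try {
                operation(i);
            } catch (RuntimeException e) {
                failures.add(i);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Integer> failures = findFailures(100);
        // Failures at 7, 14, 21, ... - the pattern points at the condition
        System.out.println(failures.size() + " failures, first: " + failures.get(0));
    }
}
```

Once the failing inputs are in hand, correlating them (same user? same data shape? same time window?) usually converts "intermittent" into "deterministic under condition X".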
Create a minimal reproduction - the simplest scenario that demonstrates the bug. If the bug occurs after a complex 10-step workflow, can you reproduce it in 3 steps? Minimal reproductions eliminate irrelevant variables and focus investigation on the essential failure conditions.
// Document reproduction steps
/*
Reproduction Steps:
1. Log in as user with email "[email protected]"
2. Navigate to /payments
3. Click "Create Payment" button
4. Enter amount: 100.00
5. Select currency: EUR
6. Click "Submit"
Expected: Payment created successfully, redirected to payment details
Actual: "Network Error" displayed, no payment created
Frequency: 100% reproduction
Environment: Chrome 119, macOS, staging environment
User account: Standard user (not admin)
*/
2. Isolate the Problem
Once reproduced, narrow down where the bug occurs. Modern applications have many layers: UI components, state management, API clients, backend controllers, service layers, data access, databases, external services. Bugs can hide in any layer or in the interactions between layers.
Binary search is effective for isolation. Test the midpoint of the execution path. Does the bug occur before or after this point? If after, test the midpoint of that half. This logarithmic approach quickly narrows the search space.
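When a pipeline's stages can be executed up to a chosen point, this bisection can be automated. A hedged sketch (the stage list and invariant are invented for illustration, and it assumes that once the invariant breaks it stays broken for later stages):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class BisectPipeline {
    // Run only the first n stages of a processing pipeline on the input.
    static <T> T runPrefix(T input, List<UnaryOperator<T>> stages, int n) {
        T value = input;
        for (int i = 0; i < n; i++) {
            value = stages.get(i).apply(value);
        }
        return value;
    }

    // Binary-search for the first stage whose output violates the invariant.
    // Assumes the full pipeline's output fails the check, and that a broken
    // invariant stays broken through subsequent stages.
    static <T> int firstBadStage(T input, List<UnaryOperator<T>> stages, Predicate<T> ok) {
        int good = 0, bad = stages.size(); // output after 'good' stages passes; after 'bad' it fails
        while (good + 1 < bad) {
            int mid = (good + bad) / 2;
            if (ok.test(runPrefix(input, stages, mid))) good = mid; else bad = mid;
        }
        return bad; // 1-based index of the first failing stage
    }

    public static void main(String[] args) {
        // Invariant: the amount stays positive. Stage 3 introduces the bug.
        List<UnaryOperator<Integer>> stages = List.of(
            x -> x + 10,   // stage 1
            x -> x * 2,    // stage 2
            x -> x - 1000, // stage 3: buggy conversion
            x -> x + 5     // stage 4
        );
        System.out.println("First bad stage: " + firstBadStage(100, stages, x -> x > 0));
    }
}
```

The same logarithmic idea applies manually: a breakpoint at the midpoint of the execution path answers "is the state still correct here?" and halves the search space per check.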
Use debugger breakpoints to inspect state at key points:
- Before the operation
- After the operation
- At decision points (if statements, switch cases)
- At integration boundaries (API calls, database queries)
Check intermediate values. Often bugs arise from incorrect intermediate calculations or state transformations. The final symptom (wrong total amount) might result from an earlier problem (incorrect currency conversion rate).
// Isolation example: Payment calculation bug
public PaymentResult processPayment(PaymentRequest request) {
// Breakpoint 1: Inspect incoming request
log.debug("Processing payment request: {}", request);
// Breakpoint 2: Check validation result
ValidationResult validation = validatePayment(request);
if (!validation.isValid()) {
return PaymentResult.failed(validation.getErrors());
}
// Breakpoint 3: Verify amount after conversion
BigDecimal convertedAmount = convertCurrency(
request.getAmount(),
request.getSourceCurrency(),
request.getTargetCurrency()
);
log.debug("Converted amount: {} {} -> {} {}",
request.getAmount(), request.getSourceCurrency(),
convertedAmount, request.getTargetCurrency());
// Breakpoint 4: Check fee calculation
BigDecimal fee = calculateFee(convertedAmount);
BigDecimal totalAmount = convertedAmount.add(fee);
// Breakpoint 5: Verify final state before persistence
Payment payment = Payment.builder()
.amount(totalAmount)
.currency(request.getTargetCurrency())
.build();
Payment saved = paymentRepository.save(payment);
return PaymentResult.success(saved);
}
For frontend bugs, use browser DevTools to isolate:
- Elements panel: Inspect DOM, verify CSS applies correctly
- Console: Check for JavaScript errors, warnings
- Network panel: Verify API requests/responses, check status codes and payloads
- React DevTools or Angular DevTools: Inspect component state and props
3. Form and Test Hypotheses
Based on the isolated problem area, form hypotheses about the cause. Good hypotheses are specific and testable.
Weak hypothesis: "Something is wrong with the payment calculation." Strong hypothesis: "The currency conversion rate is using yesterday's rate instead of today's rate, causing incorrect amounts."
Each hypothesis should be testable through observation or experimentation:
- Add logging to show which rate is being used
- Set a breakpoint and inspect the exchangeRate variable
- Query the database to verify the rate being retrieved
- Check timestamp on the rate data
Test one hypothesis at a time. Changing multiple things simultaneously makes it impossible to know which change revealed the cause.
Common hypothesis patterns:
- Incorrect input: "The frontend is sending null for the currency field."
- Logic error: "The validation check uses > instead of >=, rejecting valid amounts."
- State problem: "The component is reading stale state instead of updated values."
- Race condition: "Two concurrent requests modify the same record, causing data corruption."
- Environmental difference: "Production uses a different database schema version than staging."
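A concrete instance of the logic-error pattern: a boundary check that uses > where >= was intended. The validator below is invented purely to illustrate how a precise hypothesis ("exactly the minimum amount is rejected") becomes trivially testable:

```java
import java.math.BigDecimal;

public class BoundaryBug {
    static final BigDecimal MIN_AMOUNT = new BigDecimal("0.01");

    // Buggy check: rejects a payment of exactly 0.01
    static boolean isValidBuggy(BigDecimal amount) {
        return amount.compareTo(MIN_AMOUNT) > 0;
    }

    // Fixed check: the boundary value is accepted
    static boolean isValidFixed(BigDecimal amount) {
        return amount.compareTo(MIN_AMOUNT) >= 0;
    }

    public static void main(String[] args) {
        BigDecimal boundary = new BigDecimal("0.01");
        System.out.println("buggy accepts boundary: " + isValidBuggy(boundary));  // false
        System.out.println("fixed accepts boundary: " + isValidFixed(boundary));  // true
    }
}
```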
4. Fix the Root Cause
Once you've identified the root cause, fix it at the source. Symptom fixes create technical debt and often introduce new bugs.
Symptom fix: Add a null check to prevent crashes when currency is null. Root cause fix: Ensure the frontend always sends a valid currency value, validate at the API boundary, and reject requests with missing required fields.
The root cause fix prevents the entire class of problems (missing required fields) rather than just this specific symptom (missing currency).
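The boundary-validation idea can be sketched as follows. The request type and error messages are illustrative, not our actual API; note that collecting all errors at once gives callers a complete picture instead of one failure per round trip:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class BoundaryValidation {
    record PaymentRequest(BigDecimal amount, String currency) {}

    // Validate all required fields at the API boundary so invalid requests
    // never reach business logic. Returns the full list of problems at once.
    static List<String> validate(PaymentRequest request) {
        List<String> errors = new ArrayList<>();
        if (request.amount() == null) {
            errors.add("Amount is required");
        }
        if (request.currency() == null || request.currency().isBlank()) {
            errors.add("Currency is required");
        }
        return errors;
    }

    public static void main(String[] args) {
        PaymentRequest bad = new PaymentRequest(new BigDecimal("100.00"), null);
        System.out.println(validate(bad)); // [Currency is required]
    }
}
```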
Before implementing the fix:
- Understand why the bug wasn't caught earlier (missing test coverage, inadequate validation)
- Consider edge cases your fix might not handle
- Review if similar bugs might exist elsewhere (same pattern in other endpoints)
After implementing the fix:
- Verify the fix resolves the original reproduction case
- Test related functionality to ensure you didn't introduce regressions
- Add tests that would have caught this bug before it reached production
5. Prevent Recurrence
The best debugging session ends with measures to prevent the same bug from recurring:
Add Test Coverage: Write a test that fails with the bug and passes with the fix. This test becomes a permanent guard against regression.
@Test
void shouldRejectPaymentWithMissingCurrency() {
PaymentRequest request = PaymentRequest.builder()
.amount(new BigDecimal("100.00"))
.currency(null) // Missing currency
.build();
assertThatThrownBy(() -> paymentService.processPayment(request))
.isInstanceOf(ValidationException.class)
.hasMessageContaining("Currency is required");
}
Improve Logging: If you struggled to diagnose the bug due to missing context in logs, add logging for key decisions and state changes. See Logging for Debugging below.
Enhance Validation: If the bug resulted from invalid input reaching business logic, add validation at the entry point. See our input validation guidelines.
Document Findings: For complex bugs, document your investigation process, root cause, and fix. This helps teammates encountering similar issues and builds institutional knowledge. Consider creating a post-mortem for production incidents - see our incident post-mortem guide.
Debugging Tools
IDE Debuggers
Modern IDE debuggers (IntelliJ IDEA, VS Code, Chrome DevTools) provide powerful capabilities beyond simple breakpoints. Mastering these features dramatically speeds up debugging.
Breakpoints
Breakpoints pause execution at a specific line, allowing you to inspect variables, evaluate expressions, and step through code line-by-line.
Standard Breakpoints: Click the gutter next to a line number. Execution pauses when that line is about to execute.
Conditional Breakpoints: Only pause when a condition is true. Right-click the breakpoint → Add Condition.
// Conditional breakpoint: Only pause when amount exceeds 1000
public void processPayment(BigDecimal amount) {
// Breakpoint condition: amount.compareTo(new BigDecimal("1000")) > 0
paymentGateway.charge(amount);
}
This is invaluable when debugging issues that only occur with specific data values. Instead of pausing on every iteration of a loop, pause only when you find the problematic item.
Exception Breakpoints: Pause whenever a specific exception is thrown, even if it's caught. IntelliJ: Run → View Breakpoints → Add Java Exception Breakpoint. This helps identify where exceptions originate before they're wrapped or masked by exception handlers.
Logpoint Breakpoints (non-pausing): Output a message to the console without pausing execution. Useful for gathering information across many iterations without stopping. IntelliJ: Right-click breakpoint → Select "Evaluate and log" instead of "Suspend".
Watch Expressions
Watch expressions evaluate custom expressions each time execution pauses, showing their current values. Add watches for complex expressions that aren't simple variables:
// Watch expressions
payment.getAmount().multiply(exchangeRate)
payment.getStatus().equals(PaymentStatus.PENDING) && retryCount > 3
Duration.between(payment.getCreatedAt(), LocalDateTime.now()).toMinutes()
Watches update automatically as you step through code, revealing how calculated values change.
Call Stack Inspection
The call stack shows the sequence of method calls leading to the current point. Click any frame in the stack to view that method's variables and source code. This reveals how you reached the current location and what values were passed from calling methods.
Call stacks are critical for understanding unexpected execution paths: "Why is this validation method being called from the scheduler instead of from the API controller?"
Step Execution
- Step Over (F8): Execute the current line and pause at the next line in the same method
- Step Into (F7): Enter the method being called on the current line
- Step Out (Shift + F8): Complete execution of the current method and pause at the calling method
- Run to Cursor (Alt + F9): Resume execution until reaching the line with the cursor
Use Step Over for method calls you trust (standard library, well-tested utilities). Use Step Into for methods you suspect contain bugs. This selective stepping focuses investigation on relevant code.
Remote Debugging
Remote debugging connects your IDE to a Java application running elsewhere - Docker containers, Kubernetes pods, remote servers. The application must be started with debug flags:
# Enable remote debugging on port 5005
java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005 \
-jar payment-service.jar
Then configure your IDE to attach (see IDE Setup - Remote Debugging). Port forward if the application isn't directly accessible:
# Kubernetes
kubectl port-forward pod/payment-service-abc123 5005:5005
# Docker
docker run -p 8080:8080 -p 5005:5005 payment-service
# SSH tunnel
ssh -L 5005:localhost:5005 [email protected]
Remote debugging is essential for environment-specific bugs that don't reproduce locally. You can debug with real production data, configurations, and external service integrations while using your local IDE's full debugging capabilities.
Browser Developer Tools
Browser DevTools are indispensable for frontend debugging. All modern browsers (Chrome, Firefox, Safari, Edge) include similar tools accessible via F12.
Console
The JavaScript console shows errors, warnings, and console.log() output. More importantly, it's a REPL (Read-Eval-Print Loop) where you can execute arbitrary JavaScript in the context of the current page.
// Inspect current component state (React DevTools installed)
$r.state // Current React component state
$r.props // Current React component props
// Inspect selected DOM element
$0 // Currently selected element in Elements panel
$0.style.backgroundColor = 'yellow'; // Modify it
// Inspect network request
fetch('/api/payments').then(r => r.json()).then(copy) // Copy response to clipboard
Use console.table() to display arrays of objects in a readable table format:
console.table(payments); // Shows payments array as a table
Network Panel
The Network panel records all HTTP requests made by the page, showing request/response headers, payloads, timing, and status codes.
Common debugging tasks:
- Verify request payload: Click request → Payload tab → Confirm request body matches expectations
- Check response status: 200 OK vs 400 Bad Request vs 500 Server Error
- Inspect response body: Verify API returned expected data structure
- Analyze timing: Identify slow requests (Time column), check if requests are cached (Size shows "disk cache")
- Replay requests: Right-click → Copy → Copy as cURL, then modify and replay
Filter by type (Fetch/XHR for API calls, JS for script files, Img for images) to focus on relevant requests.
For WebSocket debugging, the Network panel shows WebSocket connections, frame contents, and connection lifecycle events.
Sources Panel (Debugger)
The Sources panel provides JavaScript debugging similar to IDE debuggers: breakpoints, step execution, call stacks, and variable inspection.
Set breakpoints by clicking line numbers in source files. Use conditional breakpoints for complex conditions: right-click line number → Add conditional breakpoint.
The XHR/fetch breakpoints feature pauses when any XHR/fetch request is made to a URL matching a pattern. This helps debug unexpected API calls or identify which code triggers specific requests.
Event Listener Breakpoints pause when specific events fire (click, scroll, timer, etc.), useful for tracking down which event handler causes unexpected behavior.
React/Angular DevTools
Framework-specific DevTools extensions add panels for inspecting component hierarchies, state, and props.
React DevTools: View the React component tree, inspect current props/state, modify state to test different scenarios, and profile component rendering performance.
Angular DevTools: Inspect component tree, view component properties and dependencies, visualize change detection, and profile performance.
These tools reveal framework internals invisible through regular DOM inspection. They show virtual DOM structures, component lifecycle states, and data flow between components.
Logging for Debugging
Logging serves dual purposes: operational monitoring (production health) and debugging (development investigation). Effective logging provides the information needed to diagnose issues without overwhelming you with noise.
Structured Logging
Structured logging outputs machine-parseable formats (JSON) rather than unstructured text. This enables powerful querying, filtering, and aggregation in log analysis tools.
// Unstructured logging (avoid)
log.info("Payment processed for user alice with amount 100.00 USD");
// Structured logging (preferred)
log.info("Payment processed",
kv("userId", "alice"),
kv("amount", 100.00),
kv("currency", "USD"),
kv("paymentId", paymentId),
kv("status", "COMPLETED"));
Structured logs are easily searchable: "Show all payments over $1000" or "Find all failed payments for user alice" become simple queries rather than complex regex patterns.
Popular structured logging libraries:
- Java: SLF4J with Logback, Log4j2 with JSON layout, net.logstash.logback.argument.StructuredArguments
- TypeScript/JavaScript: Winston, Pino, Bunyan
- Spring Boot: Logback with logstash-logback-encoder
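Under the hood, all of these emit one JSON object per event. A plain-JDK sketch of the idea (real projects should use one of the libraries above rather than hand-rolled JSON):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StructuredLog {
    // Render one log event as a single JSON object. String escaping is
    // simplified for illustration; real encoders handle the full JSON grammar.
    static String event(String message, Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{\"message\":\"").append(message).append('"');
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            sb.append(",\"").append(e.getKey()).append("\":");
            Object v = e.getValue();
            sb.append(v instanceof Number ? v.toString() : "\"" + v + "\"");
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("userId", "alice");
        fields.put("amount", 100.00);
        fields.put("currency", "USD");
        System.out.println(event("Payment processed", fields));
    }
}
```

Because every field is a typed key rather than part of a sentence, log tooling can filter and aggregate on it directly.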
Correlation IDs
Correlation IDs (also called trace IDs or request IDs) uniquely identify a single logical operation across multiple services, log statements, and components. They tie distributed operations together, making it possible to trace a request's full path through the system.
Generate a correlation ID at the system boundary (API gateway, web server) and propagate it to all downstream services, database queries, and external API calls. Include the correlation ID in every log statement.
// Generate correlation ID for incoming requests
@Component
public class CorrelationIdFilter extends OncePerRequestFilter {
@Override
protected void doFilterInternal(HttpServletRequest request,
HttpServletResponse response,
FilterChain chain) throws ServletException, IOException {
String correlationId = request.getHeader("X-Correlation-ID");
if (correlationId == null) {
correlationId = UUID.randomUUID().toString();
}
MDC.put("correlationId", correlationId);
response.setHeader("X-Correlation-ID", correlationId);
try {
chain.doFilter(request, response);
} finally {
MDC.remove("correlationId");
}
}
}
// Logs automatically include correlation ID via MDC
log.info("Processing payment", kv("paymentId", paymentId));
// Output: {"timestamp":"2024-01-15T10:30:00Z","correlationId":"abc-123",...}
When investigating an issue, search logs by correlation ID to see every operation related to that request, even across multiple microservices. This reveals the complete story: which services were called, what data they received, how long each step took, and where the failure occurred.
For microservices, propagate correlation IDs via HTTP headers. Spring Cloud Sleuth and OpenTelemetry automate this propagation.
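Outgoing calls must carry the ID onward. A minimal sketch with the JDK HttpClient types; a ThreadLocal stands in for SLF4J's MDC, the URL is made up, and the header name matches the filter above:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

public class CorrelationPropagation {
    // Stand-in for MDC: holds the current request's correlation ID per thread.
    static final ThreadLocal<String> CORRELATION_ID =
        ThreadLocal.withInitial(() -> UUID.randomUUID().toString());

    // Attach the current correlation ID to an outgoing request so the
    // downstream service logs under the same ID.
    static HttpRequest outgoing(String url) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("X-Correlation-ID", CORRELATION_ID.get())
            .GET()
            .build();
    }

    public static void main(String[] args) {
        CORRELATION_ID.set("abc-123");
        HttpRequest request = outgoing("https://payments.internal/api/rates");
        System.out.println(request.headers().firstValue("X-Correlation-ID").orElse("missing"));
    }
}
```

In practice an HttpClient interceptor or OpenTelemetry context propagation does this once, centrally, instead of at every call site.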
Log Levels
Use appropriate log levels to control verbosity:
- ERROR: Application failures requiring immediate attention (database connection lost, external service timeout, unhandled exceptions)
- WARN: Unexpected situations that don't prevent operation (deprecated API usage, fallback to default configuration, retry after transient failure)
- INFO: Important business events (payment processed, user logged in, scheduled job started)
- DEBUG: Detailed information useful during development (method entry/exit, intermediate calculations, decision points)
- TRACE: Very detailed information (query parameters, loop iterations, framework internals)
Production typically runs at INFO level. Enable DEBUG or TRACE temporarily when investigating specific issues, then reduce back to INFO to avoid performance impact and log volume.
// Appropriate log levels
log.error("Payment processing failed", kv("paymentId", paymentId), exception);
log.warn("Exchange rate service returned stale data, using cached rate", kv("age", cacheAge));
log.info("Payment completed", kv("paymentId", paymentId), kv("amount", amount));
log.debug("Validating payment request", kv("userId", userId), kv("amount", amount));
log.trace("Retrieved exchange rate from cache", kv("rate", rate), kv("cacheKey", key));
What to Log
Log key decision points, state changes, and external interactions:
- Business events: Payment initiated, payment completed, user registered, account verified
- Errors and exceptions: Always log exceptions with full stack traces and context
- External calls: API requests to third parties, database queries, message queue publications
- Security events: Authentication failures, authorization denials, suspicious activity
- Performance data: Operation duration for critical paths, cache hit/miss rates
@Service
public class PaymentService {
public PaymentResult processPayment(PaymentRequest request) {
long startTime = System.currentTimeMillis();
log.info("Payment processing started",
kv("userId", request.getUserId()),
kv("amount", request.getAmount()),
kv("currency", request.getCurrency()));
try {
// Log validation failures
ValidationResult validation = validator.validate(request);
if (!validation.isValid()) {
log.warn("Payment validation failed",
kv("errors", validation.getErrors()));
return PaymentResult.invalid(validation.getErrors());
}
// Log external service calls
log.debug("Calling fraud detection service");
FraudCheckResult fraudCheck = fraudService.checkPayment(request);
if (fraudCheck.isSuspicious()) {
log.warn("Payment flagged as suspicious",
kv("riskScore", fraudCheck.getRiskScore()),
kv("reasons", fraudCheck.getReasons()));
return PaymentResult.rejected("Fraud detected");
}
// Log state changes
Payment payment = paymentRepository.save(createPayment(request));
log.info("Payment created", kv("paymentId", payment.getId()));
// Log successful completion with timing
long duration = System.currentTimeMillis() - startTime;
log.info("Payment processing completed",
kv("paymentId", payment.getId()),
kv("duration", duration));
return PaymentResult.success(payment);
} catch (Exception e) {
log.error("Payment processing failed unexpectedly",
kv("userId", request.getUserId()), e);
throw e;
}
}
}
Avoid logging sensitive data: passwords, credit card numbers, PINs, authentication tokens. Mask or omit sensitive fields in structured logs. See our security guidelines for data protection requirements.
Log Aggregation
In distributed systems, logs from multiple services must be aggregated into a centralized system for querying. Popular tools include:
- ELK Stack (Elasticsearch, Logstash, Kibana): Index logs in Elasticsearch, query and visualize in Kibana
- Grafana Loki: Lightweight alternative to ELK, integrates with Grafana
- Splunk: Commercial log management and analysis
- Datadog, New Relic: SaaS observability platforms with log aggregation
Centralized logging enables powerful queries: "Show all ERROR logs from payment-service in the last hour where userId=alice" or "Count WARNING logs grouped by service and error type."
Production Debugging
Production debugging is constrained debugging. You can't restart the application freely, attach a debugger, or modify code on a whim. You must work within operational boundaries while investigating live issues.
Feature Flags for Verbose Logging
Feature flags (also called feature toggles) enable/disable functionality without code changes. Use feature flags to control log verbosity in production.
@Service
public class PaymentService {
private final FeatureFlagService featureFlags;
public PaymentResult processPayment(PaymentRequest request) {
boolean verboseLogging = featureFlags.isEnabled("verbose-payment-logging");
if (verboseLogging) {
log.debug("Payment processing details",
kv("request", request),
kv("user", userContext.getCurrentUser()),
kv("headers", httpHeaders));
}
// Process payment
}
}
When investigating a production issue, enable the feature flag for increased logging, gather diagnostic information, then disable the flag to reduce log volume. This avoids deploying code changes just to add temporary debugging logs.
Advanced implementations support user-specific feature flags: enable verbose logging only for specific users experiencing issues, leaving other users unaffected.
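A user-scoped flag can be as simple as an allow-list consulted at the call site. Sketch only: the flag store is a plain in-memory set here, where a real system would fetch it from the flag service:

```java
import java.util.Set;

public class UserScopedFlag {
    private final Set<String> verboseLoggingUsers;

    UserScopedFlag(Set<String> verboseLoggingUsers) {
        this.verboseLoggingUsers = verboseLoggingUsers;
    }

    // Verbose logging only for users under active investigation,
    // leaving log volume for everyone else unchanged.
    boolean verboseLoggingEnabled(String userId) {
        return verboseLoggingUsers.contains(userId);
    }

    public static void main(String[] args) {
        UserScopedFlag flag = new UserScopedFlag(Set.of("alice"));
        System.out.println(flag.verboseLoggingEnabled("alice")); // true
        System.out.println(flag.verboseLoggingEnabled("bob"));   // false
    }
}
```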
Popular feature flag libraries:
- LaunchDarkly: Commercial SaaS feature flag service
- Unleash: Open-source feature flag platform
- Togglz: Java feature flag library
- Spring Cloud Config: Configuration management with dynamic refresh
See our feature flags guide for implementation patterns.
Canary Analysis
Canary deployments route a small percentage of traffic to a new version while the majority uses the stable version. If the canary shows elevated errors or degraded performance, route all traffic back to the stable version before widespread impact.
This provides a controlled environment for production debugging. Deploy a version with enhanced logging or experimental fixes to the canary, observe behavior with real production traffic and data, then decide whether to promote or rollback.
Canary analysis compares metrics (error rates, latency, throughput) between canary and stable versions. Significant differences indicate the new version introduced a regression.
Implement canaries via:
- Kubernetes: Multiple deployments with weighted load balancing
- Service mesh (Istio, Linkerd): Traffic splitting rules
- API Gateway: Route percentages to different backend versions
- Feature flags: Randomly enable new code path for X% of requests
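The feature-flag variant is usually implemented with deterministic hashing rather than per-request randomness, so a given user or session consistently lands on the same side of the split. A sketch, with a made-up rollout percentage:

```java
public class CanarySplit {
    // Deterministically route a stable key (user ID, session ID) to the canary
    // for roughly the given percentage of the population. Hashing the key,
    // instead of sampling randomly per request, keeps each user on one code path.
    static boolean routeToCanary(String stableKey, int canaryPercent) {
        int bucket = Math.floorMod(stableKey.hashCode(), 100);
        return bucket < canaryPercent;
    }

    public static void main(String[] args) {
        int canary = 0;
        for (int i = 0; i < 10_000; i++) {
            if (routeToCanary("user-" + i, 5)) canary++;
        }
        // Share lands near the target percentage across many keys
        System.out.println("canary share ~ " + (canary / 100.0) + "%");
    }
}
```

Sticky routing also makes canary metrics cleaner: a user's errors attribute entirely to one version instead of being smeared across both.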
Safe Rollback
Production debugging sometimes reveals that a deployment introduced a critical bug. Safe rollback procedures restore the previous working version quickly.
Deployment strategies that enable fast rollback:
- Blue-Green Deployment: Maintain two environments (blue: current, green: new). Deploy to green, test, then switch traffic. If issues arise, switch back to blue instantly.
- Immutable Infrastructure: Deploy new versions as separate containers/instances rather than updating existing ones. Keep previous version running until confident in the new version.
- Database Migrations: Use backward-compatible migrations that work with both old and new code. Roll out code changes first, migrate data after confirming code stability.
Rollback checklist:
- Identify the last known good version
- Verify rollback won't cause data inconsistencies
- Execute rollback via deployment tool
- Monitor metrics to confirm stability restored
- Investigate root cause of the issue
- Document incident in post-mortem
See our CI/CD pipeline documentation for rollback procedures.
Performance Debugging
Performance issues manifest as slow response times, high CPU usage, excessive memory consumption, or low throughput. Performance debugging identifies bottlenecks and inefficiencies.
Profilers
Profilers sample application execution to identify which methods consume the most CPU time or allocate the most memory. They reveal "hot spots" where optimization efforts should focus.
Java Profilers:
- IntelliJ Profiler: Built-in profiler (Ultimate edition only), easy to use, visualizes flame graphs
- VisualVM: Free standalone profiler, CPU/memory profiling, heap dump analysis
- JProfiler: Commercial profiler with advanced features
- Async-profiler: Low-overhead production profiler using JVM TI
JavaScript Profilers:
- Chrome DevTools Profiler: Record performance, identify slow JavaScript functions
- React Profiler: Identify components causing expensive re-renders
- Lighthouse: Audit web page performance, accessibility, SEO
# Profile Java application with async-profiler
./profiler.sh -d 60 -f /tmp/flamegraph.html <pid>
Flame graphs visualize profiler output, showing the call stack with width representing time spent. The widest bars are the biggest opportunities for optimization.
Heap Dumps
Heap dumps capture the complete state of JVM memory at a point in time, showing all objects, their sizes, and references. Analyze heap dumps to find memory leaks (objects that should be garbage collected but aren't) and excessive memory usage.
# Generate heap dump manually
jmap -dump:format=b,file=/tmp/heap.hprof <pid>
# Or configure automatic heap dumps on OutOfMemoryError
java -XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/tmp/heapdump.hprof \
-jar application.jar
Analyze heap dumps with:
- Eclipse Memory Analyzer (MAT): Free tool, identifies memory leak suspects automatically
- VisualVM: View object histograms, inspect instance details
- IntelliJ: Built-in heap dump viewer (Ultimate edition)
Look for:
- Objects with unexpectedly high retained size (memory they prevent from being collected)
- Collections growing indefinitely (caches without eviction, event listeners never removed)
- Duplicate strings, objects that could be deduplicated or interned
Thread Dumps
Thread dumps show all thread states and stack traces at a moment in time. They reveal deadlocks, threads blocked waiting for resources, and CPU-intensive operations.
# Generate thread dump
jstack <pid> > threaddump.txt
# Or trigger from within application
kill -3 <pid> # Sends SIGQUIT, prints thread dump to stdout
Thread states:
- RUNNABLE: Thread is executing or ready to execute
- BLOCKED: Waiting to acquire a monitor lock (synchronized block)
- WAITING: Waiting indefinitely for another thread (Object.wait(), LockSupport.park())
- TIMED_WAITING: Waiting for specified time (Thread.sleep(), wait(timeout))
Deadlocks appear in thread dumps with explanatory messages: "Found one Java-level deadlock". The dump shows which threads are waiting for which locks, revealing the circular dependency.
For threads consuming high CPU, take multiple thread dumps 5-10 seconds apart. Threads in RUNNABLE state in all dumps are likely CPU-bound operations. The stack traces show exactly which code is running.
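Thread dumps can also be captured from inside the JVM, for example behind an admin-only diagnostics endpoint, using the standard ThreadMXBean:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

public class InProcessThreadDump {
    // Capture a thread dump programmatically: equivalent information to jstack,
    // including thread states, stack traces, and held locks.
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append(info); // name, state, stack trace, locks
        }
        long[] deadlocked = threads.findDeadlockedThreads();
        if (deadlocked != null) {
            out.append("Deadlocked thread IDs: ").append(Arrays.toString(deadlocked));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The dumping thread itself is RUNNABLE, so the dump is never empty
        System.out.println(dump().contains("RUNNABLE") ? "dump captured" : "unexpected");
    }
}
```

findDeadlockedThreads() returns the IDs of threads in a deadlock cycle (or null if none), which makes automated deadlock alerting possible without shelling out to jstack.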
Flame Graphs
Flame graphs visualize profiler data as hierarchical stack traces. The x-axis shows the sample population (not time - order is alphabetical). The y-axis shows stack depth. Width represents how frequently a method appears in samples (wider = more time spent).
Flame graphs quickly identify hot paths: look for wide bars at the top of the graph. These represent methods consuming significant CPU time.
Generate flame graphs from profiler output:
# Linux perf
perf record -F 99 -p <pid> -g -- sleep 60
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > flamegraph.svg
# Async-profiler (Java)
./profiler.sh -d 60 -f flamegraph.html <pid>
Interactive flame graphs allow clicking to zoom into specific subtrees, filtering to specific methods, and searching for patterns.
Network Debugging
Network issues manifest as failed API calls, timeouts, slow responses, or incorrect data. Network debugging tools intercept, inspect, and manipulate HTTP traffic.
Proxy Tools
HTTP proxy tools sit between client and server, intercepting all traffic for inspection and modification.
Charles Proxy: Commercial proxy with GUI, SSL certificate generation for HTTPS interception, request/response modification, throttling to simulate slow networks.
mitmproxy: Open-source proxy, CLI and web interface, Python scripting for custom request/response manipulation.
Fiddler: Free proxy for Windows, similar features to Charles.
Setup:
- Configure proxy to listen on a port (e.g., 8888)
- Configure application/browser to use proxy (HTTP proxy: localhost:8888)
- For HTTPS, install proxy's SSL certificate as trusted
Use cases:
- Inspect API request payloads to verify frontend sends correct data
- Examine API responses to confirm backend returns expected structure
- Modify requests to test error handling (change valid payment ID to invalid)
- Modify responses to test frontend behavior with different data
- Throttle bandwidth to test slow network conditions
- Replay captured requests for testing
Browser DevTools Network Panel
See Browser Developer Tools - Network Panel above. Specifically useful for:
- Waterfall view: Visualize request timing, identify blocking requests, optimize load order
- Request blocking: Block specific requests to test offline behavior or missing dependencies
- Throttling: Simulate slow 3G, offline, or custom network conditions
- Cache simulation: Disable cache to test fresh loads, or enable to verify caching behavior
Command-Line Tools
cURL: Make HTTP requests from the command line, useful for testing APIs without a GUI.
# GET request
curl https://api.example.com/payments
# POST request with JSON body
curl -X POST https://api.example.com/payments \
-H "Content-Type: application/json" \
-d '{"amount": 100.00, "currency": "USD"}'
# Include headers in output
curl -i https://api.example.com/payments
# Follow redirects
curl -L https://api.example.com/payments
# Save response to file
curl -o response.json https://api.example.com/payments
HTTPie: User-friendly alternative to cURL with syntax highlighting and better defaults.
# GET request (httpie auto-formats JSON)
http GET https://api.example.com/payments
# POST with JSON (:= sends a JSON number, = sends a string)
http POST https://api.example.com/payments amount:=100.00 currency=USD
# Custom headers
http GET https://api.example.com/payments Authorization:"Bearer token123"
Postman: GUI application for API testing, collections for organizing requests, environment variables for different configurations, automated testing scripts.
Further Reading
Internal Documentation
- Logging - Structured logging, log levels, best practices
- Metrics - Application metrics for monitoring
- Tracing - Distributed tracing with OpenTelemetry
- IDE Setup - Debugger configuration, remote debugging
- Performance Testing - Load testing, benchmarking
- Feature Flags - Feature flag implementation patterns
External Resources
- Java Debugging with IntelliJ IDEA
- Chrome DevTools Documentation
- The Art of Debugging
- Brendan Gregg's Performance Analysis
- VisualVM Documentation
Summary
Key Takeaways
- Systematic Process: Reproduce → Isolate → Hypothesize → Fix → Prevent
- IDE Debuggers: Master breakpoints, watches, stepping, remote debugging
- Browser DevTools: Console, network panel, sources debugger, framework tools
- Structured Logging: Use correlation IDs, appropriate log levels, structured formats
- Production Constraints: Feature flags for verbose logging, canary analysis, safe rollback
- Performance Tools: Profilers, heap dumps, thread dumps, flame graphs
- Network Debugging: Proxy tools, browser network panel, cURL/HTTPie
- Root Causes: Fix underlying problems, not symptoms
- Prevention: Add tests, improve logging, document learnings
- Distributed Systems: Correlation IDs tie requests across services
Next Steps: Review Observability for production monitoring and Performance Optimization for proactive performance improvements.