Debugging Strategies
Overview
Debugging is the systematic process of identifying, isolating, and fixing defects in software. While writing code is creative, debugging is analytical - requiring methodical investigation, hypothesis testing, and logical reasoning. Effective debugging separates experienced engineers from novices: experts debug faster not because they write fewer bugs, but because they've developed systematic approaches to finding and fixing them.
Debugging occurs across different contexts. During development, you debug with full control: restart the application, modify code, add logging freely. In production, constraints multiply: limited access, no restarts without approval, incomplete information. The core techniques remain similar, but production debugging demands greater care and different tools.
The debugging process is iterative. You form a hypothesis about the bug's cause, test it (add logging, set breakpoints, inspect state), gather evidence, refine your hypothesis, and repeat until the root cause is identified. Jumping to solutions without understanding the problem often makes things worse - the "fix" addresses symptoms rather than causes.
Understanding the full execution path - from user action through frontend, API layers, business logic, database queries, and external service calls - is critical for complex issues. Modern distributed systems complicate debugging because a single user action might traverse multiple services, each with independent logs, state, and failure modes. Correlation IDs tie these distributed operations together, allowing you to trace a request's full journey.
Core Principles
- Reproduce Reliably: Without reliable reproduction, you're guessing
- Isolate the Problem: Narrow the scope to specific components or code paths
- Form Hypotheses: Develop testable theories about the cause
- Test Systematically: Validate hypotheses with evidence, not assumptions
- Fix the Root Cause: Address underlying problems, not just symptoms
- Prevent Recurrence: Add tests, improve logging, document findings
Systematic Debugging Process
A systematic approach prevents wasted effort chasing red herrings and ensures you actually solve the problem rather than applying band-aid fixes.
1. Reproduce the Bug
Reliable reproduction is the foundation of debugging. If you can't reproduce the bug consistently, you can't verify whether your fix works. Reproduction demonstrates you understand the conditions that trigger the bug.
Gather detailed information about the failure:
- What happened: Exact error message, unexpected behavior, user observation
- Expected behavior: What should have happened instead
- Steps to reproduce: Precise sequence of actions leading to the failure
- Environment: Browser/OS version, data state, user permissions, time of day
- Frequency: Always, intermittent (50% of attempts), rare (once per day)
For intermittent bugs, identify patterns. Does it only fail for certain users? After specific actions? Under load? Patterns reveal the underlying conditions.
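One way to turn an intermittent failure into data is to drive the suspect operation in a loop and record which inputs fail, then look for a pattern in the failing set. A minimal sketch, with the flaky operation simulated rather than a real call:

```java
import java.util.ArrayList;
import java.util.List;

public class FlakyHarness {
    // Simulated flaky operation: fails for every 7th input, standing in for
    // a real call whose failure pattern you are trying to characterize.
    static void operation(int i) {
        if (i % 7 == 0) throw new IllegalStateException("failed for input " + i);
    }

    // Run many iterations and collect the failing inputs.
    static List<Integer> findFailures(int iterations) {
        List<Integer> failures = new ArrayList<>();
        for (int i = 1; i <= iterations; i++) {
            try {
                operation(i);
            } catch (RuntimeException e) {
                failures.add(i);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Integer> failures = findFailures(100);
        // Failures at 7, 14, 21, ... - the pattern points at the condition
        System.out.println(failures.size() + " failures, first: " + failures.get(0));
    }
}
```

Once the failing inputs are in hand, correlating them (same user? same data shape? same time window?) usually converts "intermittent" into "deterministic under condition X".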
Create a minimal reproduction - the simplest scenario that demonstrates the bug. If the bug occurs after a complex 10-step workflow, can you reproduce it in 3 steps? Minimal reproductions eliminate irrelevant variables and focus investigation on the essential failure conditions.
// Document reproduction steps
/*
Reproduction Steps:
1. Log in as user with email "[email protected]"
2. Navigate to /payments
3. Click "Create Payment" button
4. Enter amount: 100.00
5. Select currency: EUR
6. Click "Submit"
Expected: Payment created successfully, redirected to payment details
Actual: "Network Error" displayed, no payment created
Frequency: 100% reproduction
Environment: Chrome 119, macOS, staging environment
User account: Standard user (not admin)
*/
2. Isolate the Problem
Once reproduced, narrow down where the bug occurs. Modern applications have many layers: UI components, state management, API clients, backend controllers, service layers, data access, databases, external services. Bugs can hide in any layer or in the interactions between layers.
Binary search is effective for isolation. Test the midpoint of the execution path. Does the bug occur before or after this point? If after, test the midpoint of that half. This logarithmic approach quickly narrows the search space.
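When a pipeline's stages can be executed up to a chosen point, this bisection can be automated. A hedged sketch (the stage list and invariant are invented for illustration, and it assumes that once the invariant breaks it stays broken for later stages):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class BisectPipeline {
    // Run only the first n stages of a processing pipeline on the input.
    static <T> T runPrefix(T input, List<UnaryOperator<T>> stages, int n) {
        T value = input;
        for (int i = 0; i < n; i++) {
            value = stages.get(i).apply(value);
        }
        return value;
    }

    // Binary-search for the first stage whose output violates the invariant.
    // Assumes the full pipeline's output fails the check, and that a broken
    // invariant stays broken through subsequent stages.
    static <T> int firstBadStage(T input, List<UnaryOperator<T>> stages, Predicate<T> ok) {
        int good = 0, bad = stages.size(); // output after 'good' stages passes; after 'bad' it fails
        while (good + 1 < bad) {
            int mid = (good + bad) / 2;
            if (ok.test(runPrefix(input, stages, mid))) good = mid; else bad = mid;
        }
        return bad; // 1-based index of the first failing stage
    }

    public static void main(String[] args) {
        // Invariant: the amount stays positive. Stage 3 introduces the bug.
        List<UnaryOperator<Integer>> stages = List.of(
            x -> x + 10,   // stage 1
            x -> x * 2,    // stage 2
            x -> x - 1000, // stage 3: buggy conversion
            x -> x + 5     // stage 4
        );
        System.out.println("First bad stage: " + firstBadStage(100, stages, x -> x > 0));
    }
}
```

The same logarithmic idea applies manually: a breakpoint at the midpoint of the execution path answers "is the state still correct here?" and halves the search space per check.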
Use debugger breakpoints to inspect state at key points:
- Before the operation
- After the operation
- At decision points (if statements, switch cases)
- At integration boundaries (API calls, database queries)
Check intermediate values. Often bugs arise from incorrect intermediate calculations or state transformations. The final symptom (wrong total amount) might result from an earlier problem (incorrect currency conversion rate).
// Isolation example: Payment calculation bug
public PaymentResult processPayment(PaymentRequest request) {
// Breakpoint 1: Inspect incoming request
log.debug("Processing payment request: {}", request);
// Breakpoint 2: Check validation result
ValidationResult validation = validatePayment(request);
if (!validation.isValid()) {
return PaymentResult.failed(validation.getErrors());
}
// Breakpoint 3: Verify amount after conversion
BigDecimal convertedAmount = convertCurrency(
request.getAmount(),
request.getSourceCurrency(),
request.getTargetCurrency()
);
log.debug("Converted amount: {} {} -> {} {}",
request.getAmount(), request.getSourceCurrency(),
convertedAmount, request.getTargetCurrency());
// Breakpoint 4: Check fee calculation
BigDecimal fee = calculateFee(convertedAmount);
BigDecimal totalAmount = convertedAmount.add(fee);
// Breakpoint 5: Verify final state before persistence
Payment payment = Payment.builder()
.amount(totalAmount)
.currency(request.getTargetCurrency())
.build();
Payment saved = paymentRepository.save(payment);
return PaymentResult.success(saved);
}
For frontend bugs, use browser DevTools to isolate:
- Elements panel: Inspect DOM, verify CSS applies correctly
- Console: Check for JavaScript errors, warnings
- Network panel: Verify API requests/responses, check status codes and payloads
- React DevTools or Angular DevTools: Inspect component state and props
3. Form and Test Hypotheses
Based on the isolated problem area, form hypotheses about the cause. Good hypotheses are specific and testable.
Weak hypothesis: "Something is wrong with the payment calculation." Strong hypothesis: "The currency conversion rate is using yesterday's rate instead of today's rate, causing incorrect amounts."
Each hypothesis should be testable through observation or experimentation:
- Add logging to show which rate is being used
- Set a breakpoint and inspect the exchangeRate variable
- Query the database to verify the rate being retrieved
- Check timestamp on the rate data
Test one hypothesis at a time. Changing multiple things simultaneously makes it impossible to know which change revealed the cause.
Common hypothesis patterns:
- Incorrect input: "The frontend is sending null for the currency field."
- Logic error: "The validation check uses > instead of >=, rejecting valid amounts."
- State problem: "The component is reading stale state instead of updated values."
- Race condition: "Two concurrent requests modify the same record, causing data corruption."
- Environmental difference: "Production uses a different database schema version than staging."
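A concrete instance of the logic-error pattern: a boundary check that uses > where >= was intended. The validator below is invented purely to illustrate how a precise hypothesis ("exactly the minimum amount is rejected") becomes trivially testable:

```java
import java.math.BigDecimal;

public class BoundaryBug {
    static final BigDecimal MIN_AMOUNT = new BigDecimal("0.01");

    // Buggy check: rejects a payment of exactly 0.01
    static boolean isValidBuggy(BigDecimal amount) {
        return amount.compareTo(MIN_AMOUNT) > 0;
    }

    // Fixed check: the boundary value is accepted
    static boolean isValidFixed(BigDecimal amount) {
        return amount.compareTo(MIN_AMOUNT) >= 0;
    }

    public static void main(String[] args) {
        BigDecimal boundary = new BigDecimal("0.01");
        System.out.println("buggy accepts boundary: " + isValidBuggy(boundary));  // false
        System.out.println("fixed accepts boundary: " + isValidFixed(boundary));  // true
    }
}
```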
4. Fix the Root Cause
Once you've identified the root cause, fix it at the source. Symptom fixes create technical debt and often introduce new bugs.
Symptom fix: Add a null check to prevent crashes when currency is null. Root cause fix: Ensure the frontend always sends a valid currency value, validate at the API boundary, and reject requests with missing required fields.
The root cause fix prevents the entire class of problems (missing required fields) rather than just this specific symptom (missing currency).
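The boundary-validation idea can be sketched as follows. The request type and error messages are illustrative, not our actual API; note that collecting all errors at once gives callers a complete picture instead of one failure per round trip:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class BoundaryValidation {
    record PaymentRequest(BigDecimal amount, String currency) {}

    // Validate all required fields at the API boundary so invalid requests
    // never reach business logic. Returns the full list of problems at once.
    static List<String> validate(PaymentRequest request) {
        List<String> errors = new ArrayList<>();
        if (request.amount() == null) {
            errors.add("Amount is required");
        }
        if (request.currency() == null || request.currency().isBlank()) {
            errors.add("Currency is required");
        }
        return errors;
    }

    public static void main(String[] args) {
        PaymentRequest bad = new PaymentRequest(new BigDecimal("100.00"), null);
        System.out.println(validate(bad)); // [Currency is required]
    }
}
```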
Before implementing the fix:
- Understand why the bug wasn't caught earlier (missing test coverage, inadequate validation)
- Consider edge cases your fix might not handle
- Review if similar bugs might exist elsewhere (same pattern in other endpoints)
After implementing the fix:
- Verify the fix resolves the original reproduction case
- Test related functionality to ensure you didn't introduce regressions
- Add tests that would have caught this bug before it reached production
5. Prevent Recurrence
The best debugging session ends with measures to prevent the same bug from recurring:
Add Test Coverage: Write a test that fails with the bug and passes with the fix. This test becomes a permanent guard against regression.
@Test
void shouldRejectPaymentWithMissingCurrency() {
PaymentRequest request = PaymentRequest.builder()
.amount(new BigDecimal("100.00"))
.currency(null) // Missing currency
.build();
assertThatThrownBy(() -> paymentService.processPayment(request))
.isInstanceOf(ValidationException.class)
.hasMessageContaining("Currency is required");
}
Improve Logging: If you struggled to diagnose the bug due to missing context in logs, add logging for key decisions and state changes. See Logging for Debugging below.
Enhance Validation: If the bug resulted from invalid input reaching business logic, add validation at the entry point. See our input validation guidelines.
Document Findings: For complex bugs, document your investigation process, root cause, and fix. This helps teammates encountering similar issues and builds institutional knowledge. Consider creating a post-mortem for production incidents - see our incident post-mortem guide.
Debugging Tools
IDE Debuggers
Modern IDE debuggers (IntelliJ IDEA, VS Code, Chrome DevTools) provide powerful capabilities beyond simple breakpoints. Mastering these features dramatically speeds up debugging.
Breakpoints
Breakpoints pause execution at a specific line, allowing you to inspect variables, evaluate expressions, and step through code line-by-line.
Standard Breakpoints: Click the gutter next to a line number. Execution pauses when that line is about to execute.
Conditional Breakpoints: Only pause when a condition is true. Right-click the breakpoint → Add Condition.
// Conditional breakpoint: Only pause when amount exceeds 1000
public void processPayment(BigDecimal amount) {
// Breakpoint condition: amount.compareTo(new BigDecimal("1000")) > 0
paymentGateway.charge(amount);
}
This is invaluable when debugging issues that only occur with specific data values. Instead of pausing on every iteration of a loop, pause only when you find the problematic item.
Exception Breakpoints: Pause whenever a specific exception is thrown, even if it's caught. IntelliJ: Run → View Breakpoints → Add Java Exception Breakpoint. This helps identify where exceptions originate before they're wrapped or masked by exception handlers.
Logpoint Breakpoints (non-pausing): Output a message to the console without pausing execution. Useful for gathering information across many iterations without stopping. IntelliJ: Right-click breakpoint → Select "Evaluate and log" instead of "Suspend".
Watch Expressions
Watch expressions evaluate custom expressions each time execution pauses, showing their current values. Add watches for complex expressions that aren't simple variables:
// Watch expressions
payment.getAmount().multiply(exchangeRate)
payment.getStatus().equals(PaymentStatus.PENDING) && retryCount > 3
Duration.between(payment.getCreatedAt(), LocalDateTime.now()).toMinutes()
Watches update automatically as you step through code, revealing how calculated values change.
Call Stack Inspection
The call stack shows the sequence of method calls leading to the current point. Click any frame in the stack to view that method's variables and source code. This reveals how you reached the current location and what values were passed from calling methods.
Call stacks are critical for understanding unexpected execution paths: "Why is this validation method being called from the scheduler instead of from the API controller?"
Step Execution
- Step Over (F8): Execute the current line and pause at the next line in the same method
- Step Into (F7): Enter the method being called on the current line
- Step Out (Shift + F8): Complete execution of the current method and pause at the calling method
- Run to Cursor (Alt + F9): Resume execution until reaching the line with the cursor
Use Step Over for method calls you trust (standard library, well-tested utilities). Use Step Into for methods you suspect contain bugs. This selective stepping focuses investigation on relevant code.
Remote Debugging
Remote debugging connects your IDE to a Java application running elsewhere - Docker containers, Kubernetes pods, remote servers. The application must be started with debug flags:
# Enable remote debugging on port 5005
java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005 \
-jar payment-service.jar
Then configure your IDE to attach (see IDE Setup - Remote Debugging). Port forward if the application isn't directly accessible:
# Kubernetes
kubectl port-forward pod/payment-service-abc123 5005:5005
# Docker
docker run -p 8080:8080 -p 5005:5005 payment-service
# SSH tunnel
ssh -L 5005:localhost:5005 [email protected]
Remote debugging is essential for environment-specific bugs that don't reproduce locally. You can debug with real production data, configurations, and external service integrations while using your local IDE's full debugging capabilities.
Browser Developer Tools
Browser DevTools are indispensable for frontend debugging. All modern browsers (Chrome, Firefox, Safari, Edge) include similar tools accessible via F12.
Console
The JavaScript console shows errors, warnings, and console.log() output. More importantly, it's a REPL (Read-Eval-Print Loop) where you can execute arbitrary JavaScript in the context of the current page.
// Inspect current component state (React DevTools installed)
$r.state // Current React component state
$r.props // Current React component props
// Inspect selected DOM element
$0 // Currently selected element in Elements panel
$0.style.backgroundColor = 'yellow'; // Modify it
// Inspect network request
fetch('/api/payments').then(r => r.json()).then(copy) // Copy response to clipboard
Use console.table() to display arrays of objects in a readable table format:
console.table(payments); // Shows payments array as a table
Network Panel
The Network panel records all HTTP requests made by the page, showing request/response headers, payloads, timing, and status codes.
Common debugging tasks:
- Verify request payload: Click request → Payload tab → Confirm request body matches expectations
- Check response status: 200 OK vs 400 Bad Request vs 500 Server Error
- Inspect response body: Verify API returned expected data structure
- Analyze timing: Identify slow requests (Time column), check if requests are cached (Size shows "disk cache")
- Replay requests: Right-click → Copy → Copy as cURL, then modify and replay
Filter by type (Fetch/XHR for API calls, JS for script files, Img for images) to focus on relevant requests.
For WebSocket debugging, the Network panel shows WebSocket connections, frame contents, and connection lifecycle events.
Sources Panel (Debugger)
The Sources panel provides JavaScript debugging similar to IDE debuggers: breakpoints, step execution, call stacks, and variable inspection.
Set breakpoints by clicking line numbers in source files. Use conditional breakpoints for complex conditions: right-click line number → Add conditional breakpoint.
The XHR/fetch breakpoints feature pauses when any XHR/fetch request is made to a URL matching a pattern. This helps debug unexpected API calls or identify which code triggers specific requests.
Event Listener Breakpoints pause when specific events fire (click, scroll, timer, etc.), useful for tracking down which event handler causes unexpected behavior.
React/Angular DevTools
Framework-specific DevTools extensions add panels for inspecting component hierarchies, state, and props.
React DevTools: View the React component tree, inspect current props/state, modify state to test different scenarios, and profile component rendering performance.
Angular DevTools: Inspect component tree, view component properties and dependencies, visualize change detection, and profile performance.
These tools reveal framework internals invisible through regular DOM inspection. They show virtual DOM structures, component lifecycle states, and data flow between components.
Logging for Debugging
Logging serves dual purposes: operational monitoring (production health) and debugging (development investigation). Effective logging provides the information needed to diagnose issues without overwhelming you with noise.
Structured Logging
Structured logging outputs machine-parseable formats (JSON) rather than unstructured text. This enables powerful querying, filtering, and aggregation in log analysis tools.
// Unstructured logging (avoid)
log.info("Payment processed for user alice with amount 100.00 USD");
// Structured logging (preferred)
log.info("Payment processed",
kv("userId", "alice"),
kv("amount", 100.00),
kv("currency", "USD"),
kv("paymentId", paymentId),
kv("status", "COMPLETED"));
Structured logs are easily searchable: "Show all payments over $1000" or "Find all failed payments for user alice" become simple queries rather than complex regex patterns.
Popular structured logging libraries:
- Java: SLF4J with Logback, Log4j2 with JSON layout, net.logstash.logback.argument.StructuredArguments
- TypeScript/JavaScript: Winston, Pino, Bunyan
- Spring Boot: Logback with logstash-logback-encoder
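Under the hood, all of these emit one JSON object per event. A plain-JDK sketch of the idea (real projects should use one of the libraries above rather than hand-rolled JSON):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StructuredLog {
    // Render one log event as a single JSON object. String escaping is
    // simplified for illustration; real encoders handle the full JSON grammar.
    static String event(String message, Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{\"message\":\"").append(message).append('"');
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            sb.append(",\"").append(e.getKey()).append("\":");
            Object v = e.getValue();
            sb.append(v instanceof Number ? v.toString() : "\"" + v + "\"");
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("userId", "alice");
        fields.put("amount", 100.00);
        fields.put("currency", "USD");
        System.out.println(event("Payment processed", fields));
    }
}
```

Because every field is a typed key rather than part of a sentence, log tooling can filter and aggregate on it directly.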
Correlation IDs
Correlation IDs (also called trace IDs or request IDs) uniquely identify a single logical operation across multiple services, log statements, and components. They tie distributed operations together, making it possible to trace a request's full path through the system.
Generate a correlation ID at the system boundary (API gateway, web server) and propagate it to all downstream services, database queries, and external API calls. Include the correlation ID in every log statement.
// Generate correlation ID for incoming requests
@Component
public class CorrelationIdFilter extends OncePerRequestFilter {
@Override
protected void doFilterInternal(HttpServletRequest request,
HttpServletResponse response,
FilterChain chain) throws ServletException, IOException {
String correlationId = request.getHeader("X-Correlation-ID");
if (correlationId == null) {
correlationId = UUID.randomUUID().toString();
}
MDC.put("correlationId", correlationId);
response.setHeader("X-Correlation-ID", correlationId);
try {
chain.doFilter(request, response);
} finally {
MDC.remove("correlationId");
}
}
}
// Logs automatically include correlation ID via MDC
log.info("Processing payment", kv("paymentId", paymentId));
// Output: {"timestamp":"2024-01-15T10:30:00Z","correlationId":"abc-123",...}
When investigating an issue, search logs by correlation ID to see every operation related to that request, even across multiple microservices. This reveals the complete story: which services were called, what data they received, how long each step took, and where the failure occurred.
For microservices, propagate correlation IDs via HTTP headers. Spring Cloud Sleuth and OpenTelemetry automate this propagation.
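Outgoing calls must carry the ID onward. A minimal sketch with the JDK HttpClient types; a ThreadLocal stands in for SLF4J's MDC, the URL is made up, and the header name matches the filter above:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

public class CorrelationPropagation {
    // Stand-in for MDC: holds the current request's correlation ID per thread.
    static final ThreadLocal<String> CORRELATION_ID =
        ThreadLocal.withInitial(() -> UUID.randomUUID().toString());

    // Attach the current correlation ID to an outgoing request so the
    // downstream service logs under the same ID.
    static HttpRequest outgoing(String url) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("X-Correlation-ID", CORRELATION_ID.get())
            .GET()
            .build();
    }

    public static void main(String[] args) {
        CORRELATION_ID.set("abc-123");
        HttpRequest request = outgoing("https://payments.internal/api/rates");
        System.out.println(request.headers().firstValue("X-Correlation-ID").orElse("missing"));
    }
}
```

In practice an HttpClient interceptor or OpenTelemetry context propagation does this once, centrally, instead of at every call site.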
Log Levels
Use appropriate log levels to control verbosity:
- ERROR: Application failures requiring immediate attention (database connection lost, external service timeout, unhandled exceptions)
- WARN: Unexpected situations that don't prevent operation (deprecated API usage, fallback to default configuration, retry after transient failure)
- INFO: Important business events (payment processed, user logged in, scheduled job started)
- DEBUG: Detailed information useful during development (method entry/exit, intermediate calculations, decision points)
- TRACE: Very detailed information (query parameters, loop iterations, framework internals)
Production typically runs at INFO level. Enable DEBUG or TRACE temporarily when investigating specific issues, then reduce back to INFO to avoid performance impact and log volume.
// Appropriate log levels
log.error("Payment processing failed", kv("paymentId", paymentId), exception);
log.warn("Exchange rate service returned stale data, using cached rate", kv("age", cacheAge));
log.info("Payment completed", kv("paymentId", paymentId), kv("amount", amount));
log.debug("Validating payment request", kv("userId", userId), kv("amount", amount));
log.trace("Retrieved exchange rate from cache", kv("rate", rate), kv("cacheKey", key));
What to Log
Log key decision points, state changes, and external interactions:
- Business events: Payment initiated, payment completed, user registered, account verified
- Errors and exceptions: Always log exceptions with full stack traces and context
- External calls: API requests to third parties, database queries, message queue publications
- Security events: Authentication failures, authorization denials, suspicious activity
- Performance data: Operation duration for critical paths, cache hit/miss rates
@Service
public class PaymentService {
public PaymentResult processPayment(PaymentRequest request) {
long startTime = System.currentTimeMillis();
log.info("Payment processing started",
kv("userId", request.getUserId()),
kv("amount", request.getAmount()),
kv("currency", request.getCurrency()));
try {
// Log validation failures
ValidationResult validation = validator.validate(request);
if (!validation.isValid()) {
log.warn("Payment validation failed",
kv("errors", validation.getErrors()));
return PaymentResult.invalid(validation.getErrors());
}
// Log external service calls
log.debug("Calling fraud detection service");
FraudCheckResult fraudCheck = fraudService.checkPayment(request);
if (fraudCheck.isSuspicious()) {
log.warn("Payment flagged as suspicious",
kv("riskScore", fraudCheck.getRiskScore()),
kv("reasons", fraudCheck.getReasons()));
return PaymentResult.rejected("Fraud detected");
}
// Log state changes
Payment payment = paymentRepository.save(createPayment(request));
log.info("Payment created", kv("paymentId", payment.getId()));
// Log successful completion with timing
long duration = System.currentTimeMillis() - startTime;
log.info("Payment processing completed",
kv("paymentId", payment.getId()),
kv("duration", duration));
return PaymentResult.success(payment);
} catch (Exception e) {
log.error("Payment processing failed unexpectedly",
kv("userId", request.getUserId()), e);
throw e;
}
}
}
Avoid logging sensitive data: passwords, credit card numbers, PINs, authentication tokens. Mask or omit sensitive fields in structured logs. See our security guidelines for data protection requirements.
Log Aggregation
In distributed systems, logs from multiple services must be aggregated into a centralized system for querying. Popular tools include:
- ELK Stack (Elasticsearch, Logstash, Kibana): Index logs in Elasticsearch, query and visualize in Kibana
- Grafana Loki: Lightweight alternative to ELK, integrates with Grafana
- Splunk: Commercial log management and analysis
- Datadog, New Relic: SaaS observability platforms with log aggregation
Centralized logging enables powerful queries: "Show all ERROR logs from payment-service in the last hour where userId=alice" or "Count WARNING logs grouped by service and error type."
Production Debugging
Production debugging is constrained debugging. You can't restart the application freely, attach a debugger, or modify code on a whim. You must work within operational boundaries while investigating live issues.
Feature Flags for Verbose Logging
Feature flags (also called feature toggles) enable/disable functionality without code changes. Use feature flags to control log verbosity in production.
@Service
public class PaymentService {
private final FeatureFlagService featureFlags;
public PaymentResult processPayment(PaymentRequest request) {
boolean verboseLogging = featureFlags.isEnabled("verbose-payment-logging");
if (verboseLogging) {
log.debug("Payment processing details",
kv("request", request),
kv("user", userContext.getCurrentUser()),
kv("headers", httpHeaders));
}
// Process payment
}
}
When investigating a production issue, enable the feature flag for increased logging, gather diagnostic information, then disable the flag to reduce log volume. This avoids deploying code changes just to add temporary debugging logs.
Advanced implementations support user-specific feature flags: enable verbose logging only for specific users experiencing issues, leaving other users unaffected.
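A user-scoped flag can be as simple as an allow-list consulted at the call site. Sketch only: the flag store is a plain in-memory set here, where a real system would fetch it from the flag service:

```java
import java.util.Set;

public class UserScopedFlag {
    private final Set<String> verboseLoggingUsers;

    UserScopedFlag(Set<String> verboseLoggingUsers) {
        this.verboseLoggingUsers = verboseLoggingUsers;
    }

    // Verbose logging only for users under active investigation,
    // leaving log volume for everyone else unchanged.
    boolean verboseLoggingEnabled(String userId) {
        return verboseLoggingUsers.contains(userId);
    }

    public static void main(String[] args) {
        UserScopedFlag flag = new UserScopedFlag(Set.of("alice"));
        System.out.println(flag.verboseLoggingEnabled("alice")); // true
        System.out.println(flag.verboseLoggingEnabled("bob"));   // false
    }
}
```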
Popular feature flag libraries:
- LaunchDarkly: Commercial SaaS feature flag service
- Unleash: Open-source feature flag platform
- Togglz: Java feature flag library
- Spring Cloud Config: Configuration management with dynamic refresh
See our feature flags guide for implementation patterns.
Canary Analysis
Canary deployments route a small percentage of traffic to a new version while the majority uses the stable version. If the canary shows elevated errors or degraded performance, route all traffic back to the stable version before widespread impact.
This provides a controlled environment for production debugging. Deploy a version with enhanced logging or experimental fixes to the canary, observe behavior with real production traffic and data, then decide whether to promote or rollback.
Canary analysis compares metrics (error rates, latency, throughput) between canary and stable versions. Significant differences indicate the new version introduced a regression.
Implement canaries via:
- Kubernetes: Multiple deployments with weighted load balancing
- Service mesh (Istio, Linkerd): Traffic splitting rules
- API Gateway: Route percentages to different backend versions
- Feature flags: Randomly enable new code path for X% of requests
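The feature-flag variant is usually implemented with deterministic hashing rather than per-request randomness, so a given user or session consistently lands on the same side of the split. A sketch, with a made-up rollout percentage:

```java
public class CanarySplit {
    // Deterministically route a stable key (user ID, session ID) to the canary
    // for roughly the given percentage of the population. Hashing the key,
    // instead of sampling randomly per request, keeps each user on one code path.
    static boolean routeToCanary(String stableKey, int canaryPercent) {
        int bucket = Math.floorMod(stableKey.hashCode(), 100);
        return bucket < canaryPercent;
    }

    public static void main(String[] args) {
        int canary = 0;
        for (int i = 0; i < 10_000; i++) {
            if (routeToCanary("user-" + i, 5)) canary++;
        }
        // Share lands near the target percentage across many keys
        System.out.println("canary share ~ " + (canary / 100.0) + "%");
    }
}
```

Sticky routing also makes canary metrics cleaner: a user's errors attribute entirely to one version instead of being smeared across both.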
Safe Rollback
Production debugging sometimes reveals that a deployment introduced a critical bug. Safe rollback procedures restore the previous working version quickly.
Deployment strategies that enable fast rollback:
- Blue-Green Deployment: Maintain two environments (blue: current, green: new). Deploy to green, test, then switch traffic. If issues arise, switch back to blue instantly.
- Immutable Infrastructure: Deploy new versions as separate containers/instances rather than updating existing ones. Keep previous version running until confident in the new version.
- Database Migrations: Use backward-compatible migrations that work with both old and new code. Roll out code changes first, migrate data after confirming code stability.
Rollback checklist:
- Identify the last known good version
- Verify rollback won't cause data inconsistencies
- Execute rollback via deployment tool
- Monitor metrics to confirm stability restored
- Investigate root cause of the issue
- Document incident in post-mortem
See our CI/CD pipeline documentation for rollback procedures.
Performance Debugging
Performance issues manifest as slow response times, high CPU usage, excessive memory consumption, or low throughput. Performance debugging identifies bottlenecks and inefficiencies.
Profilers
Profilers sample application execution to identify which methods consume the most CPU time or allocate the most memory. They reveal "hot spots" where optimization efforts should focus.
Java Profilers:
- IntelliJ Profiler: Built-in profiler (Ultimate edition only), easy to use, visualizes flame graphs
- VisualVM: Free standalone profiler, CPU/memory profiling, heap dump analysis
- JProfiler: Commercial profiler with advanced features
- Async-profiler: Low-overhead production profiler using JVM TI
JavaScript Profilers:
- Chrome DevTools Profiler: Record performance, identify slow JavaScript functions
- React Profiler: Identify components causing expensive re-renders
- Lighthouse: Audit web page performance, accessibility, SEO
# Profile Java application with async-profiler
./profiler.sh -d 60 -f /tmp/flamegraph.html <pid>
Flame graphs visualize profiler output, showing the call stack with width representing time spent. The widest bars are the biggest opportunities for optimization.
Heap Dumps
Heap dumps capture the complete state of JVM memory at a point in time, showing all objects, their sizes, and references. Analyze heap dumps to find memory leaks (objects that should be garbage collected but aren't) and excessive memory usage.
# Generate heap dump manually
jmap -dump:format=b,file=/tmp/heap.hprof <pid>
# Or configure automatic heap dumps on OutOfMemoryError
java -XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/tmp/heapdump.hprof \
-jar application.jar
Analyze heap dumps with:
- Eclipse Memory Analyzer (MAT): Free tool, identifies memory leak suspects automatically
- VisualVM: View object histograms, inspect instance details
- IntelliJ: Built-in heap dump viewer (Ultimate edition)
Look for:
- Objects with unexpectedly high retained size (memory they prevent from being collected)
- Collections growing indefinitely (caches without eviction, event listeners never removed)
- Duplicate strings, objects that could be deduplicated or interned
Thread Dumps
Thread dumps show all thread states and stack traces at a moment in time. They reveal deadlocks, threads blocked waiting for resources, and CPU-intensive operations.
# Generate thread dump
jstack <pid> > threaddump.txt
# Or trigger from within application
kill -3 <pid> # Sends SIGQUIT, prints thread dump to stdout
Thread states:
- RUNNABLE: Thread is executing or ready to execute
- BLOCKED: Waiting to acquire a monitor lock (synchronized block)
- WAITING: Waiting indefinitely for another thread (Object.wait(), LockSupport.park())
- TIMED_WAITING: Waiting for specified time (Thread.sleep(), wait(timeout))
Deadlocks appear in thread dumps with explanatory messages: "Found one Java-level deadlock". The dump shows which threads are waiting for which locks, revealing the circular dependency.
For threads consuming high CPU, take multiple thread dumps 5-10 seconds apart. Threads in RUNNABLE state in all dumps are likely CPU-bound operations. The stack traces show exactly which code is running.
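Thread dumps can also be captured from inside the JVM, for example behind an admin-only diagnostics endpoint, using the standard ThreadMXBean:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

public class InProcessThreadDump {
    // Capture a thread dump programmatically: equivalent information to jstack,
    // including thread states, stack traces, and held locks.
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append(info); // name, state, stack trace, locks
        }
        long[] deadlocked = threads.findDeadlockedThreads();
        if (deadlocked != null) {
            out.append("Deadlocked thread IDs: ").append(Arrays.toString(deadlocked));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The dumping thread itself is RUNNABLE, so the dump is never empty
        System.out.println(dump().contains("RUNNABLE") ? "dump captured" : "unexpected");
    }
}
```

findDeadlockedThreads() returns the IDs of threads in a deadlock cycle (or null if none), which makes automated deadlock alerting possible without shelling out to jstack.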
Flame Graphs
Flame graphs visualize profiler data as hierarchical stack traces. The x-axis shows the sample population (not time - order is alphabetical). The y-axis shows stack depth. Width represents how frequently a method appears in samples (wider = more time spent).
Flame graphs quickly identify hot paths: look for wide bars at the top of the graph. These represent methods consuming significant CPU time.
Generate flame graphs from profiler output:
# Linux perf
perf record -F 99 -p <pid> -g -- sleep 60
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > flamegraph.svg
# Async-profiler (Java)
./profiler.sh -d 60 -f flamegraph.html <pid>
Interactive flame graphs allow clicking to zoom into specific subtrees, filtering to specific methods, and searching for patterns.
Network Debugging
Network issues manifest as failed API calls, timeouts, slow responses, or incorrect data. Network debugging tools intercept, inspect, and manipulate HTTP traffic.
Proxy Tools
HTTP proxy tools sit between client and server, intercepting all traffic for inspection and modification.
Charles Proxy: Commercial proxy with GUI, SSL certificate generation for HTTPS interception, request/response modification, throttling to simulate slow networks.
mitmproxy: Open-source proxy, CLI and web interface, Python scripting for custom request/response manipulation.
Fiddler: Free proxy for Windows, similar features to Charles.
Setup:
- Configure proxy to listen on a port (e.g., 8888)
- Configure application/browser to use proxy (HTTP proxy: localhost:8888)
- For HTTPS, install proxy's SSL certificate as trusted
Use cases:
- Inspect API request payloads to verify frontend sends correct data
- Examine API responses to confirm backend returns expected structure
- Modify requests to test error handling (change valid payment ID to invalid)
- Modify responses to test frontend behavior with different data
- Throttle bandwidth to test slow network conditions
- Replay captured requests for testing
Browser DevTools Network Panel
See Browser Developer Tools - Network Panel above. Specifically useful for:
- Waterfall view: Visualize request timing, identify blocking requests, optimize load order
- Request blocking: Block specific requests to test offline behavior or missing dependencies
- Throttling: Simulate slow 3G, offline, or custom network conditions
- Cache simulation: Disable cache to test fresh loads, or enable to verify caching behavior
Command-Line Tools
cURL: Make HTTP requests from the command line, useful for testing APIs without a GUI.
# GET request
curl https://api.example.com/payments
# POST request with JSON body
curl -X POST https://api.example.com/payments \
-H "Content-Type: application/json" \
-d '{"amount": 100.00, "currency": "USD"}'
# Include headers in output
curl -i https://api.example.com/payments
# Follow redirects
curl -L https://api.example.com/payments
# Save response to file
curl -o response.json https://api.example.com/payments
HTTPie: User-friendly alternative to cURL with syntax highlighting and better defaults.
# GET request (httpie auto-formats JSON)
http GET https://api.example.com/payments
# POST with JSON (:= sends a JSON number, = sends a string)
http POST https://api.example.com/payments amount:=100.00 currency=USD
# Custom headers
http GET https://api.example.com/payments Authorization:"Bearer token123"
Postman: GUI application for API testing, collections for organizing requests, environment variables for different configurations, automated testing scripts.
Further Reading
Internal Documentation
- Logging - Structured logging, log levels, best practices
- Metrics - Application metrics for monitoring
- Tracing - Distributed tracing with OpenTelemetry
- IDE Setup - Debugger configuration, remote debugging
- Performance Testing - Load testing, benchmarking
- Feature Flags - Feature flag implementation patterns
External Resources
- Java Debugging with IntelliJ IDEA
- Chrome DevTools Documentation
- The Art of Debugging
- Brendan Gregg's Performance Analysis
- VisualVM Documentation
Summary
Key Takeaways
- Systematic Process: Reproduce → Isolate → Hypothesize → Fix → Prevent
- IDE Debuggers: Master breakpoints, watches, stepping, remote debugging
- Browser DevTools: Console, network panel, sources debugger, framework tools
- Structured Logging: Use correlation IDs, appropriate log levels, structured formats
- Production Constraints: Feature flags for verbose logging, canary analysis, safe rollback
- Performance Tools: Profilers, heap dumps, thread dumps, flame graphs
- Network Debugging: Proxy tools, browser network panel, cURL/HTTPie
- Root Causes: Fix underlying problems, not symptoms
- Prevention: Add tests, improve logging, document learnings
- Distributed Systems: Correlation IDs tie requests across services
Next Steps: Review Observability for production monitoring and Performance Optimization for proactive performance improvements.