Analytics and Business Intelligence
Overview
Analytics systems capture, process, and analyze user behavior data to inform product decisions, measure business outcomes, and improve user experiences. Modern analytics implementations must balance comprehensive data collection with user privacy, provide both real-time insights and historical analysis, and integrate with data warehouses for advanced business intelligence.
This guide covers event tracking strategies, analytics platform selection, implementing A/B testing frameworks, privacy considerations, and integrating analytics data with data warehouses for business intelligence reporting.
Core Principles
- Define events with intention - Every tracked event should serve a specific decision-making purpose; avoid tracking "just in case"
- Prioritize user privacy - Collect only necessary data, implement consent mechanisms, anonymize sensitive information, and provide opt-out capabilities
- Maintain event consistency - Establish naming conventions and schemas upfront to prevent fragmented, unmaintainable analytics implementations
- Separate collection from analysis - Use event streaming and data warehouses to enable multiple analytics use cases from a single data collection layer
- Test your tracking - Implement automated tests for critical events to catch tracking breakages before they reach production
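The last principle can be sketched as a small unit test. This Python example validates events against an illustrative schema and naming rule before they ship; the required keys and the `object_action` convention mirror the taxonomy described later in this guide, not any particular platform's requirements:

```python
REQUIRED_KEYS = {"eventName", "timestamp", "anonymousId", "sessionId", "properties"}

def validate_event(event: dict) -> list[str]:
    """Return a list of schema violations for a tracked event (empty = valid)."""
    errors = []
    missing = REQUIRED_KEYS - event.keys()
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    name = event.get("eventName", "")
    # Enforce the object_action naming convention, e.g. "product_viewed"
    if not name or "_" not in name or not all(part.isalnum() for part in name.split("_")):
        errors.append(f"event name {name!r} violates object_action convention")
    return errors

# A well-formed event passes; a malformed one is caught before production
good = {"eventName": "product_viewed", "timestamp": "2024-01-01T00:00:00Z",
        "anonymousId": "anon-1", "sessionId": "sess-1", "properties": {}}
bad = {"eventName": "ProductViewed!", "properties": {}}
assert validate_event(good) == []
assert validate_event(bad) != []
```

Running a check like this in CI for your critical events catches tracking breakages before they silently corrupt dashboards.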
Analytics Implementation Strategies
Choosing an Analytics Platform
Different platforms serve different use cases and organizational needs:
Product Analytics Platforms focus on user behavior and product metrics:
- Mixpanel - User-centric analytics, retention cohorts, funnel analysis, event-based pricing
- Amplitude - Product analytics with behavioral cohorts, path analysis, session replay, predictive analytics
- Heap - Autocaptures all interactions so events can be defined retroactively, visual labeling, no code changes needed for new events
Web Analytics Platforms focus on traffic and acquisition:
- Google Analytics 4 - Free (with limits), event-based model, integration with Google Ads, machine learning insights
- Plausible/Fathom - Privacy-focused, lightweight, no cookies, GDPR-compliant by default
Customer Data Platforms unify data from multiple sources:
- Segment - Customer data infrastructure, routes events to multiple destinations, source of truth for user data
- RudderStack - Open-source alternative to Segment, warehouse-first architecture, data governance
Key selection criteria:
- Pricing model - Event volume, user count, or feature-based pricing
- Data ownership - Can you export raw event data? Is there a data warehouse integration?
- Privacy compliance - GDPR/CCPA features, data retention controls, consent management
- Real-time vs. batch - Does your use case require real-time dashboards or is daily batch processing sufficient?
- Technical integration - SDKs for your tech stack, API availability, custom event properties
For applications requiring both product analytics and extensive custom analysis, use a Customer Data Platform (Segment, RudderStack) to send events to multiple destinations simultaneously. This architecture decouples event collection from consumption, allowing you to add new analytics destinations without changing application code.
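A rough sketch of that decoupling in Python, with list-backed destination handlers standing in for real integrations (the `EventRouter` name and shape are illustrative, not a CDP API):

```python
from typing import Callable

Event = dict
Destination = Callable[[Event], None]

class EventRouter:
    """Single collection point that fans each event out to every registered destination."""

    def __init__(self) -> None:
        self._destinations: list[Destination] = []

    def add_destination(self, dest: Destination) -> None:
        self._destinations.append(dest)

    def track(self, event: Event) -> None:
        for dest in self._destinations:
            dest(event)  # in production: queue with retries rather than a direct call

# Adding a new analytics tool is one registration call, with no app-code changes
warehouse_rows, product_analytics = [], []
router = EventRouter()
router.add_destination(warehouse_rows.append)
router.add_destination(product_analytics.append)
router.track({"eventName": "product_viewed", "productId": "p1"})
assert warehouse_rows == product_analytics == [{"eventName": "product_viewed", "productId": "p1"}]
```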
Event Tracking Strategy
Event taxonomy is the foundation of maintainable analytics. Poor taxonomy leads to duplicate events, inconsistent naming, and fragmented data that's difficult to query.
Establish naming conventions before implementing tracking:
// Event naming convention: Object + Action
// Examples:
// - user_signed_up
// - product_viewed
// - payment_completed
// - video_played
// - form_submitted
interface BaseEvent {
eventName: string; // e.g., "product_viewed"
timestamp: string; // ISO 8601 timestamp
userId?: string; // Identified user ID
anonymousId: string; // Anonymous visitor/device ID (persists across sessions)
sessionId: string; // Session identifier
properties: EventProperties;
}
interface EventProperties {
// Common properties for all events
platform: 'web' | 'ios' | 'android';
appVersion: string;
locale: string;
// Event-specific properties
[key: string]: any;
}
// Example: Product view event
interface ProductViewedEvent extends BaseEvent {
eventName: 'product_viewed';
properties: {
productId: string;
productName: string;
category: string;
price: number;
currency: string;
platform: 'web' | 'ios' | 'android';
appVersion: string;
locale: string;
};
}
Benefits of strong typing: TypeScript interfaces prevent typos in event names and property keys, provide autocomplete for event tracking calls, and serve as documentation for the analytics schema.
Implementing Event Tracking
Centralized tracking layer abstracts the analytics provider and enforces consistent event structure:
// analytics.ts - Centralized analytics service
import * as Sentry from '@sentry/react';
import Analytics from 'analytics';
import segmentPlugin from '@analytics/segment';
// Initialize analytics with plugins
const analytics = Analytics({
app: 'my-app',
plugins: [
segmentPlugin({
writeKey: process.env.REACT_APP_SEGMENT_WRITE_KEY
})
]
});
// Type-safe event tracking
export function trackEvent<T extends BaseEvent>(event: T): void {
try {
// Validate event structure
if (!event.eventName) {
throw new Error('Event name is required');
}
// Add automatic properties
const enrichedEvent = {
...event,
properties: {
...event.properties,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION,
locale: navigator.language,
timestamp: new Date().toISOString()
}
};
// Send to analytics provider
analytics.track(enrichedEvent.eventName, enrichedEvent.properties);
// Log in development
if (process.env.NODE_ENV === 'development') {
console.log('[Analytics]', enrichedEvent.eventName, enrichedEvent.properties);
}
} catch (error) {
// Don't let analytics errors crash the app
console.error('Analytics tracking failed:', error);
Sentry.captureException(error);
}
}
// Identify user
export function identifyUser(userId: string, traits?: Record<string, any>): void {
analytics.identify(userId, traits);
}
// Track page views
export function trackPageView(pageName: string, properties?: Record<string, any>): void {
analytics.page(pageName, properties);
}
Usage in components:
// ProductPage.tsx
import { useEffect } from 'react';
import { trackEvent } from './analytics';
export const ProductPage: React.FC<{ product: Product }> = ({ product }) => {
useEffect(() => {
// Track product view
trackEvent({
eventName: 'product_viewed',
properties: {
productId: product.id,
productName: product.name,
category: product.category,
price: product.price,
currency: 'USD',
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
}, [product.id]);
const handleAddToCart = () => {
// Track add to cart action
trackEvent({
eventName: 'product_added_to_cart',
properties: {
productId: product.id,
productName: product.name,
price: product.price,
quantity: 1,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
addToCart(product);
};
return (
<div>
<h1>{product.name}</h1>
<button onClick={handleAddToCart}>Add to Cart</button>
</div>
);
};
Server-Side Event Tracking
Not all events originate from client applications. Track backend events for API usage, background jobs, and system-initiated actions:
// Spring Boot analytics service
@Service
public class AnalyticsService {
private final Analytics segmentClient;
public AnalyticsService(@Value("${segment.writeKey}") String writeKey) {
this.segmentClient = Analytics.builder(writeKey).build();
}
public void trackEvent(String userId, String eventName, Map<String, Object> properties) {
try {
// Add automatic properties
Map<String, Object> enrichedProperties = new HashMap<>(properties);
enrichedProperties.put("platform", "backend");
enrichedProperties.put("appVersion", getApplicationVersion());
enrichedProperties.put("timestamp", Instant.now().toString());
// Send to Segment
TrackMessage message = TrackMessage.builder(eventName)
.userId(userId)
.properties(enrichedProperties)
.build();
segmentClient.enqueue(message);
log.debug("Tracked event: {} for user: {}", eventName, userId);
} catch (Exception e) {
// Don't let analytics failures affect business logic
log.error("Failed to track event: {}", eventName, e);
}
}
public void identifyUser(String userId, Map<String, Object> traits) {
IdentifyMessage message = IdentifyMessage.builder()
.userId(userId)
.traits(traits)
.build();
segmentClient.enqueue(message);
}
@PreDestroy
public void shutdown() {
segmentClient.flush();
segmentClient.shutdown();
}
}
Usage in business logic:
@Service
public class PaymentService {
private final AnalyticsService analyticsService;
private final PaymentProcessor paymentProcessor;
public PaymentResult processPayment(PaymentRequest request, String userId) {
try {
PaymentResult result = paymentProcessor.process(request);
// Track successful payment
if (result.isSuccess()) {
analyticsService.trackEvent(userId, "payment_completed", Map.of(
"amount", request.getAmount(),
"currency", request.getCurrency(),
"paymentMethod", request.getPaymentMethod(),
"transactionId", result.getTransactionId()
));
} else {
// Track failed payment
analyticsService.trackEvent(userId, "payment_failed", Map.of(
"amount", request.getAmount(),
"currency", request.getCurrency(),
"paymentMethod", request.getPaymentMethod(),
"errorCode", result.getErrorCode(),
"errorMessage", result.getErrorMessage()
));
}
return result;
} catch (Exception e) {
// Track payment error
analyticsService.trackEvent(userId, "payment_error", Map.of(
"amount", request.getAmount(),
"currency", request.getCurrency(),
"errorType", e.getClass().getSimpleName()
));
throw e;
}
}
}
Why track backend events: Backend tracking captures system-initiated events (scheduled jobs, automated emails, background processing), provides reliable tracking that can't be blocked by ad blockers or browser settings, and ensures critical business events are tracked even if client-side tracking fails.
User Behavior Analysis
Funnels
Funnels measure conversion through multi-step processes by tracking the percentage of users who complete each step in a sequence:
// Example: Checkout funnel tracking
export const CheckoutFlow: React.FC = () => {
const [step, setStep] = useState<'cart' | 'shipping' | 'payment' | 'confirmation'>('cart');
useEffect(() => {
// Track funnel step views
trackEvent({
eventName: 'checkout_step_viewed',
properties: {
step: step,
stepNumber: getStepNumber(step),
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
}, [step]);
const handleStepComplete = (nextStep: typeof step) => {
// Track funnel step completion
trackEvent({
eventName: 'checkout_step_completed',
properties: {
step: step,
stepNumber: getStepNumber(step),
nextStep: nextStep,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
setStep(nextStep);
};
return (
// ... checkout UI
);
};
function getStepNumber(step: string): number {
const steps = ['cart', 'shipping', 'payment', 'confirmation'];
return steps.indexOf(step) + 1;
}
Analyzing funnel drop-off identifies friction points in user flows. If 80% of users complete the shipping step but only 40% complete payment, that is 50% abandonment at the payment step, a clear signal to investigate payment UI/UX issues.
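Drop-off rates like these can be computed directly from raw events. A Python sketch, assuming each event is reduced to a (sessionId, eventName) pair and using illustrative step names:

```python
FUNNEL_STEPS = ["checkout_started", "shipping_completed", "payment_completed"]

def funnel_conversion(events: list[tuple[str, str]]) -> dict[str, float]:
    """For each step, the share of all sessions entering the funnel that reached it."""
    sessions_per_step = {step: set() for step in FUNNEL_STEPS}
    for session_id, event_name in events:
        if event_name in sessions_per_step:
            sessions_per_step[event_name].add(session_id)
    entered = len(sessions_per_step[FUNNEL_STEPS[0]]) or 1
    return {step: len(ids) / entered for step, ids in sessions_per_step.items()}

events = [("s1", "checkout_started"), ("s1", "shipping_completed"), ("s1", "payment_completed"),
          ("s2", "checkout_started"), ("s2", "shipping_completed"),
          ("s3", "checkout_started")]
rates = funnel_conversion(events)
assert rates["shipping_completed"] == 2 / 3   # 2 of 3 sessions reached shipping
assert rates["payment_completed"] == 1 / 3
```

Product analytics platforms compute this for you; the value of knowing the math is that you can reproduce it in your warehouse and cross-check the vendor's numbers.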
User Journeys
User journeys map the paths users take through your application, revealing how they discover and use features:
// Backend journey tracking
@Service
public class UserJourneyService {
private final AnalyticsService analyticsService;
public void trackUserAction(String userId, String action, String screen, Map<String, Object> context) {
analyticsService.trackEvent(userId, "user_action", Map.of(
"action", action,
"screen", screen,
"sequence", getUserActionSequence(userId),
"sessionDuration", getSessionDuration(userId),
"context", context
));
}
private int getUserActionSequence(String userId) {
// Return count of actions in current session
return sessionService.getActionCount(userId);
}
private long getSessionDuration(String userId) {
// Return duration in seconds since session start
return sessionService.getDurationSeconds(userId);
}
}
Journey analysis questions:
- What paths do successful users take vs. churned users?
- Are users discovering key features organically or do they need prompting?
- Which features lead to higher engagement/conversion?
- Where do users experience friction or confusion?
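One way to make these questions answerable is to aggregate the most common screen sequences per session. A small Python sketch (the event shape and screen names are assumptions for illustration):

```python
from collections import Counter

def top_paths(events: list[tuple[str, str]], length: int = 3) -> Counter:
    """Count screen sequences of `length` consecutive views per session.
    `events` is a list of (session_id, screen) pairs in chronological order."""
    per_session: dict[str, list[str]] = {}
    for session_id, screen in events:
        per_session.setdefault(session_id, []).append(screen)
    paths: Counter = Counter()
    for screens in per_session.values():
        # Slide a window over each session's screen sequence
        for i in range(len(screens) - length + 1):
            paths[tuple(screens[i:i + length])] += 1
    return paths

events = [("s1", "home"), ("s1", "search"), ("s1", "product"),
          ("s2", "home"), ("s2", "search"), ("s2", "product"), ("s2", "cart")]
assert top_paths(events).most_common(1)[0] == (("home", "search", "product"), 2)
```

Comparing the top paths of retained users against churned users is a cheap first pass at the first question above.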
Retention Cohorts
Cohort analysis groups users by acquisition date and tracks their return behavior over time:
// Track user return visits
export function trackUserSession(userId: string, isNewUser: boolean) {
trackEvent({
eventName: 'session_started',
userId: userId,
properties: {
isNewUser: isNewUser,
daysSinceSignup: getDaysSinceSignup(userId),
previousSessionCount: getPreviousSessionCount(userId),
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
}
Retention metrics indicate product-market fit:
- Day 1 retention: What % of users return the day after signup?
- Week 1 retention: What % return within the first week?
- Month 1 retention: What % are still active after a month?
Good retention benchmarks vary by product type:
- Social networks: 65%+ day 1, 40%+ week 1
- Productivity tools: 50%+ day 1, 30%+ week 1
- E-commerce: 30%+ day 1, 20%+ week 1
Low retention indicates users don't find value in your product. Focus on improving onboarding, time-to-value, and core feature engagement before scaling acquisition.
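The day-N retention metrics above reduce to simple date arithmetic. A Python sketch, assuming you can list each user's signup date and the dates they were active:

```python
from datetime import date

def day_n_retention(signups: dict[str, date], activity: dict[str, set[date]], n: int) -> float:
    """Share of signed-up users who were active exactly n days after signup."""
    eligible = list(signups)
    if not eligible:
        return 0.0
    retained = sum(
        1 for user in eligible
        if any((d - signups[user]).days == n for d in activity.get(user, set()))
    )
    return retained / len(eligible)

signups = {"u1": date(2024, 1, 1), "u2": date(2024, 1, 1), "u3": date(2024, 1, 2)}
activity = {"u1": {date(2024, 1, 2)}, "u2": set(), "u3": {date(2024, 1, 3)}}
assert day_n_retention(signups, activity, 1) == 2 / 3  # u1 and u3 returned on day 1
```

The same shape generalizes to week-1 or month-1 retention by widening the day window.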
Session Recording and Heatmaps
Session recording tools (Hotjar, FullStory, LogRocket) capture user interactions for qualitative analysis:
// Initialize session recording
import * as FullStory from '@fullstory/browser';
FullStory.init({
orgId: process.env.REACT_APP_FULLSTORY_ORG_ID!,
devMode: process.env.NODE_ENV === 'development'
});
// Identify user in recordings
export function identifyUserForRecording(userId: string, email: string, name: string) {
FullStory.identify(userId, {
email: email,
displayName: name
});
}
// Tag sessions with custom events
export function tagSession(eventName: string, properties: Record<string, any>) {
FullStory.event(eventName, properties);
}
Privacy considerations for session recording:
- Never record sensitive data - Automatically redact password fields, credit card numbers, personal information
- Obtain consent - Clearly inform users that sessions are recorded and provide opt-out
- Limit retention - Delete recordings after 30-90 days
- Restrict access - Only allow authorized team members to view recordings
// Mask sensitive form fields from recording
{/* fs-exclude is FullStory's exclusion class; data-hj-suppress is Hotjar's suppression attribute */}
<input
type="password"
className="fs-exclude"
data-hj-suppress=""
/>
<input
type="text"
name="creditCard"
className="fs-exclude"
data-hj-suppress=""
/>
Use cases for session recordings:
- Understand why users abandon funnels (watch sessions of drop-off users)
- Identify usability issues (see where users struggle or get confused)
- Validate hypotheses before A/B testing (qualitative research first)
- Debug reported issues (see exactly what the user did)
A/B Testing Implementation
A/B testing compares two or more variations of a feature to determine which performs better using statistical rigor.
Feature Flag-Based Testing
Implement A/B tests using feature flags to control variant assignment:
// Feature flag service with variant assignment
import { useFeatureFlag } from './featureFlags';
export const LoginPage: React.FC = () => {
// Assign user to test variant
const variant = useFeatureFlag('login-page-redesign'); // Returns 'control' or 'variant'
useEffect(() => {
// Track test exposure
trackEvent({
eventName: 'experiment_viewed',
properties: {
experimentName: 'login-page-redesign',
variant: variant,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
}, [variant]);
const handleLogin = async (email: string, password: string) => {
// Track conversion event
trackEvent({
eventName: 'login_attempted',
properties: {
experimentName: 'login-page-redesign',
variant: variant,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
const result = await authenticateUser(email, password);
if (result.success) {
trackEvent({
eventName: 'login_succeeded',
properties: {
experimentName: 'login-page-redesign',
variant: variant,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
userId: result.userId,
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
}
};
return variant === 'variant' ? <NewLoginUI onLogin={handleLogin} /> : <OldLoginUI onLogin={handleLogin} />;
};
Variant Assignment Strategy
Consistent assignment ensures users always see the same variant:
// Deterministic variant assignment based on user ID
export function assignVariant(experimentName: string, userId: string): 'control' | 'variant' {
// Hash user ID + experiment name for deterministic assignment
const hash = hashCode(userId + experimentName);
const bucket = Math.abs(hash) % 100;
// 50/50 split
return bucket < 50 ? 'control' : 'variant';
}
function hashCode(str: string): number {
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash; // Convert to 32-bit integer
}
return hash;
}
Why deterministic assignment matters: If variant assignment is random on each page load, users will see inconsistent experiences, invalidating test results. Hashing user ID ensures the same user always sees the same variant.
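Both properties, stability per user and a roughly even split, are easy to verify. A Python sketch mirroring the hash-and-bucket logic above (using SHA-256 as a stand-in hash, not the exact TypeScript implementation):

```python
import hashlib

def assign_variant(experiment: str, user_id: str) -> str:
    """Deterministic 50/50 assignment by hashing user ID + experiment name."""
    digest = hashlib.sha256(f"{user_id}{experiment}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "control" if bucket < 50 else "variant"

# Same user always gets the same variant...
assert assign_variant("login-page-redesign", "user-42") == assign_variant("login-page-redesign", "user-42")

# ...and across many users the split is close to 50/50
variants = [assign_variant("login-page-redesign", f"user-{i}") for i in range(10_000)]
share = variants.count("control") / len(variants)
assert 0.45 < share < 0.55
```

A cryptographic hash also avoids a pitfall of weak hashes: correlated bucketing across experiments, where the same users repeatedly land in "control".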
Statistical Significance
Don't end A/B tests prematurely. Statistical significance ensures results are not due to random chance:
# Calculate statistical significance (Python example)
from scipy import stats
def calculate_significance(control_conversions, control_total, variant_conversions, variant_total):
"""
Calculate statistical significance using two-proportion z-test
Returns p-value - if p < 0.05, results are statistically significant
"""
control_rate = control_conversions / control_total
variant_rate = variant_conversions / variant_total
# Calculate pooled proportion
pooled = (control_conversions + variant_conversions) / (control_total + variant_total)
# Calculate standard error
se = (pooled * (1 - pooled) * (1/control_total + 1/variant_total)) ** 0.5
# Calculate z-score
z = (variant_rate - control_rate) / se
# Calculate p-value (two-tailed test)
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
return {
'control_rate': control_rate,
'variant_rate': variant_rate,
'relative_improvement': (variant_rate - control_rate) / control_rate,
'p_value': p_value,
'is_significant': p_value < 0.05
}
# Example usage
result = calculate_significance(
control_conversions=450,
control_total=10000,
variant_conversions=520,
variant_total=10000
)
print(f"Control conversion rate: {result['control_rate']:.2%}")
print(f"Variant conversion rate: {result['variant_rate']:.2%}")
print(f"Relative improvement: {result['relative_improvement']:.2%}")
print(f"P-value: {result['p_value']:.4f}")
print(f"Statistically significant: {result['is_significant']}")
Minimum sample size depends on baseline conversion rate and minimum detectable effect:
| Baseline Rate | Minimum Detectable Effect | Required Sample Size per Variant |
|---|---|---|
| 5% | 20% relative improvement | ~8,000 |
| 10% | 20% relative improvement | ~3,800 |
| 20% | 20% relative improvement | ~1,900 |
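These table values follow from the standard two-proportion power calculation. A Python sketch using only the standard library, assuming a two-sided α of 0.05 and 80% power (figures match the table to rounding):

```python
from statistics import NormalDist

def required_sample_size(baseline: float, relative_mde: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant sample size for a two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    n = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
         + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2 / (p2 - p1) ** 2
    return int(n) + 1

# 5% baseline, 20% relative MDE -> roughly 8,000 per variant, as in the table
assert 7500 < required_sample_size(0.05, 0.20) < 8500
```

Note how the requirement falls as the baseline rate rises: rarer conversions need far more traffic to detect the same relative lift.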
Common mistakes in A/B testing:
- Stopping tests too early - Wait for statistical significance and sufficient sample size
- Peeking at results repeatedly - Increases false positive rate; decide on sample size beforehand
- Testing too many variants - More variants require larger sample sizes; start with A/B, not A/B/C/D
- Ignoring guardrail metrics - Ensure winning variant doesn't hurt other important metrics
Multivariate Testing
Multivariate tests evaluate multiple changes simultaneously:
// Testing both headline and CTA button together
interface TestVariants {
headline: 'control' | 'variant_a' | 'variant_b';
ctaButton: 'control' | 'variant_a';
}
const variants = assignMultivariateVariants('landing-page-test', userId);
const headlines = {
control: 'Manage Your Finances',
variant_a: 'Take Control of Your Money',
variant_b: 'Financial Freedom Starts Here'
};
const ctaButtons = {
control: 'Sign Up',
variant_a: 'Get Started Free'
};
return (
<div>
<h1>{headlines[variants.headline]}</h1>
<button>{ctaButtons[variants.ctaButton]}</button>
</div>
);
Caution with multivariate tests: They require exponentially larger sample sizes. Testing 3 headlines × 2 CTA variants = 6 combinations, each needing sufficient sample size for statistical significance.
Data Warehouse Integration
ETL Pipelines
Extract, Transform, Load (ETL) pipelines move analytics data from operational systems into data warehouses for advanced analysis:
Event streaming to warehouses enables SQL-based analysis of user behavior:
# Illustrative warehouse destination configuration (schema is an example; exact syntax varies by CDP)
destinations:
- type: bigquery
project: my-project
dataset: analytics
sync_schedule: "0 * * * *" # Hourly sync
events:
- product_viewed
- product_added_to_cart
- checkout_started
- payment_completed
identify_traits:
- email
- name
- created_at
- plan_type
Data Modeling
Transform raw events into business-friendly tables for analysis:
-- Create user activity summary table
CREATE OR REPLACE TABLE analytics.user_activity_summary AS
SELECT
user_id,
DATE(timestamp) as activity_date,
COUNT(*) as total_events,
COUNT(DISTINCT session_id) as session_count,
COUNTIF(event_name = 'product_viewed') as products_viewed,
COUNTIF(event_name = 'product_added_to_cart') as products_added_to_cart,
COUNTIF(event_name = 'payment_completed') as purchases,
SUM(IF(event_name = 'payment_completed', properties.amount, 0)) as revenue
FROM analytics.events
WHERE timestamp >= CURRENT_DATE() - INTERVAL 90 DAY
GROUP BY user_id, activity_date;
-- Create funnel conversion table
CREATE OR REPLACE TABLE analytics.checkout_funnel AS
WITH funnel_events AS (
SELECT
session_id,
user_id,
MAX(IF(event_name = 'product_viewed', 1, 0)) as viewed_product,
MAX(IF(event_name = 'product_added_to_cart', 1, 0)) as added_to_cart,
MAX(IF(event_name = 'checkout_started', 1, 0)) as started_checkout,
MAX(IF(event_name = 'payment_completed', 1, 0)) as completed_payment
FROM analytics.events
WHERE timestamp >= CURRENT_DATE() - INTERVAL 30 DAY
GROUP BY session_id, user_id
)
SELECT
COUNT(*) as total_sessions,
SUM(viewed_product) as viewed_product_count,
SUM(added_to_cart) as added_to_cart_count,
SUM(started_checkout) as started_checkout_count,
SUM(completed_payment) as completed_payment_count,
SAFE_DIVIDE(SUM(added_to_cart), SUM(viewed_product)) as view_to_cart_rate,
SAFE_DIVIDE(SUM(started_checkout), SUM(added_to_cart)) as cart_to_checkout_rate,
SAFE_DIVIDE(SUM(completed_payment), SUM(started_checkout)) as checkout_to_payment_rate
FROM funnel_events;
Benefits of data warehouse analytics:
- SQL queries for ad-hoc analysis without vendor lock-in
- Join with operational data (combine analytics events with customer database)
- Historical analysis without vendor retention limits
- Custom reporting for business-specific metrics
- Machine learning on behavior data for predictions
Data Quality Monitoring
Data quality issues silently break dashboards and reports. Implement monitoring to catch problems early:
// Data quality checks for analytics pipeline
@Component
public class AnalyticsDataQualityMonitor {
private final BigQueryClient bigQueryClient;
private final AlertService alertService;
@Scheduled(cron = "0 0 * * * *") // Run hourly
public void checkDataQuality() {
// Check 1: Verify events are flowing
long recentEventCount = bigQueryClient.query(
"SELECT COUNT(*) FROM analytics.events WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)"
);
if (recentEventCount == 0) {
alertService.alert("No analytics events in past hour - pipeline may be broken");
}
// Check 2: Verify required properties are present
long eventsWithMissingProperties = bigQueryClient.query(
"SELECT COUNT(*) FROM analytics.events WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND (properties.user_id IS NULL OR properties.session_id IS NULL)"
);
if (eventsWithMissingProperties > 0) {
alertService.alert("Found " + eventsWithMissingProperties + " events with missing required properties");
}
// Check 3: Verify event volume is within expected range
long expectedMin = 1000;
long expectedMax = 50000;
if (recentEventCount < expectedMin || recentEventCount > expectedMax) {
alertService.alert("Unusual event volume: " + recentEventCount + " (expected " + expectedMin + "-" + expectedMax + ")");
}
// Check 4: Verify no data type mismatches
long typeErrors = bigQueryClient.query(
"SELECT COUNT(*) FROM analytics.events WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND SAFE_CAST(properties.amount AS FLOAT64) IS NULL AND properties.amount IS NOT NULL"
);
if (typeErrors > 0) {
alertService.alert("Found " + typeErrors + " events with data type mismatches");
}
}
}
Privacy and Compliance
GDPR and CCPA Compliance
Privacy regulations require obtaining consent, providing data transparency, and honoring deletion requests:
// Cookie consent management
import CookieConsent from 'react-cookie-consent';
export const App: React.FC = () => {
const [analyticsEnabled, setAnalyticsEnabled] = useState(false);
const handleAcceptCookies = () => {
setAnalyticsEnabled(true);
// Initialize analytics only after consent
initializeAnalytics();
};
const handleDeclineCookies = () => {
setAnalyticsEnabled(false);
// Disable analytics
disableAnalytics();
};
return (
<>
<CookieConsent
onAccept={handleAcceptCookies}
onDecline={handleDeclineCookies}
enableDeclineButton
>
We use cookies to improve your experience and analyze site usage.
</CookieConsent>
{/* App content */}
</>
);
};
function disableAnalytics() {
// Disable tracking for popular analytics tools
window['ga-disable-G-XXXXXXXXXX'] = true; // Google Analytics (substitute your measurement ID)
// Opt out of Mixpanel tracking
if (window.mixpanel) {
window.mixpanel.opt_out_tracking();
}
}
Data Anonymization
Anonymize personally identifiable information (PII) to reduce privacy risk:
// Anonymize sensitive data before sending to analytics
@Service
public class PrivacyAwareAnalyticsService {
private final AnalyticsService analyticsService;
public void trackEvent(String userId, String eventName, Map<String, Object> properties) {
// Copy first: callers may pass immutable maps (e.g. Map.of), and we must not mutate their map
Map<String, Object> props = new HashMap<>(properties);
// Anonymize email addresses
if (props.containsKey("email")) {
String email = (String) props.get("email");
props.put("emailDomain", extractDomain(email));
props.put("emailHash", hashEmail(email));
props.remove("email"); // Don't send actual email
}
// Anonymize IP addresses
if (props.containsKey("ipAddress")) {
String ip = (String) props.get("ipAddress");
props.put("ipAddress", anonymizeIp(ip));
}
// Remove sensitive fields entirely
props.remove("password");
props.remove("creditCard");
props.remove("ssn");
analyticsService.trackEvent(userId, eventName, props);
}
private String extractDomain(String email) {
return email.substring(email.indexOf('@') + 1);
}
private String hashEmail(String email) {
// One-way hash for pseudonymization
return DigestUtils.sha256Hex(email);
}
private String anonymizeIp(String ip) {
// Remove last octet of IPv4 address
String[] parts = ip.split("\\.");
if (parts.length == 4) {
return parts[0] + "." + parts[1] + "." + parts[2] + ".0";
}
return ip;
}
}
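The `anonymizeIp` helper above truncates IPv4 addresses only. A Python sketch extending the same idea to IPv6 with the standard `ipaddress` module, zeroing the host bits (the /24 and /48 prefix lengths are common choices, not a regulatory requirement):

```python
import ipaddress

def anonymize_ip(ip: str) -> str:
    """Zero the host portion: last octet for IPv4, last 80 bits for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)

assert anonymize_ip("203.0.113.57") == "203.0.113.0"
assert anonymize_ip("2001:db8:abcd:1234::1") == "2001:db8:abcd::"
```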
Right to be Forgotten
Implement data deletion to comply with GDPR Article 17 (Right to Erasure):
// Delete user data from analytics systems
@Service
public class UserDataDeletionService {
private final AnalyticsService analyticsService;
private final BigQueryClient bigQueryClient;
private final FullStoryClient fullStoryClient;
@Transactional
public void deleteUserData(String userId) {
// Delete from analytics provider
analyticsService.deleteUser(userId);
// Delete from data warehouse
bigQueryClient.execute(
"DELETE FROM analytics.events WHERE user_id = ?",
userId
);
bigQueryClient.execute(
"DELETE FROM analytics.user_profiles WHERE user_id = ?",
userId
);
// Delete from session recording tools
fullStoryClient.deleteUser(userId);
log.info("Deleted all analytics data for user: {}", userId);
}
}
Data retention policies automatically delete old data:
-- BigQuery automatic deletion after 2 years
ALTER TABLE analytics.events
SET OPTIONS (
partition_expiration_days = 730 -- 2 years
);
Differential Privacy
Differential privacy adds mathematical noise to aggregate data to prevent identifying individual users:
# Add differential privacy noise to aggregate metrics
import numpy as np
def add_laplace_noise(true_value, sensitivity, epsilon):
"""
Add Laplace noise for differential privacy
Args:
true_value: Actual metric value
sensitivity: Maximum change one individual can cause
epsilon: Privacy parameter (smaller = more privacy, more noise)
Returns:
Noised value
"""
scale = sensitivity / epsilon
noise = np.random.laplace(0, scale)
return true_value + noise
# Example: Report conversion rate with privacy
true_conversions = 1523
total_users = 10000
# Add noise to maintain privacy
noised_conversions = add_laplace_noise(
true_value=true_conversions,
sensitivity=1, # One user can change count by at most 1
epsilon=0.1 # Strong privacy guarantee
)
conversion_rate = noised_conversions / total_users
print(f"Differentially private conversion rate: {conversion_rate:.2%}")
This technique is used by large platforms (Apple, Google) to collect usage statistics while preserving individual privacy.
Real-Time vs. Batch Analytics
Real-Time Analytics
Real-time (streaming) analytics provide immediate insights as events occur:
// Real-time event processing with Kafka Streams
@Configuration
public class RealTimeAnalyticsProcessor {
@Bean
public KStream<String, AnalyticsEvent> processEvents(StreamsBuilder builder) {
KStream<String, AnalyticsEvent> events = builder.stream("analytics-events");
// Aggregate events in real-time windows
events
.groupBy((key, event) -> event.getEventName())
.windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
.count()
.toStream()
.foreach((windowedKey, count) -> {
String eventName = windowedKey.key();
long windowStart = windowedKey.window().start();
// Push to real-time dashboard
dashboardService.updateMetric(eventName, count, windowStart);
// Alert on anomalies
if (isAnomalous(eventName, count)) {
alertService.alert("Unusual activity: " + eventName + " count = " + count);
}
});
return events;
}
private boolean isAnomalous(String eventName, long count) {
// Compare to historical baseline
long baseline = metricsService.getBaseline(eventName);
return count > baseline * 2 || count < baseline * 0.5;
}
}
Use cases for real-time analytics:
- Operational dashboards - Monitor system health and user activity live
- Anomaly detection - Alert on sudden traffic spikes or drops
- Real-time personalization - Adjust content based on current user behavior
- Fraud detection - Flag suspicious patterns immediately
Batch Analytics
Batch analytics process large volumes of historical data on a schedule:
// Scheduled batch job for complex analytics
@Service
public class BatchAnalyticsJob {
@Scheduled(cron = "0 0 2 * * *") // Run at 2 AM daily
public void runDailyAnalytics() {
log.info("Starting daily analytics batch job");
// Complex cohort analysis requiring full historical data
List<CohortMetrics> cohorts = calculateCohortRetention();
// Generate business reports
generateRevenueReport();
generateUserGrowthReport();
generateProductPerformanceReport();
// Update materialized views in data warehouse
refreshMaterializedViews();
log.info("Completed daily analytics batch job");
}
private List<CohortMetrics> calculateCohortRetention() {
// Query data warehouse for cohort analysis
return bigQueryClient.query("""
WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC(MIN(DATE(timestamp)), WEEK) as cohort_week
FROM analytics.events
GROUP BY user_id
),
cohort_activity AS (
SELECT
c.cohort_week,
DATE_TRUNC(DATE(e.timestamp), WEEK) as activity_week,
COUNT(DISTINCT e.user_id) as active_users
FROM user_cohorts c
JOIN analytics.events e ON c.user_id = e.user_id
GROUP BY c.cohort_week, activity_week
)
SELECT
cohort_week,
activity_week,
active_users,
active_users / FIRST_VALUE(active_users) OVER (
PARTITION BY cohort_week ORDER BY activity_week
) as retention_rate
FROM cohort_activity
ORDER BY cohort_week, activity_week
""");
}
}
Use cases for batch analytics:
- Complex historical analysis - Cohort retention, lifetime value calculations
- Machine learning training - Train models on historical behavior data
- Business reporting - Monthly revenue reports, quarterly growth metrics
- Data warehouse maintenance - Rebuild aggregated tables, clean up old data
Lambda architecture combines both approaches: a batch layer provides accuracy and completeness, a speed layer provides low latency, and a query service merges results from both.
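A toy Python sketch of the query-service merge: batch results are authoritative up to the last batch run, and the speed layer fills in everything since (the function and variable names are illustrative):

```python
def merged_count(batch_counts: dict[str, int], speed_counts: dict[str, int], event: str) -> int:
    """Serve batch (complete, accurate) plus speed-layer (recent, approximate) counts."""
    return batch_counts.get(event, 0) + speed_counts.get(event, 0)

# Batch layer last ran at 02:00; speed layer holds events since then
batch_counts = {"payment_completed": 10_000}   # through 02:00
speed_counts = {"payment_completed": 37}       # since 02:00
assert merged_count(batch_counts, speed_counts, "payment_completed") == 10_037
```

Each batch run replaces the speed layer's window with recomputed, exact values, so any approximation in the streaming path is temporary.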
Related Guidelines
- Feature Flags - Using feature flags for A/B testing and gradual rollouts
- Observability - Technical monitoring vs. product analytics
- Logging - Structured logging for operational analytics
- Performance Testing - Measuring application performance metrics
- Security Testing - GDPR compliance testing