Analytics and Business Intelligence
Overview
Analytics systems capture, process, and analyze user behavior data to inform product decisions, measure business outcomes, and improve user experiences. Modern analytics implementations must balance comprehensive data collection with user privacy, provide both real-time insights and historical analysis, and integrate with data warehouses for advanced business intelligence.
This guide covers event tracking strategies, analytics platform selection, implementing A/B testing frameworks, privacy considerations, and integrating analytics data with data warehouses for business intelligence reporting.
Core Principles
- Define events with intention - Every tracked event should serve a specific decision-making purpose; avoid tracking "just in case"
- Prioritize user privacy - Collect only necessary data, implement consent mechanisms, anonymize sensitive information, and provide opt-out capabilities
- Maintain event consistency - Establish naming conventions and schemas upfront to prevent fragmented, unmaintainable analytics implementations
- Separate collection from analysis - Use event streaming and data warehouses to enable multiple analytics use cases from a single data collection layer
- Test your tracking - Implement automated tests for critical events to catch tracking breakages before they reach production
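The last principle can be sketched as a small unit test. This Python example validates events against an illustrative schema and naming rule before they ship; the required keys and the `object_action` convention mirror the taxonomy described later in this guide, not any particular platform's requirements:

```python
REQUIRED_KEYS = {"eventName", "timestamp", "anonymousId", "sessionId", "properties"}

def validate_event(event: dict) -> list[str]:
    """Return a list of schema violations for a tracked event (empty = valid)."""
    errors = []
    missing = REQUIRED_KEYS - event.keys()
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    name = event.get("eventName", "")
    # Enforce the object_action naming convention, e.g. "product_viewed"
    if not name or "_" not in name or not all(part.isalnum() for part in name.split("_")):
        errors.append(f"event name {name!r} violates object_action convention")
    return errors

# A well-formed event passes; a malformed one is caught before production
good = {"eventName": "product_viewed", "timestamp": "2024-01-01T00:00:00Z",
        "anonymousId": "anon-1", "sessionId": "sess-1", "properties": {}}
bad = {"eventName": "ProductViewed!", "properties": {}}
assert validate_event(good) == []
assert validate_event(bad) != []
```

Running a check like this in CI for your critical events catches tracking breakages before they silently corrupt dashboards.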
Analytics Implementation Strategies
Choosing an Analytics Platform
Different platforms serve different use cases and organizational needs:
Product Analytics Platforms focus on user behavior and product metrics:
- Mixpanel - User-centric analytics, retention cohorts, funnel analysis, event-based pricing
- Amplitude - Product analytics with behavioral cohorts, path analysis, session replay, predictive analytics
- Heap - Autocaptures all interactions so events can be defined retroactively, visual labeling, no code changes needed for new events
Web Analytics Platforms focus on traffic and acquisition:
- Google Analytics 4 - Free (with limits), event-based model, integration with Google Ads, machine learning insights
- Plausible/Fathom - Privacy-focused, lightweight, no cookies, GDPR-compliant by default
Customer Data Platforms unify data from multiple sources:
- Segment - Customer data infrastructure, routes events to multiple destinations, source of truth for user data
- RudderStack - Open-source alternative to Segment, warehouse-first architecture, data governance
Key selection criteria:
- Pricing model - Event volume, user count, or feature-based pricing
- Data ownership - Can you export raw event data? Is there a data warehouse integration?
- Privacy compliance - GDPR/CCPA features, data retention controls, consent management
- Real-time vs. batch - Does your use case require real-time dashboards or is daily batch processing sufficient?
- Technical integration - SDKs for your tech stack, API availability, custom event properties
For applications requiring both product analytics and extensive custom analysis, use a Customer Data Platform (Segment, RudderStack) to send events to multiple destinations simultaneously. This architecture decouples event collection from consumption, allowing you to add new analytics destinations without changing application code.
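A rough sketch of that decoupling in Python, with list-backed destination handlers standing in for real integrations (the `EventRouter` name and shape are illustrative, not a CDP API):

```python
from typing import Callable

Event = dict
Destination = Callable[[Event], None]

class EventRouter:
    """Single collection point that fans each event out to every registered destination."""

    def __init__(self) -> None:
        self._destinations: list[Destination] = []

    def add_destination(self, dest: Destination) -> None:
        self._destinations.append(dest)

    def track(self, event: Event) -> None:
        for dest in self._destinations:
            dest(event)  # in production: queue with retries rather than a direct call

# Adding a new analytics tool is one registration call, with no app-code changes
warehouse_rows, product_analytics = [], []
router = EventRouter()
router.add_destination(warehouse_rows.append)
router.add_destination(product_analytics.append)
router.track({"eventName": "product_viewed", "productId": "p1"})
assert warehouse_rows == product_analytics == [{"eventName": "product_viewed", "productId": "p1"}]
```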
Event Tracking Strategy
Event taxonomy is the foundation of maintainable analytics. Poor taxonomy leads to duplicate events, inconsistent naming, and fragmented data that's difficult to query.
Establish naming conventions before implementing tracking:
// Event naming convention: Object + Action
// Examples:
// - user_signed_up
// - product_viewed
// - payment_completed
// - video_played
// - form_submitted
interface BaseEvent {
eventName: string; // e.g., "product_viewed"
timestamp: string; // ISO 8601 timestamp
userId?: string; // Identified user ID
anonymousId: string; // Anonymous visitor/device ID (persists across sessions)
sessionId: string; // Session identifier
properties: EventProperties;
}
interface EventProperties {
// Common properties for all events
platform: 'web' | 'ios' | 'android';
appVersion: string;
locale: string;
// Event-specific properties
[key: string]: any;
}
// Example: Product view event
interface ProductViewedEvent extends BaseEvent {
eventName: 'product_viewed';
properties: {
productId: string;
productName: string;
category: string;
price: number;
currency: string;
platform: 'web' | 'ios' | 'android';
appVersion: string;
locale: string;
};
}
Benefits of strong typing: TypeScript interfaces prevent typos in event names and property keys, provide autocomplete for event tracking calls, and serve as documentation for the analytics schema.
Implementing Event Tracking
Centralized tracking layer abstracts the analytics provider and enforces consistent event structure:
// analytics.ts - Centralized analytics service
import * as Sentry from '@sentry/react';
import Analytics from 'analytics';
import segmentPlugin from '@analytics/segment';
// Initialize analytics with plugins
const analytics = Analytics({
app: 'my-app',
plugins: [
segmentPlugin({
writeKey: process.env.REACT_APP_SEGMENT_WRITE_KEY
})
]
});
// Type-safe event tracking
export function trackEvent<T extends BaseEvent>(event: T): void {
try {
// Validate event structure
if (!event.eventName) {
throw new Error('Event name is required');
}
// Add automatic properties
const enrichedEvent = {
...event,
properties: {
...event.properties,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION,
locale: navigator.language,
timestamp: new Date().toISOString()
}
};
// Send to analytics provider
analytics.track(enrichedEvent.eventName, enrichedEvent.properties);
// Log in development
if (process.env.NODE_ENV === 'development') {
console.log('[Analytics]', enrichedEvent.eventName, enrichedEvent.properties);
}
} catch (error) {
// Don't let analytics errors crash the app
console.error('Analytics tracking failed:', error);
Sentry.captureException(error);
}
}
// Identify user
export function identifyUser(userId: string, traits?: Record<string, any>): void {
analytics.identify(userId, traits);
}
// Track page views
export function trackPageView(pageName: string, properties?: Record<string, any>): void {
analytics.page(pageName, properties);
}
Usage in components:
// ProductPage.tsx
import { useEffect } from 'react';
import { trackEvent } from './analytics';
export const ProductPage: React.FC<{ product: Product }> = ({ product }) => {
useEffect(() => {
// Track product view
trackEvent({
eventName: 'product_viewed',
properties: {
productId: product.id,
productName: product.name,
category: product.category,
price: product.price,
currency: 'USD',
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
}, [product.id]);
const handleAddToCart = () => {
// Track add to cart action
trackEvent({
eventName: 'product_added_to_cart',
properties: {
productId: product.id,
productName: product.name,
price: product.price,
quantity: 1,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
addToCart(product);
};
return (
<div>
<h1>{product.name}</h1>
<button onClick={handleAddToCart}>Add to Cart</button>
</div>
);
};
Server-Side Event Tracking
Not all events originate from client applications. Track backend events for API usage, background jobs, and system-initiated actions:
// Spring Boot analytics service
@Service
public class AnalyticsService {
private final Analytics segmentClient;
public AnalyticsService(@Value("${segment.writeKey}") String writeKey) {
this.segmentClient = Analytics.builder(writeKey).build();
}
public void trackEvent(String userId, String eventName, Map<String, Object> properties) {
try {
// Add automatic properties
Map<String, Object> enrichedProperties = new HashMap<>(properties);
enrichedProperties.put("platform", "backend");
enrichedProperties.put("appVersion", getApplicationVersion());
enrichedProperties.put("timestamp", Instant.now().toString());
// Send to Segment
TrackMessage message = TrackMessage.builder(eventName)
.userId(userId)
.properties(enrichedProperties)
.build();
segmentClient.enqueue(message);
log.debug("Tracked event: {} for user: {}", eventName, userId);
} catch (Exception e) {
// Don't let analytics failures affect business logic
log.error("Failed to track event: {}", eventName, e);
}
}
public void identifyUser(String userId, Map<String, Object> traits) {
IdentifyMessage message = IdentifyMessage.builder()
.userId(userId)
.traits(traits)
.build();
segmentClient.enqueue(message);
}
@PreDestroy
public void shutdown() {
segmentClient.flush();
segmentClient.shutdown();
}
}
Usage in business logic:
@Service
public class PaymentService {
private final AnalyticsService analyticsService;
private final PaymentProcessor paymentProcessor;
public PaymentResult processPayment(PaymentRequest request, String userId) {
try {
PaymentResult result = paymentProcessor.process(request);
// Track successful payment
if (result.isSuccess()) {
analyticsService.trackEvent(userId, "payment_completed", Map.of(
"amount", request.getAmount(),
"currency", request.getCurrency(),
"paymentMethod", request.getPaymentMethod(),
"transactionId", result.getTransactionId()
));
} else {
// Track failed payment
analyticsService.trackEvent(userId, "payment_failed", Map.of(
"amount", request.getAmount(),
"currency", request.getCurrency(),
"paymentMethod", request.getPaymentMethod(),
"errorCode", result.getErrorCode(),
"errorMessage", result.getErrorMessage()
));
}
return result;
} catch (Exception e) {
// Track payment error
analyticsService.trackEvent(userId, "payment_error", Map.of(
"amount", request.getAmount(),
"currency", request.getCurrency(),
"errorType", e.getClass().getSimpleName()
));
throw e;
}
}
}
Why track backend events: Backend tracking captures system-initiated events (scheduled jobs, automated emails, background processing), provides reliable tracking that can't be blocked by ad blockers or browser settings, and ensures critical business events are tracked even if client-side tracking fails.
User Behavior Analysis
Funnels
Funnels measure conversion through multi-step processes by tracking the percentage of users who complete each step in a sequence:
// Example: Checkout funnel tracking
export const CheckoutFlow: React.FC = () => {
const [step, setStep] = useState<'cart' | 'shipping' | 'payment' | 'confirmation'>('cart');
useEffect(() => {
// Track funnel step views
trackEvent({
eventName: 'checkout_step_viewed',
properties: {
step: step,
stepNumber: getStepNumber(step),
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
}, [step]);
const handleStepComplete = (nextStep: typeof step) => {
// Track funnel step completion
trackEvent({
eventName: 'checkout_step_completed',
properties: {
step: step,
stepNumber: getStepNumber(step),
nextStep: nextStep,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
setStep(nextStep);
};
return (
// ... checkout UI
);
};
function getStepNumber(step: string): number {
const steps = ['cart', 'shipping', 'payment', 'confirmation'];
return steps.indexOf(step) + 1;
}
Analyzing funnel drop-off identifies friction points in user flows. If 80% of users complete the shipping step but only 40% complete payment, that is 50% abandonment at the payment step, a clear signal to investigate payment UI/UX issues.
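Drop-off rates like these can be computed directly from raw events. A Python sketch, assuming each event is reduced to a (sessionId, eventName) pair and using illustrative step names:

```python
FUNNEL_STEPS = ["checkout_started", "shipping_completed", "payment_completed"]

def funnel_conversion(events: list[tuple[str, str]]) -> dict[str, float]:
    """For each step, the share of all sessions entering the funnel that reached it."""
    sessions_per_step = {step: set() for step in FUNNEL_STEPS}
    for session_id, event_name in events:
        if event_name in sessions_per_step:
            sessions_per_step[event_name].add(session_id)
    entered = len(sessions_per_step[FUNNEL_STEPS[0]]) or 1
    return {step: len(ids) / entered for step, ids in sessions_per_step.items()}

events = [("s1", "checkout_started"), ("s1", "shipping_completed"), ("s1", "payment_completed"),
          ("s2", "checkout_started"), ("s2", "shipping_completed"),
          ("s3", "checkout_started")]
rates = funnel_conversion(events)
assert rates["shipping_completed"] == 2 / 3   # 2 of 3 sessions reached shipping
assert rates["payment_completed"] == 1 / 3
```

Product analytics platforms compute this for you; the value of knowing the math is that you can reproduce it in your warehouse and cross-check the vendor's numbers.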
User Journeys
User journeys map the paths users take through your application, revealing how they discover and use features:
// Backend journey tracking
@Service
public class UserJourneyService {
private final AnalyticsService analyticsService;
public void trackUserAction(String userId, String action, String screen, Map<String, Object> context) {
analyticsService.trackEvent(userId, "user_action", Map.of(
"action", action,
"screen", screen,
"sequence", getUserActionSequence(userId),
"sessionDuration", getSessionDuration(userId),
"context", context
));
}
private int getUserActionSequence(String userId) {
// Return count of actions in current session
return sessionService.getActionCount(userId);
}
private long getSessionDuration(String userId) {
// Return duration in seconds since session start
return sessionService.getDurationSeconds(userId);
}
}
Journey analysis questions:
- What paths do successful users take vs. churned users?
- Are users discovering key features organically or do they need prompting?
- Which features lead to higher engagement/conversion?
- Where do users experience friction or confusion?
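One way to make these questions answerable is to aggregate the most common screen sequences per session. A small Python sketch (the event shape and screen names are assumptions for illustration):

```python
from collections import Counter

def top_paths(events: list[tuple[str, str]], length: int = 3) -> Counter:
    """Count screen sequences of `length` consecutive views per session.
    `events` is a list of (session_id, screen) pairs in chronological order."""
    per_session: dict[str, list[str]] = {}
    for session_id, screen in events:
        per_session.setdefault(session_id, []).append(screen)
    paths: Counter = Counter()
    for screens in per_session.values():
        # Slide a window over each session's screen sequence
        for i in range(len(screens) - length + 1):
            paths[tuple(screens[i:i + length])] += 1
    return paths

events = [("s1", "home"), ("s1", "search"), ("s1", "product"),
          ("s2", "home"), ("s2", "search"), ("s2", "product"), ("s2", "cart")]
assert top_paths(events).most_common(1)[0] == (("home", "search", "product"), 2)
```

Comparing the top paths of retained users against churned users is a cheap first pass at the first question above.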
Retention Cohorts
Cohort analysis groups users by acquisition date and tracks their return behavior over time:
// Track user return visits
export function trackUserSession(userId: string, isNewUser: boolean) {
trackEvent({
eventName: 'session_started',
userId: userId,
properties: {
isNewUser: isNewUser,
daysSinceSignup: getDaysSinceSignup(userId),
previousSessionCount: getPreviousSessionCount(userId),
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
}
Retention metrics indicate product-market fit:
- Day 1 retention: What % of users return the day after signup?
- Week 1 retention: What % return within the first week?
- Month 1 retention: What % are still active after a month?
Good retention benchmarks vary by product type:
- Social networks: 65%+ day 1, 40%+ week 1
- Productivity tools: 50%+ day 1, 30%+ week 1
- E-commerce: 30%+ day 1, 20%+ week 1
Low retention indicates users don't find value in your product. Focus on improving onboarding, time-to-value, and core feature engagement before scaling acquisition.
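The day-N retention metrics above reduce to simple date arithmetic. A Python sketch, assuming you can list each user's signup date and the dates they were active:

```python
from datetime import date

def day_n_retention(signups: dict[str, date], activity: dict[str, set[date]], n: int) -> float:
    """Share of signed-up users who were active exactly n days after signup."""
    eligible = list(signups)
    if not eligible:
        return 0.0
    retained = sum(
        1 for user in eligible
        if any((d - signups[user]).days == n for d in activity.get(user, set()))
    )
    return retained / len(eligible)

signups = {"u1": date(2024, 1, 1), "u2": date(2024, 1, 1), "u3": date(2024, 1, 2)}
activity = {"u1": {date(2024, 1, 2)}, "u2": set(), "u3": {date(2024, 1, 3)}}
assert day_n_retention(signups, activity, 1) == 2 / 3  # u1 and u3 returned on day 1
```

The same shape generalizes to week-1 or month-1 retention by widening the day window.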
Session Recording and Heatmaps
Session recording tools (Hotjar, FullStory, LogRocket) capture user interactions for qualitative analysis:
// Initialize session recording
import * as FullStory from '@fullstory/browser';
FullStory.init({
orgId: process.env.REACT_APP_FULLSTORY_ORG_ID!,
devMode: process.env.NODE_ENV === 'development'
});
// Identify user in recordings
export function identifyUserForRecording(userId: string, email: string, name: string) {
FullStory.identify(userId, {
email: email,
displayName: name
});
}
// Tag sessions with custom events
export function tagSession(eventName: string, properties: Record<string, any>) {
FullStory.event(eventName, properties);
}
Privacy considerations for session recording:
- Never record sensitive data - Automatically redact password fields, credit card numbers, personal information
- Obtain consent - Clearly inform users that sessions are recorded and provide opt-out
- Limit retention - Delete recordings after 30-90 days
- Restrict access - Only allow authorized team members to view recordings
// Mask sensitive form fields from recording
{/* fs-exclude is FullStory's exclusion class; data-hj-suppress is Hotjar's suppression attribute */}
<input
type="password"
className="fs-exclude"
data-hj-suppress=""
/>
<input
type="text"
name="creditCard"
className="fs-exclude"
data-hj-suppress=""
/>
Use cases for session recordings:
- Understand why users abandon funnels (watch sessions of drop-off users)
- Identify usability issues (see where users struggle or get confused)
- Validate hypotheses before A/B testing (qualitative research first)
- Debug reported issues (see exactly what the user did)
A/B Testing Implementation
A/B testing compares two or more variations of a feature to determine which performs better using statistical rigor.
Feature Flag-Based Testing
Implement A/B tests using feature flags to control variant assignment:
// Feature flag service with variant assignment
import { useFeatureFlag } from './featureFlags';
export const LoginPage: React.FC = () => {
// Assign user to test variant
const variant = useFeatureFlag('login-page-redesign'); // Returns 'control' or 'variant'
useEffect(() => {
// Track test exposure
trackEvent({
eventName: 'experiment_viewed',
properties: {
experimentName: 'login-page-redesign',
variant: variant,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
}, [variant]);
const handleLogin = async (email: string, password: string) => {
// Track conversion event
trackEvent({
eventName: 'login_attempted',
properties: {
experimentName: 'login-page-redesign',
variant: variant,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
const result = await authenticateUser(email, password);
if (result.success) {
trackEvent({
eventName: 'login_succeeded',
properties: {
experimentName: 'login-page-redesign',
variant: variant,
platform: 'web',
appVersion: process.env.REACT_APP_VERSION!,
locale: navigator.language
},
userId: result.userId,
anonymousId: getAnonymousId(),
sessionId: getSessionId(),
timestamp: new Date().toISOString()
});
}
};
return variant === 'variant' ? <NewLoginUI onLogin={handleLogin} /> : <OldLoginUI onLogin={handleLogin} />;
};
Variant Assignment Strategy
Consistent assignment ensures users always see the same variant:
// Deterministic variant assignment based on user ID
export function assignVariant(experimentName: string, userId: string): 'control' | 'variant' {
// Hash user ID + experiment name for deterministic assignment
const hash = hashCode(userId + experimentName);
const bucket = Math.abs(hash) % 100;
// 50/50 split
return bucket < 50 ? 'control' : 'variant';
}
function hashCode(str: string): number {
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash; // Convert to 32-bit integer
}
return hash;
}
Why deterministic assignment matters: If variant assignment is random on each page load, users will see inconsistent experiences, invalidating test results. Hashing user ID ensures the same user always sees the same variant.
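Both properties, stability per user and a roughly even split, are easy to verify. A Python sketch mirroring the hash-and-bucket logic above (using SHA-256 as a stand-in hash, not the exact TypeScript implementation):

```python
import hashlib

def assign_variant(experiment: str, user_id: str) -> str:
    """Deterministic 50/50 assignment by hashing user ID + experiment name."""
    digest = hashlib.sha256(f"{user_id}{experiment}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "control" if bucket < 50 else "variant"

# Same user always gets the same variant...
assert assign_variant("login-page-redesign", "user-42") == assign_variant("login-page-redesign", "user-42")

# ...and across many users the split is close to 50/50
variants = [assign_variant("login-page-redesign", f"user-{i}") for i in range(10_000)]
share = variants.count("control") / len(variants)
assert 0.45 < share < 0.55
```

A cryptographic hash also avoids a pitfall of weak hashes: correlated bucketing across experiments, where the same users repeatedly land in "control".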
Statistical Significance
Don't end A/B tests prematurely. Statistical significance ensures results are not due to random chance:
# Calculate statistical significance (Python example)
from scipy import stats
def calculate_significance(control_conversions, control_total, variant_conversions, variant_total):
"""
Calculate statistical significance using two-proportion z-test
Returns p-value - if p < 0.05, results are statistically significant
"""
control_rate = control_conversions / control_total
variant_rate = variant_conversions / variant_total
# Calculate pooled proportion
pooled = (control_conversions + variant_conversions) / (control_total + variant_total)
# Calculate standard error
se = (pooled * (1 - pooled) * (1/control_total + 1/variant_total)) ** 0.5
# Calculate z-score
z = (variant_rate - control_rate) / se
# Calculate p-value (two-tailed test)
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
return {
'control_rate': control_rate,
'variant_rate': variant_rate,
'relative_improvement': (variant_rate - control_rate) / control_rate,
'p_value': p_value,
'is_significant': p_value < 0.05
}
# Example usage
result = calculate_significance(
control_conversions=450,
control_total=10000,
variant_conversions=520,
variant_total=10000
)
print(f"Control conversion rate: {result['control_rate']:.2%}")
print(f"Variant conversion rate: {result['variant_rate']:.2%}")
print(f"Relative improvement: {result['relative_improvement']:.2%}")
print(f"P-value: {result['p_value']:.4f}")
print(f"Statistically significant: {result['is_significant']}")
Minimum sample size depends on baseline conversion rate and minimum detectable effect:
| Baseline Rate | Minimum Detectable Effect | Required Sample Size per Variant |
|---|---|---|
| 5% | 20% relative improvement | ~8,000 |
| 10% | 20% relative improvement | ~3,800 |
| 20% | 20% relative improvement | ~1,900 |
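These table values follow from the standard two-proportion power calculation. A Python sketch using only the standard library, assuming a two-sided α of 0.05 and 80% power (figures match the table to rounding):

```python
from statistics import NormalDist

def required_sample_size(baseline: float, relative_mde: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant sample size for a two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    n = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
         + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2 / (p2 - p1) ** 2
    return int(n) + 1

# 5% baseline, 20% relative MDE -> roughly 8,000 per variant, as in the table
assert 7500 < required_sample_size(0.05, 0.20) < 8500
```

Note how the requirement falls as the baseline rate rises: rarer conversions need far more traffic to detect the same relative lift.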
Common mistakes in A/B testing:
- Stopping tests too early - Wait for statistical significance and sufficient sample size
- Peeking at results repeatedly - Increases false positive rate; decide on sample size beforehand
- Testing too many variants - More variants require larger sample sizes; start with A/B, not A/B/C/D
- Ignoring guardrail metrics - Ensure winning variant doesn't hurt other important metrics
Multivariate Testing
Multivariate tests evaluate multiple changes simultaneously:
// Testing both headline and CTA button together
interface TestVariants {
headline: 'control' | 'variant_a' | 'variant_b';
ctaButton: 'control' | 'variant_a';
}
const variants = assignMultivariateVariants('landing-page-test', userId);
const headlines = {
control: 'Manage Your Finances',
variant_a: 'Take Control of Your Money',
variant_b: 'Financial Freedom Starts Here'
};
const ctaButtons = {
control: 'Sign Up',
variant_a: 'Get Started Free'
};
return (
<div>
<h1>{headlines[variants.headline]}</h1>
<button>{ctaButtons[variants.ctaButton]}</button>
</div>
);
Caution with multivariate tests: They require exponentially larger sample sizes. Testing 3 headlines × 2 CTA variants = 6 combinations, each needing sufficient sample size for statistical significance.
Data Warehouse Integration
ETL Pipelines
Extract, Transform, Load (ETL) pipelines move analytics data from operational systems into data warehouses for advanced analysis:
Event streaming to warehouses enables SQL-based analysis of user behavior:
# Illustrative warehouse destination configuration (schema is an example; exact syntax varies by CDP)
destinations:
- type: bigquery
project: my-project
dataset: analytics
sync_schedule: "0 * * * *" # Hourly sync
events:
- product_viewed
- product_added_to_cart
- checkout_started
- payment_completed
identify_traits:
- email
- name
- created_at
- plan_type
Data Modeling
Transform raw events into business-friendly tables for analysis:
-- Create user activity summary table
CREATE OR REPLACE TABLE analytics.user_activity_summary AS
SELECT
user_id,
DATE(timestamp) as activity_date,
COUNT(*) as total_events,
COUNT(DISTINCT session_id) as session_count,
COUNTIF(event_name = 'product_viewed') as products_viewed,
COUNTIF(event_name = 'product_added_to_cart') as products_added_to_cart,
COUNTIF(event_name = 'payment_completed') as purchases,
SUM(IF(event_name = 'payment_completed', properties.amount, 0)) as revenue
FROM analytics.events
WHERE timestamp >= CURRENT_DATE() - INTERVAL 90 DAY
GROUP BY user_id, activity_date;
-- Create funnel conversion table
CREATE OR REPLACE TABLE analytics.checkout_funnel AS
WITH funnel_events AS (
SELECT
session_id,
user_id,
MAX(IF(event_name = 'product_viewed', 1, 0)) as viewed_product,
MAX(IF(event_name = 'product_added_to_cart', 1, 0)) as added_to_cart,
MAX(IF(event_name = 'checkout_started', 1, 0)) as started_checkout,
MAX(IF(event_name = 'payment_completed', 1, 0)) as completed_payment
FROM analytics.events
WHERE timestamp >= CURRENT_DATE() - INTERVAL 30 DAY
GROUP BY session_id, user_id
)
SELECT
COUNT(*) as total_sessions,
SUM(viewed_product) as viewed_product_count,
SUM(added_to_cart) as added_to_cart_count,
SUM(started_checkout) as started_checkout_count,
SUM(completed_payment) as completed_payment_count,
SAFE_DIVIDE(SUM(added_to_cart), SUM(viewed_product)) as view_to_cart_rate,
SAFE_DIVIDE(SUM(started_checkout), SUM(added_to_cart)) as cart_to_checkout_rate,
SAFE_DIVIDE(SUM(completed_payment), SUM(started_checkout)) as checkout_to_payment_rate
FROM funnel_events;
Benefits of data warehouse analytics:
- SQL queries for ad-hoc analysis without vendor lock-in
- Join with operational data (combine analytics events with customer database)
- Historical analysis without vendor retention limits
- Custom reporting for business-specific metrics
- Machine learning on behavior data for predictions
Data Quality Monitoring
Data quality issues silently break dashboards and reports. Implement monitoring to catch problems early:
// Data quality checks for analytics pipeline
@Component
public class AnalyticsDataQualityMonitor {
private final BigQueryClient bigQueryClient;
private final AlertService alertService;
@Scheduled(cron = "0 0 * * * *") // Run hourly
public void checkDataQuality() {
// Check 1: Verify events are flowing
long recentEventCount = bigQueryClient.query(
"SELECT COUNT(*) FROM analytics.events WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)"
);
if (recentEventCount == 0) {
alertService.alert("No analytics events in past hour - pipeline may be broken");
}
// Check 2: Verify required properties are present
long eventsWithMissingProperties = bigQueryClient.query(
"SELECT COUNT(*) FROM analytics.events WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND (properties.user_id IS NULL OR properties.session_id IS NULL)"
);
if (eventsWithMissingProperties > 0) {
alertService.alert("Found " + eventsWithMissingProperties + " events with missing required properties");
}
// Check 3: Verify event volume is within expected range
long expectedMin = 1000;
long expectedMax = 50000;
if (recentEventCount < expectedMin || recentEventCount > expectedMax) {
alertService.alert("Unusual event volume: " + recentEventCount + " (expected " + expectedMin + "-" + expectedMax + ")");
}
// Check 4: Verify no data type mismatches
long typeErrors = bigQueryClient.query(
"SELECT COUNT(*) FROM analytics.events WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND SAFE_CAST(properties.amount AS FLOAT64) IS NULL AND properties.amount IS NOT NULL"
);
if (typeErrors > 0) {
alertService.alert("Found " + typeErrors + " events with data type mismatches");
}
}
}
Privacy and Compliance
GDPR and CCPA Compliance
Privacy regulations require obtaining consent, providing data transparency, and honoring deletion requests:
// Cookie consent management
import CookieConsent from 'react-cookie-consent';
export const App: React.FC = () => {
const [analyticsEnabled, setAnalyticsEnabled] = useState(false);
const handleAcceptCookies = () => {
setAnalyticsEnabled(true);
// Initialize analytics only after consent
initializeAnalytics();
};
const handleDeclineCookies = () => {
setAnalyticsEnabled(false);
// Disable analytics
disableAnalytics();
};
return (
<>
<CookieConsent
onAccept={handleAcceptCookies}
onDecline={handleDeclineCookies}
enableDeclineButton
>
We use cookies to improve your experience and analyze site usage.
</CookieConsent>
{/* App content */}
</>
);
};
function disableAnalytics() {
// Disable tracking for popular analytics tools
window['ga-disable-G-XXXXXXXXXX'] = true; // Google Analytics (substitute your measurement ID)
// Opt out of Mixpanel tracking
if (window.mixpanel) {
window.mixpanel.opt_out_tracking();
}
}
Data Anonymization
Anonymize personally identifiable information (PII) to reduce privacy risk:
// Anonymize sensitive data before sending to analytics
@Service
public class PrivacyAwareAnalyticsService {
private final AnalyticsService analyticsService;
public void trackEvent(String userId, String eventName, Map<String, Object> properties) {
// Copy first: callers may pass immutable maps (e.g. Map.of), and we must not mutate their map
Map<String, Object> props = new HashMap<>(properties);
// Anonymize email addresses
if (props.containsKey("email")) {
String email = (String) props.get("email");
props.put("emailDomain", extractDomain(email));
props.put("emailHash", hashEmail(email));
props.remove("email"); // Don't send actual email
}
// Anonymize IP addresses
if (props.containsKey("ipAddress")) {
String ip = (String) props.get("ipAddress");
props.put("ipAddress", anonymizeIp(ip));
}
// Remove sensitive fields entirely
props.remove("password");
props.remove("creditCard");
props.remove("ssn");
analyticsService.trackEvent(userId, eventName, props);
}
private String extractDomain(String email) {
return email.substring(email.indexOf('@') + 1);
}
private String hashEmail(String email) {
// One-way hash for pseudonymization
return DigestUtils.sha256Hex(email);
}
private String anonymizeIp(String ip) {
// Remove last octet of IPv4 address
String[] parts = ip.split("\\.");
if (parts.length == 4) {
return parts[0] + "." + parts[1] + "." + parts[2] + ".0";
}
return ip;
}
}
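The `anonymizeIp` helper above truncates IPv4 addresses only. A Python sketch extending the same idea to IPv6 with the standard `ipaddress` module, zeroing the host bits (the /24 and /48 prefix lengths are common choices, not a regulatory requirement):

```python
import ipaddress

def anonymize_ip(ip: str) -> str:
    """Zero the host portion: last octet for IPv4, last 80 bits for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)

assert anonymize_ip("203.0.113.57") == "203.0.113.0"
assert anonymize_ip("2001:db8:abcd:1234::1") == "2001:db8:abcd::"
```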
Right to be Forgotten
Implement data deletion to comply with GDPR Article 17 (Right to Erasure):
// Delete user data from analytics systems
@Service
public class UserDataDeletionService {
private final AnalyticsService analyticsService;
private final BigQueryClient bigQueryClient;
private final FullStoryClient fullStoryClient;
@Transactional
public void deleteUserData(String userId) {
// Delete from analytics provider
analyticsService.deleteUser(userId);
// Delete from data warehouse
bigQueryClient.execute(
"DELETE FROM analytics.events WHERE user_id = ?",
userId
);
bigQueryClient.execute(
"DELETE FROM analytics.user_profiles WHERE user_id = ?",
userId
);
// Delete from session recording tools
fullStoryClient.deleteUser(userId);
log.info("Deleted all analytics data for user: {}", userId);
}
}
Data retention policies automatically delete old data:
-- BigQuery automatic deletion after 2 years
ALTER TABLE analytics.events
SET OPTIONS (
partition_expiration_days = 730 -- 2 years
);
Differential Privacy
Differential privacy adds mathematical noise to aggregate data to prevent identifying individual users:
# Add differential privacy noise to aggregate metrics
import numpy as np
def add_laplace_noise(true_value, sensitivity, epsilon):
"""
Add Laplace noise for differential privacy
Args:
true_value: Actual metric value
sensitivity: Maximum change one individual can cause
epsilon: Privacy parameter (smaller = more privacy, more noise)
Returns:
Noised value
"""
scale = sensitivity / epsilon
noise = np.random.laplace(0, scale)
return true_value + noise
# Example: Report conversion rate with privacy
true_conversions = 1523
total_users = 10000
# Add noise to maintain privacy
noised_conversions = add_laplace_noise(
true_value=true_conversions,
sensitivity=1, # One user can change count by at most 1
epsilon=0.1 # Strong privacy guarantee
)
conversion_rate = noised_conversions / total_users
print(f"Differentially private conversion rate: {conversion_rate:.2%}")
This technique is used by large platforms (Apple, Google) to collect usage statistics while preserving individual privacy.
Real-Time vs. Batch Analytics
Real-Time Analytics
Real-time (streaming) analytics provide immediate insights as events occur:
// Real-time event processing with Kafka Streams
@Configuration
public class RealTimeAnalyticsProcessor {
@Bean
public KStream<String, AnalyticsEvent> processEvents(StreamsBuilder builder) {
KStream<String, AnalyticsEvent> events = builder.stream("analytics-events");
// Aggregate events in real-time windows
events
.groupBy((key, event) -> event.getEventName())
.windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
.count()
.toStream()
.foreach((windowedKey, count) -> {
String eventName = windowedKey.key();
long windowStart = windowedKey.window().start();
// Push to real-time dashboard
dashboardService.updateMetric(eventName, count, windowStart);
// Alert on anomalies
if (isAnomalous(eventName, count)) {
alertService.alert("Unusual activity: " + eventName + " count = " + count);
}
});
return events;
}
private boolean isAnomalous(String eventName, long count) {
// Compare to historical baseline
long baseline = metricsService.getBaseline(eventName);
return count > baseline * 2 || count < baseline * 0.5;
}
}
Use cases for real-time analytics:
- Operational dashboards - Monitor system health and user activity live
- Anomaly detection - Alert on sudden traffic spikes or drops
- Real-time personalization - Adjust content based on current user behavior
- Fraud detection - Flag suspicious patterns immediately
Batch Analytics
Batch analytics process large volumes of historical data on a schedule:
// Scheduled batch job for complex analytics
@Service
public class BatchAnalyticsJob {
@Scheduled(cron = "0 0 2 * * *") // Run at 2 AM daily
public void runDailyAnalytics() {
log.info("Starting daily analytics batch job");
// Complex cohort analysis requiring full historical data
List<CohortMetrics> cohorts = calculateCohortRetention();
// Generate business reports
generateRevenueReport();
generateUserGrowthReport();
generateProductPerformanceReport();
// Update materialized views in data warehouse
refreshMaterializedViews();
log.info("Completed daily analytics batch job");
}
private List<CohortMetrics> calculateCohortRetention() {
// Query data warehouse for cohort analysis
return bigQueryClient.query("""
WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC(MIN(DATE(timestamp)), WEEK) as cohort_week
FROM analytics.events
GROUP BY user_id
),
cohort_activity AS (
SELECT
c.cohort_week,
DATE_TRUNC(DATE(e.timestamp), WEEK) as activity_week,
COUNT(DISTINCT e.user_id) as active_users
FROM user_cohorts c
JOIN analytics.events e ON c.user_id = e.user_id
GROUP BY c.cohort_week, activity_week
)
SELECT
cohort_week,
activity_week,
active_users,
active_users / FIRST_VALUE(active_users) OVER (
PARTITION BY cohort_week ORDER BY activity_week
) as retention_rate
FROM cohort_activity
ORDER BY cohort_week, activity_week
""");
}
}
Use cases for batch analytics:
- Complex historical analysis - Cohort retention, lifetime value calculations
- Machine learning training - Train models on historical behavior data
- Business reporting - Monthly revenue reports, quarterly growth metrics
- Data warehouse maintenance - Rebuild aggregated tables, clean up old data
Lambda architecture combines both approaches: a batch layer provides accuracy and completeness, a speed layer provides low latency, and a query service merges results from both.
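A toy Python sketch of the query-service merge: batch results are authoritative up to the last batch run, and the speed layer fills in everything since (the function and variable names are illustrative):

```python
def merged_count(batch_counts: dict[str, int], speed_counts: dict[str, int], event: str) -> int:
    """Serve batch (complete, accurate) plus speed-layer (recent, approximate) counts."""
    return batch_counts.get(event, 0) + speed_counts.get(event, 0)

# Batch layer last ran at 02:00; speed layer holds events since then
batch_counts = {"payment_completed": 10_000}   # through 02:00
speed_counts = {"payment_completed": 37}       # since 02:00
assert merged_count(batch_counts, speed_counts, "payment_completed") == 10_037
```

Each batch run replaces the speed layer's window with recomputed, exact values, so any approximation in the streaming path is temporary.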
Related Guidelines
- Feature Flags - Using feature flags for A/B testing and gradual rollouts
- Observability - Technical monitoring vs. product analytics
- Logging - Structured logging for operational analytics
- Performance Testing - Measuring application performance metrics
- Security Testing - GDPR compliance testing