Feature Flags and Toggles

Feature flags (also called feature toggles) allow you to decouple code deployment from feature release, enabling teams to deploy code to production with features disabled, then activate them for specific users or gradually roll them out. This practice is fundamental to continuous delivery, progressive deployment strategies, and safe production changes.

The core principle is separating deployment risk from release risk. You can deploy new code to production safely with features hidden behind flags, then activate those features when you're ready - perhaps starting with internal users, then 5% of customers, then progressively rolling out to everyone. If problems arise, you can disable the feature instantly without redeploying code.

Types of Feature Flags

Feature flags serve different purposes and have different lifecycles. Understanding these categories helps you design appropriate implementation strategies and establish cleanup processes to avoid flag debt.

Release Flags

Release flags control access to new features that aren't ready for all users. They're temporary by nature - once the feature is fully released and stable, the flag should be removed. These flags enable trunk-based development where incomplete features can exist in production code without being visible to users.

Release flags typically progress through stages: initially disabled for everyone, then enabled for internal testing, followed by gradual rollout to production users, and finally removed once the feature is universally available. The lifecycle usually spans weeks to months.

@Service
public class PaymentService {
    private final FeatureFlagService featureFlags;
    private final NewPaymentProcessor newProcessor;
    private final LegacyPaymentProcessor legacyProcessor;

    public PaymentService(FeatureFlagService featureFlags,
                          NewPaymentProcessor newProcessor,
                          LegacyPaymentProcessor legacyProcessor) {
        this.featureFlags = featureFlags;
        this.newProcessor = newProcessor;
        this.legacyProcessor = legacyProcessor;
    }

    /**
     * Process payment using the new processor if the flag is enabled for this user.
     * Release flag enables gradual migration to the new payment system.
     */
    public PaymentResult processPayment(String userId, PaymentRequest request) {
        if (featureFlags.isEnabled("new-payment-processor", userId)) {
            return newProcessor.process(request);
        }
        return legacyProcessor.process(request);
    }
}

This pattern allows you to deploy the new payment processor to production, test it with internal users, then gradually roll it out while monitoring for issues. If the new processor has problems, you can instantly revert all users to the legacy implementation without code changes.

Experiment Flags

Experiment flags (A/B test flags) compare different implementations to determine which performs better. Unlike release flags that eventually become universally enabled, experiment flags help you make data-driven decisions about which variant to keep. These flags typically involve metrics collection and statistical analysis.

Experiment flags require stable variant assignment - each user must consistently see the same variant for the duration of the experiment to avoid confounding results. They also need integration with analytics systems to track metrics and determine statistical significance.

interface ExperimentVariant {
  variant: 'control' | 'treatment';
  trackingId: string;
}

class CheckoutService {
  constructor(
    private experiments: ExperimentService,
    private analytics: AnalyticsService,
    private orderService: OrderService
  ) {}

  /**
   * Show different checkout flows to compare conversion rates.
   * Experiment flag assigns users to variants and tracks outcomes.
   */
  async renderCheckout(userId: string): Promise<CheckoutView> {
    const experiment = this.experiments.getVariant(
      'streamlined-checkout',
      userId
    );

    this.analytics.trackExperimentExposure({
      experimentName: 'streamlined-checkout',
      variant: experiment.variant,
      userId,
      trackingId: experiment.trackingId
    });

    if (experiment.variant === 'treatment') {
      return this.renderStreamlinedCheckout();
    }
    return this.renderStandardCheckout();
  }

  async completeCheckout(userId: string, order: Order): Promise<void> {
    const experiment = this.experiments.getVariant(
      'streamlined-checkout',
      userId
    );

    // Track conversion for experiment analysis
    this.analytics.trackExperimentConversion({
      experimentName: 'streamlined-checkout',
      variant: experiment.variant,
      trackingId: experiment.trackingId,
      revenue: order.total
    });

    await this.orderService.complete(order);
  }
}

The experiment lifecycle involves defining success metrics upfront, running the experiment until achieving statistical significance, analyzing results, choosing the winning variant, and removing the flag while keeping only the winning implementation.

Operational Flags

Operational (ops) flags control system behavior to manage operational concerns like performance, load, or infrastructure limitations. These are often long-lived or permanent flags that system operators use to respond to runtime conditions.

Common operational flag patterns include circuit breakers that disable features under high load, graceful degradation flags that simplify functionality when upstream services fail, and resource management flags that control expensive operations.

@Service
public class ReportGenerationService {
    private final FeatureFlagService featureFlags;
    private final MetricsService metrics;

    public ReportGenerationService(FeatureFlagService featureFlags,
                                   MetricsService metrics) {
        this.featureFlags = featureFlags;
        this.metrics = metrics;
    }

    /**
     * Disable expensive report generation under high system load.
     * Ops flag prevents resource exhaustion during traffic spikes.
     */
    public ReportResult generateDetailedReport(String accountId) {
        // Check current system load and flag state
        if (metrics.getCpuUsage() > 0.8 ||
                !featureFlags.isEnabled("detailed-reports")) {
            return ReportResult.fallback(
                "Detailed reports temporarily unavailable. " +
                "Please try again later or use simplified reports.");
        }

        return generateFullReport(accountId);
    }
}

Operational flags often integrate with monitoring systems, allowing automated responses to metrics (like automatically disabling features when error rates spike) or manual control during incidents.

Permission Flags

Permission flags control feature access based on user roles, subscriptions, or account types. These are typically long-lived flags tied to your business model or authorization system. Permission flags enforce business rules about who can use which features.

class AccountService {
  constructor(
    private permissions: PermissionService,
    private users: UserService
  ) {}

  /**
   * Check if user's subscription tier includes premium features.
   * Permission flag enforces business model constraints.
   */
  async canAccessPremiumAnalytics(userId: string): Promise<boolean> {
    const user = await this.users.findById(userId);

    return this.permissions.hasFeature(
      userId,
      'premium-analytics',
      {
        accountType: user.accountType,
        subscriptionTier: user.subscription.tier
      }
    );
  }

  async getAnalyticsData(userId: string): Promise<AnalyticsData> {
    if (await this.canAccessPremiumAnalytics(userId)) {
      return this.fetchPremiumAnalytics(userId);
    }
    return this.fetchBasicAnalytics(userId);
  }
}

Permission flags blur the line between feature flagging and authorization. In practice, they're often implemented using the same infrastructure but evaluated against different contexts (user attributes vs. experiment assignments).

Feature Flag Frameworks and Implementation

Feature flag systems range from simple environment variables to sophisticated SaaS platforms with targeting rules, gradual rollouts, and analytics integration. The choice depends on your requirements for targeting flexibility, performance, team size, and budget.

Environment Variables (Simplest Approach)

Environment variables provide the simplest feature flag mechanism: boolean values configured per environment. This approach works for basic on/off flags that don't need per-user targeting or gradual rollouts. Configuration changes require deployment or container restart.

@Configuration
public class FeatureFlags {
    @Value("${features.new-payment-api:false}")
    private boolean newPaymentApiEnabled;

    @Value("${features.enhanced-fraud-detection:false}")
    private boolean enhancedFraudDetectionEnabled;

    /**
     * Check if new payment API is enabled in this environment.
     * Simple boolean flag configured via application.properties.
     */
    public boolean isNewPaymentApiEnabled() {
        return newPaymentApiEnabled;
    }

    public boolean isEnhancedFraudDetectionEnabled() {
        return enhancedFraudDetectionEnabled;
    }
}

This approach has significant limitations: no per-user targeting, no gradual rollouts, changes require redeployment, and no dynamic control. It's appropriate for simple release flags in low-traffic applications or flags that only need environment-level control (e.g., enabling debug features in development).

Database-Backed Toggle System

Database-backed flags enable runtime changes without deployment. You store flag configurations in a database table and provide an admin interface for toggling features. This pattern supports per-user targeting through database queries.

// Entity representing a feature flag
@Entity
@Table(name = "feature_flags")
public class FeatureFlag {
    @Id
    private String name;

    private boolean enabled;

    @Column(name = "enabled_for_users")
    @ElementCollection
    private Set<String> enabledForUsers = new HashSet<>();

    @Column(name = "enabled_percentage")
    private int enabledPercentage; // 0-100

    @Column(name = "enabled_from")
    private LocalDateTime enabledFrom;

    @Column(name = "enabled_until")
    private LocalDateTime enabledUntil;
}

@Service
public class DatabaseFeatureFlagService implements FeatureFlagService {
    private final FeatureFlagRepository repository;
    private final LoadingCache<String, FeatureFlag> cache;

    public DatabaseFeatureFlagService(FeatureFlagRepository repository) {
        this.repository = repository;
        // Cache flags for 60 seconds to reduce database load
        this.cache = Caffeine.newBuilder()
            .expireAfterWrite(60, TimeUnit.SECONDS)
            .build(flagName -> repository.findById(flagName).orElse(null));
    }

    /**
     * Evaluate feature flag with percentage-based rollout and user targeting.
     * Checks explicit user list, percentage rollout, and time-based activation.
     */
    @Override
    public boolean isEnabled(String flagName, String userId) {
        FeatureFlag flag = cache.get(flagName);

        if (flag == null || !flag.isEnabled()) {
            return false;
        }

        // Check time-based activation window
        LocalDateTime now = LocalDateTime.now();
        if (flag.getEnabledFrom() != null && now.isBefore(flag.getEnabledFrom())) {
            return false;
        }
        if (flag.getEnabledUntil() != null && now.isAfter(flag.getEnabledUntil())) {
            return false;
        }

        // Check explicit user list
        if (flag.getEnabledForUsers().contains(userId)) {
            return true;
        }

        // Percentage-based rollout using consistent hashing
        if (flag.getEnabledPercentage() > 0) {
            return isInPercentageRollout(flagName, userId, flag.getEnabledPercentage());
        }

        return false;
    }

    /**
     * Consistent hash-based percentage rollout.
     * Same user always gets same result for stable rollout.
     * floorMod keeps the bucket in 0-99 even for Integer.MIN_VALUE,
     * where Math.abs would return a negative number.
     */
    private boolean isInPercentageRollout(String flagName, String userId, int percentage) {
        String hashInput = flagName + ":" + userId;
        int bucket = Math.floorMod(hashInput.hashCode(), 100);
        return bucket < percentage;
    }
}

Database-backed systems need caching to avoid performance problems (evaluating flags on every request would create excessive database load). The cache TTL creates eventual consistency - flag changes take up to 60 seconds to propagate. For immediate updates, implement cache invalidation when flags change.
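The invalidation side can be sketched with an in-memory map standing in for both the Caffeine cache and the database table. FlagStore and CachedFlagService are illustrative names for this sketch, not part of any framework:

```typescript
interface FlagConfig {
  enabled: boolean;
  enabledPercentage: number;
}

// Stand-in for the database layer
class FlagStore {
  private rows = new Map<string, FlagConfig>();

  save(name: string, config: FlagConfig): void {
    this.rows.set(name, config);
  }

  load(name: string): FlagConfig | undefined {
    return this.rows.get(name);
  }
}

class CachedFlagService {
  private cache = new Map<string, FlagConfig>();

  constructor(private store: FlagStore) {}

  getFlag(name: string): FlagConfig | undefined {
    let flag = this.cache.get(name);
    if (flag === undefined) {
      const loaded = this.store.load(name);
      if (loaded !== undefined) {
        this.cache.set(name, loaded);
        flag = loaded;
      }
    }
    return flag;
  }

  // Write-through update: persist first, then evict the cache entry so the
  // next read sees the new value immediately instead of waiting out the TTL.
  updateFlag(name: string, config: FlagConfig): void {
    this.store.save(name, config);
    this.cache.delete(name);
  }
}
```

In a multi-instance deployment the eviction would need to be broadcast (for example via a pub/sub channel) so every instance drops its local copy, not just the one that handled the admin request.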

This approach works well for small-to-medium applications where you need basic targeting and gradual rollouts but don't want to pay for a SaaS platform. The tradeoffs are less expressive targeting rules than specialized platforms offer, no built-in analytics, and the operational overhead of building and maintaining the admin UI.

Remote Config Services

Remote configuration services like LaunchDarkly, Unleash, Split.io, or AWS AppConfig provide sophisticated feature flag platforms with advanced targeting, gradual rollouts, analytics integration, and real-time updates. These platforms separate flag evaluation from your database and provide SDKs that cache flag states locally.

import LaunchDarkly from 'launchdarkly-node-server-sdk';

class FeatureFlagService {
  private ldClient: LaunchDarkly.LDClient;

  async initialize() {
    this.ldClient = LaunchDarkly.init(process.env.LAUNCHDARKLY_SDK_KEY!);
    await this.ldClient.waitForInitialization();
  }

  /**
   * Evaluate feature flag with sophisticated targeting rules.
   * LaunchDarkly SDK handles caching and real-time updates.
   */
  async isEnabled(
    flagKey: string,
    userId: string,
    context?: Record<string, any>
  ): Promise<boolean> {
    const user: LaunchDarkly.LDUser = {
      key: userId,
      custom: context
    };

    return (await this.ldClient.variation(flagKey, user, false)) as boolean;
  }

  /**
   * Get typed flag value with default fallback.
   * Supports string, number, JSON flags beyond booleans.
   */
  async getVariation<T>(
    flagKey: string,
    userId: string,
    defaultValue: T,
    context?: Record<string, any>
  ): Promise<T> {
    const user: LaunchDarkly.LDUser = {
      key: userId,
      custom: context
    };

    return (await this.ldClient.variation(flagKey, user, defaultValue)) as T;
  }

  /**
   * Track custom events for experiment analysis.
   * Enables cohort analysis and experiment metrics.
   */
  trackEvent(
    eventName: string,
    userId: string,
    data?: Record<string, any>
  ): void {
    this.ldClient.track(eventName, { key: userId }, data);
  }
}

Remote config platforms provide rich targeting capabilities: target users by attributes (email domain, account type, location), create user segments, define percentage rollouts, schedule flag changes, and integrate with analytics platforms. They also provide SDKs that maintain local caches and use streaming connections for near-instant flag updates.

The primary tradeoffs are cost (these platforms charge per user or flag evaluation) and external dependency (if the service is down, your app needs cached values to continue functioning). Most platforms provide resilient SDKs that cache flag states and continue serving stale values during outages.
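That fallback behavior can be sketched outside any particular SDK as a wrapper that remembers the last successful evaluation per flag and user, and serves it when the provider is unreachable. The RemoteProvider interface and its evaluate method here are assumptions for illustration, not a real vendor API:

```typescript
interface RemoteProvider {
  // May throw if the remote flag service is unreachable
  evaluate(flagKey: string, userId: string): boolean;
}

class ResilientFlagClient {
  private lastKnown = new Map<string, boolean>();

  constructor(
    private provider: RemoteProvider,
    private defaultValue: boolean = false
  ) {}

  isEnabled(flagKey: string, userId: string): boolean {
    const cacheKey = `${flagKey}:${userId}`;
    try {
      const value = this.provider.evaluate(flagKey, userId);
      this.lastKnown.set(cacheKey, value); // refresh last-known-good value
      return value;
    } catch {
      // Provider unreachable: prefer a stale value over the hard default,
      // so an outage doesn't flip features off for users who had them on.
      return this.lastKnown.get(cacheKey) ?? this.defaultValue;
    }
  }
}
```

Real SDKs do this at the flag-ruleset level rather than per evaluation, but the failure-mode decision is the same: serve stale, then default.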

Feature Flag Adapter Pattern

In practice, you may evolve from simple flags to sophisticated platforms, or use different systems in different environments. The adapter pattern provides an abstraction layer that makes the implementation swappable.

// Common interface for all feature flag implementations
public interface FeatureFlagService {
    boolean isEnabled(String flagName, String userId);
    boolean isEnabled(String flagName, String userId, Map<String, Object> context);
    <T> T getVariation(String flagName, String userId, T defaultValue);
}

// Simple implementation for local development
public class LocalFeatureFlagService implements FeatureFlagService {
    private final Map<String, Boolean> flags = new HashMap<>();

    public LocalFeatureFlagService() {
        // All flags default to enabled in local dev
        flags.put("new-payment-processor", true);
        flags.put("enhanced-fraud-detection", true);
    }

    @Override
    public boolean isEnabled(String flagName, String userId) {
        return flags.getOrDefault(flagName, false);
    }

    @Override
    public boolean isEnabled(String flagName, String userId,
                             Map<String, Object> context) {
        return isEnabled(flagName, userId);
    }

    @Override
    public <T> T getVariation(String flagName, String userId, T defaultValue) {
        // Local implementation has no variations; always return the default
        return defaultValue;
    }
}

// Production implementation using LaunchDarkly
public class LaunchDarklyFeatureFlagService implements FeatureFlagService {
    private final LDClient ldClient;

    public LaunchDarklyFeatureFlagService(String sdkKey) {
        this.ldClient = new LDClient(sdkKey);
    }

    @Override
    public boolean isEnabled(String flagName, String userId) {
        LDUser user = new LDUser.Builder(userId).build();
        return ldClient.boolVariation(flagName, user, false);
    }

    @Override
    public boolean isEnabled(String flagName, String userId,
                             Map<String, Object> context) {
        LDUser.Builder builder = new LDUser.Builder(userId);
        context.forEach((key, value) -> builder.custom(key, String.valueOf(value)));
        return ldClient.boolVariation(flagName, builder.build(), false);
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T getVariation(String flagName, String userId, T defaultValue) {
        LDUser user = new LDUser.Builder(userId).build();
        if (defaultValue instanceof Boolean) {
            return (T) (Boolean) ldClient.boolVariation(flagName, user, (Boolean) defaultValue);
        }
        if (defaultValue instanceof String) {
            return (T) ldClient.stringVariation(flagName, user, (String) defaultValue);
        }
        // Other types would go through jsonValueVariation and LDValue conversion
        return defaultValue;
    }
}

// Configuration selects implementation based on environment
@Configuration
public class FeatureFlagConfiguration {
    @Bean
    public FeatureFlagService featureFlagService(
            @Value("${feature-flags.provider}") String provider,
            @Value("${feature-flags.sdk-key:}") String sdkKey,
            FeatureFlagRepository repository
    ) {
        return switch (provider) {
            case "launchdarkly" -> new LaunchDarklyFeatureFlagService(sdkKey);
            case "database" -> new DatabaseFeatureFlagService(repository);
            case "local" -> new LocalFeatureFlagService();
            default -> throw new IllegalArgumentException("Unknown provider: " + provider);
        };
    }
}

This pattern allows you to use simple local flags during development, database flags in staging, and LaunchDarkly in production without changing application code. The abstraction also makes it easier to migrate between platforms, or to run a new provider alongside the current one while you validate it.

Gradual Rollout Strategies

Gradual rollouts reduce risk by exposing changes to increasingly larger user populations while monitoring for issues. If problems occur, you can halt the rollout or roll back without affecting all users. This practice enables continuous delivery with controlled release risk.

Percentage-Based Rollout

Percentage-based rollouts enable features for a specific percentage of users, gradually increasing from 1% to 100%. This approach spreads risk and provides early warning of issues before they affect all users.

The rollout progression typically follows: 1% (canary), 5% (early validation), 25% (broader exposure), 50% (majority), 100% (full release). Each stage runs for hours or days depending on traffic volume and confidence. Monitor key metrics (error rates, response times, conversion rates) at each stage before proceeding.

Percentage rollouts require consistent user bucketing - each user must consistently be in the "enabled" or "disabled" group throughout the rollout. Implement this using consistent hashing based on user ID rather than random assignment.
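Consistent bucketing can be sketched as a pure function over the flag name and user ID. FNV-1a is used here only because it is easy to inline; production systems often use murmur3 or whatever bucketing their flag SDK ships with:

```typescript
// FNV-1a: a simple, stable 32-bit string hash
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function isInRollout(flagName: string, userId: string, percentage: number): boolean {
  // Hashing flag + user makes membership stable per flag for a given user,
  // while the same user lands in different buckets for different flags.
  const bucket = fnv1a(`${flagName}:${userId}`) % 100;
  return bucket < percentage;
}
```

Because the function is deterministic, raising the percentage from 5 to 25 keeps every user who was already enabled at 5% enabled, which is exactly the property random assignment lacks.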

class GradualRolloutService(
    private val featureFlags: FeatureFlagService,
    private val metrics: MetricsService
) {
    /**
     * Gradually increase feature exposure while monitoring error rates.
     * Automatically rolls back if error rate exceeds threshold.
     */
    suspend fun evaluateWithSafeguards(
        flagName: String,
        userId: String
    ): Boolean {
        // Check if flag is enabled for this user
        if (!featureFlags.isEnabled(flagName, userId)) {
            return false
        }

        // Monitor error rates for users with flag enabled
        val errorRate = metrics.getErrorRate(
            feature = flagName,
            timeWindow = Duration.ofMinutes(5)
        )

        // Automatic rollback if error rate too high
        if (errorRate > 0.05) { // 5% error threshold
            metrics.incrementCounter(
                "feature_flag.automatic_rollback",
                mapOf("flag" to flagName, "error_rate" to errorRate.toString())
            )

            featureFlags.disable(flagName) // Emergency disable
            return false
        }

        return true
    }
}

Automated rollback based on metrics (error rates, latency, conversion) provides safety during gradual rollouts. When error rates spike above acceptable thresholds, automatically disable the flag and alert the team. This requires integration between your feature flag system and observability platform (see Monitoring and Alerting).

User-Based Targeting

User-based targeting enables features for specific individuals, teams, or organizations before broader rollout. This pattern is essential for beta testing, internal validation, and progressive customer onboarding.

interface TargetingContext {
  userId: string;
  email?: string;
  accountType?: string;
  organizationId?: string;
  betaTester?: boolean;
  internalUser?: boolean;
  attributes?: Record<string, any>;
}

class TargetedFeatureService {
  constructor(private featureFlags: FeatureFlagService) {}

  /**
   * Evaluate feature with rich targeting context.
   * Supports targeting by user attributes, organization, beta status.
   */
  async isEnabledForUser(
    flagName: string,
    context: TargetingContext
  ): Promise<boolean> {
    // Internal users always get new features first
    if (context.internalUser) {
      return this.featureFlags.isEnabled(flagName, context.userId, {
        segment: 'internal',
        ...context.attributes
      });
    }

    // Beta testers opt into experimental features
    if (context.betaTester) {
      return this.featureFlags.isEnabled(flagName, context.userId, {
        segment: 'beta',
        ...context.attributes
      });
    }

    // Enterprise accounts may have different feature access
    if (context.accountType === 'enterprise') {
      return this.featureFlags.isEnabled(flagName, context.userId, {
        segment: 'enterprise',
        organizationId: context.organizationId,
        ...context.attributes
      });
    }

    // Default evaluation for standard users
    return this.featureFlags.isEnabled(flagName, context.userId, context.attributes);
  }
}

Typical targeting progression: internal users → beta testers → specific customers → percentage rollout → all users. This approach provides multiple validation gates before broad release.

Ring Deployment

Ring deployment organizes users into concentric rings, deploying changes from inner rings (lowest risk) to outer rings (production users). Each ring represents a different risk tolerance and user population.

Each ring has criteria for promotion to the next ring: passing time threshold (e.g., 24 hours), meeting quality metrics (error rate < 0.1%), and manual approval. Ring deployment provides systematic risk reduction with clear decision points.

Ring definitions typically align with organizational structure and user segments. Ring 0 (development) catches obvious issues immediately. Ring 1 (internal users) validates in production-like conditions with friendly users who can provide feedback. Ring 2 (early adopters) includes beta testers who opted into experimental features. Ring 3 (canary) is a small production percentage for real-world validation. Ring 4 (production) is the full user base.

Implementation requires mapping users to rings and configuring feature flags accordingly:

@Service
public class RingDeploymentService {
    private final FeatureFlagService featureFlags;
    private final UserSegmentService userSegments;

    public RingDeploymentService(FeatureFlagService featureFlags,
                                 UserSegmentService userSegments) {
        this.featureFlags = featureFlags;
        this.userSegments = userSegments;
    }

    /**
     * Determine user's deployment ring for feature access.
     * Ring assignment based on user attributes and feature configuration.
     */
    public Ring getUserRing(String userId, String featureName) {
        if (userSegments.isDeveloper(userId)) {
            return Ring.RING_0_DEVELOPMENT;
        }
        if (userSegments.isInternalUser(userId)) {
            return Ring.RING_1_INTERNAL;
        }
        if (userSegments.isBetaTester(userId)) {
            return Ring.RING_2_EARLY_ADOPTERS;
        }
        if (featureFlags.isInCanaryGroup(featureName, userId)) {
            return Ring.RING_3_CANARY;
        }
        return Ring.RING_4_PRODUCTION;
    }

    /**
     * Check if feature has reached user's ring.
     * Features propagate from inner rings outward.
     */
    public boolean isFeatureAvailable(String featureName, String userId) {
        Ring userRing = getUserRing(userId, featureName);
        Ring featureRing = featureFlags.getCurrentRing(featureName);

        // Feature available if it has reached or passed user's ring
        return featureRing.getLevel() >= userRing.getLevel();
    }

    // Enum.ordinal() is final and cannot be overridden, so the ring level
    // is exposed through an explicit getter instead.
    public enum Ring {
        RING_0_DEVELOPMENT(0),
        RING_1_INTERNAL(1),
        RING_2_EARLY_ADOPTERS(2),
        RING_3_CANARY(3),
        RING_4_PRODUCTION(4);

        private final int level;

        Ring(int level) {
            this.level = level;
        }

        public int getLevel() {
            return level;
        }
    }
}

Ring deployment is particularly valuable for high-stakes changes where gradual validation is critical. It combines the benefits of internal testing, beta programs, and canary deployments in a structured framework.

Canary Releases

Canary releases deploy changes to a small subset of production infrastructure (servers, pods, regions) while most infrastructure runs the stable version. Traffic is split between canary and stable versions, with monitoring to detect issues before full rollout.

While similar to percentage-based rollouts, canary releases operate at the infrastructure level rather than user level. A percentage rollout enables a feature for 5% of users across all servers; a canary release runs the new code on 5% of servers with all users on those servers seeing the change.

# Kubernetes canary deployment with traffic splitting
apiVersion: v1
kind: Service
metadata:
  name: payment-service
spec:
  selector:
    app: payment-service
  ports:
    - port: 8080
---
# Stable deployment (95% of traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service-stable
spec:
  replicas: 19  # 95% of total capacity
  selector:
    matchLabels:
      app: payment-service
      version: stable
  template:
    metadata:
      labels:
        app: payment-service
        version: stable
    spec:
      containers:
        - name: payment-service
          image: payment-service:v1.2.0
---
# Canary deployment (5% of traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service-canary
spec:
  replicas: 1  # 5% of total capacity
  selector:
    matchLabels:
      app: payment-service
      version: canary
  template:
    metadata:
      labels:
        app: payment-service
        version: canary
    spec:
      containers:
        - name: payment-service
          image: payment-service:v1.3.0  # New version

Canary releases work well with infrastructure-level metrics (CPU, memory, request latency) and can be automated using service meshes like Istio or deployment tools like Flagger (see Kubernetes Best Practices for detailed implementation patterns).

Combine canary releases with feature flags for maximum safety: canary deployment validates infrastructure concerns (performance, resource usage) while feature flags control feature visibility. Even if the canary deployment reaches 100%, features can remain disabled until explicitly activated.

A/B Testing Integration

A/B testing uses feature flags to show different variants to different user segments, measuring which variant performs better against defined success metrics. This data-driven approach validates hypotheses about user behavior and feature effectiveness before committing to changes.

Experiment Design

Well-designed experiments require clear hypotheses, defined success metrics, minimum sample sizes for statistical significance, and variant implementations that test a single variable. Poor experiment design leads to inconclusive results or false conclusions.

interface ExperimentConfig {
  name: string;
  hypothesis: string;
  successMetrics: SuccessMetric[];
  variants: ExperimentVariant[];
  minimumSampleSize: number;
  minimumDurationDays: number;
  targetAudience?: TargetingRules;
}

interface SuccessMetric {
  name: string;
  type: 'conversion' | 'revenue' | 'engagement' | 'retention';
  minimumDetectableEffect: number; // Smallest meaningful change
}

interface ExperimentVariant {
  name: string;
  description: string;
  allocationPercentage: number;
  implementation: string; // Code path or config
}

/**
 * Example experiment configuration for checkout flow optimization.
 * Hypothesis: Reducing checkout steps increases conversion rate.
 */
const streamlinedCheckoutExperiment: ExperimentConfig = {
  name: 'streamlined-checkout-v2',
  hypothesis: 'Reducing checkout from 4 steps to 2 steps will increase conversion rate by at least 5%',
  successMetrics: [
    {
      name: 'checkout_conversion_rate',
      type: 'conversion',
      minimumDetectableEffect: 0.05 // 5% improvement
    },
    {
      name: 'average_order_value',
      type: 'revenue',
      minimumDetectableEffect: 0.02 // 2% improvement
    }
  ],
  variants: [
    {
      name: 'control',
      description: 'Current 4-step checkout flow',
      allocationPercentage: 50,
      implementation: 'legacy-checkout'
    },
    {
      name: 'treatment',
      description: 'New 2-step streamlined checkout',
      allocationPercentage: 50,
      implementation: 'streamlined-checkout'
    }
  ],
  minimumSampleSize: 1000, // Per variant
  minimumDurationDays: 14,
  targetAudience: {
    // Run experiment only for new users to avoid confounding
    userAttribute: 'account_age_days',
    operator: 'less_than',
    value: 7
  }
};

The hypothesis must be falsifiable and include the expected magnitude of change. Success metrics should be defined before running the experiment to avoid cherry-picking metrics after seeing results. Minimum detectable effect (MDE) determines sample size requirements - smaller effects require larger samples to detect with statistical confidence.
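The MDE-to-sample-size relationship can be made concrete with the common planning approximation n per variant ≈ 16 · p(1−p) / MDE², which corresponds to alpha = 0.05 and 80% power for a two-sided test of proportions. Treat this as a back-of-the-envelope estimate, not a substitute for a proper power calculation:

```typescript
/**
 * Approximate required sample size per variant for a conversion experiment.
 * baselineRate: current conversion rate (e.g. 0.1 for 10%).
 * absoluteMde: smallest absolute change in rate worth detecting (e.g. 0.02).
 * Uses n ≈ 16 · p(1-p) / MDE² (alpha = 0.05, power = 0.8).
 */
function sampleSizePerVariant(baselineRate: number, absoluteMde: number): number {
  const variance = baselineRate * (1 - baselineRate);
  return Math.ceil((16 * variance) / (absoluteMde * absoluteMde));
}
```

The quadratic denominator is why halving the MDE quadruples the required sample: detecting a 1-point lift on a 10% baseline takes roughly four times as many users as detecting a 2-point lift.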

Variant Assignment and Consistency

Users must see the same variant consistently throughout the experiment to avoid confounding results. Inconsistent assignment (user sees variant A on Monday, variant B on Tuesday) invalidates the experiment because you can't attribute behavioral changes to the variant itself.

@Service
public class ExperimentService {
    private final FeatureFlagService featureFlags;
    private final AnalyticsService analytics;
    private final ExperimentRepository experiments;

    public ExperimentService(FeatureFlagService featureFlags,
                             AnalyticsService analytics,
                             ExperimentRepository experiments) {
        this.featureFlags = featureFlags;
        this.analytics = analytics;
        this.experiments = experiments;
    }

    /**
     * Get user's assigned variant with consistent hashing.
     * Same user always gets same variant for experiment duration.
     */
    public ExperimentVariant getVariant(String experimentName, String userId) {
        Experiment experiment = experiments.findByName(experimentName)
            .orElseThrow(() -> new ExperimentNotFoundException(experimentName));

        // Check if user is in experiment's target audience
        if (!isInTargetAudience(experiment, userId)) {
            return ExperimentVariant.control();
        }

        // Consistent hash-based assignment; floorMod keeps the bucket in 0-99
        // even when hashCode() returns Integer.MIN_VALUE
        String assignmentKey = experimentName + ":" + userId + ":" + experiment.getSalt();
        int bucket = Math.floorMod(assignmentKey.hashCode(), 100);

        // Find variant based on allocation percentages
        int cumulativePercentage = 0;
        for (Variant variant : experiment.getVariants()) {
            cumulativePercentage += variant.getAllocationPercentage();
            if (bucket < cumulativePercentage) {
                return new ExperimentVariant(
                    variant.getName(),
                    experiment.getTrackingId(),
                    experiment.getName()
                );
            }
        }

        return ExperimentVariant.control(); // Fallback
    }

    /**
     * Track user exposure to experiment variant.
     * Called when user sees the variant, not when assigned.
     */
    public void trackExposure(String experimentName, String userId, String variant) {
        analytics.track(EventType.EXPERIMENT_EXPOSURE, Map.of(
            "experiment_name", experimentName,
            "variant", variant,
            "user_id", userId,
            "timestamp", Instant.now()
        ));
    }

    /**
     * Track success metric for experiment analysis.
     * Links user action to their experiment variant.
     */
    public void trackConversion(
            String experimentName,
            String userId,
            String metricName,
            double value
    ) {
        ExperimentVariant variant = getVariant(experimentName, userId);

        analytics.track(EventType.EXPERIMENT_CONVERSION, Map.of(
            "experiment_name", experimentName,
            "variant", variant.getName(),
            "metric_name", metricName,
            "metric_value", value,
            "user_id", userId,
            "timestamp", Instant.now()
        ));
    }
}

The experiment salt ensures different experiments assign users independently - a user in treatment for experiment A might be in control for experiment B. This prevents correlation between experiments.
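The effect of the salt can be demonstrated with a small sketch; the fnv1a hash and the 50/50 split are illustrative choices, not a prescribed implementation:

```typescript
// FNV-1a: a simple, stable 32-bit string hash
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Including a per-experiment salt in the hash input means the same user
// falls into unrelated buckets for different experiments.
function assignVariant(
  experiment: string,
  salt: string,
  userId: string
): 'control' | 'treatment' {
  const bucket = fnv1a(`${experiment}:${userId}:${salt}`) % 100;
  return bucket < 50 ? 'treatment' : 'control';
}
```

Without the salt, two experiments with identical names or hash inputs would assign every user identically, so any effect measured in one experiment would be confounded by the other.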

Metrics Collection and Analysis

Track both exposure (user saw the variant) and conversion (user completed the target action) events. Exposure tracking enables accurate conversion rate calculation and ensures you only analyze users who actually experienced the variant.

class ExperimentAnalyticsService {
  constructor(
    private experiments: ExperimentService,
    private analytics: AnalyticsService
  ) {}

  /**
   * Track when user is exposed to experiment variant.
   * Exposure ≠ assignment; only track when variant is shown.
   */
  async trackExposure(
    experimentName: string,
    userId: string,
    pageContext: Record<string, any>
  ): Promise<void> {
    const variant = await this.experiments.getVariant(experimentName, userId);

    await this.analytics.track({
      eventType: 'experiment_exposure',
      userId,
      properties: {
        experiment_name: experimentName,
        variant: variant.variant,
        tracking_id: variant.trackingId,
        page_url: pageContext.url,
        timestamp: new Date().toISOString()
      }
    });
  }

  /**
   * Track conversion event tied to experiment variant.
   * Enables cohort analysis to compare variant performance.
   */
  async trackConversion(
    experimentName: string,
    userId: string,
    metricName: string,
    value?: number
  ): Promise<void> {
    const variant = await this.experiments.getVariant(experimentName, userId);

    await this.analytics.track({
      eventType: 'experiment_conversion',
      userId,
      properties: {
        experiment_name: experimentName,
        variant: variant.variant,
        tracking_id: variant.trackingId,
        metric_name: metricName,
        metric_value: value ?? 1,
        timestamp: new Date().toISOString()
      }
    });
  }

  /**
   * Get experiment results with statistical significance.
   * Calculates conversion rates and confidence intervals per variant.
   */
  async getExperimentResults(
    experimentName: string
  ): Promise<ExperimentResults> {
    const exposures = await this.analytics.getEvents({
      eventType: 'experiment_exposure',
      filters: { experiment_name: experimentName }
    });

    const conversions = await this.analytics.getEvents({
      eventType: 'experiment_conversion',
      filters: { experiment_name: experimentName }
    });

    // Group by variant
    const variantResults = this.calculateVariantMetrics(exposures, conversions);

    // Calculate statistical significance
    const significance = this.calculateSignificance(
      variantResults['control'],
      variantResults['treatment']
    );

    return {
      experimentName,
      variants: variantResults,
      significance,
      recommendedAction: this.getRecommendation(significance, variantResults)
    };
  }

  /**
   * Calculate statistical significance using chi-square test.
   * Determines if observed difference is statistically meaningful.
   */
  private calculateSignificance(
    control: VariantMetrics,
    treatment: VariantMetrics
  ): StatisticalSignificance {
    // Chi-square test for conversion rate difference
    const pooledRate =
      (control.conversions + treatment.conversions) /
      (control.exposures + treatment.exposures);

    const expectedControl = control.exposures * pooledRate;
    const expectedTreatment = treatment.exposures * pooledRate;

    const chiSquare =
      Math.pow(control.conversions - expectedControl, 2) / expectedControl +
      Math.pow(treatment.conversions - expectedTreatment, 2) / expectedTreatment;

    const pValue = this.chiSquareToPValue(chiSquare, 1); // 1 degree of freedom

    return {
      pValue,
      isSignificant: pValue < 0.05,
      confidenceLevel: 1 - pValue,
      relativeImprovement:
        (treatment.conversionRate - control.conversionRate) / control.conversionRate
    };
  }
}

Delay analysis until you have both sufficient sample size and sufficient run time. Peeking at results before reaching the minimum sample size inflates false positive rates, and many experiments show early trends that don't hold up over time due to novelty effects or sample variance.
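The minimum sample size can be estimated before the experiment starts using the standard two-proportion formula. This is a planning sketch, not part of the service above; it assumes a two-sided α of 0.05 and 80% power.

```typescript
// Planning sketch: minimum sample size per variant for a two-
// proportion test, assuming two-sided alpha = 0.05 and 80% power.
function minSamplePerVariant(
  baselineRate: number,
  minDetectableLift: number
): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + minDetectableLift);
  const zAlpha = 1.96; // two-sided 5% significance
  const zBeta = 0.84;  // 80% power
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p1 - p2) ** 2);
}
```

Note how quickly the requirement grows as the detectable lift shrinks: detecting a 10% relative lift on a 5% baseline needs tens of thousands of users per variant, while a 50% lift needs only a couple of thousand.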

Common pitfalls include testing multiple metrics without adjusting significance thresholds (which inflates the false positive rate), stopping experiments early as soon as a winning variant emerges (regression to the mean), and failing to account for external factors (seasonal effects, marketing campaigns) that affect all variants.
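A common guard against the multiple-metrics pitfall is a Bonferroni correction - divide the significance threshold by the number of metrics tested. It is deliberately conservative; the sketch below is illustrative rather than the analysis service's actual logic.

```typescript
// Bonferroni correction: with several metrics under test, shrink the
// per-metric significance threshold so the overall false positive
// rate stays near the nominal alpha.
function bonferroniThreshold(alpha: number, metricCount: number): number {
  return alpha / metricCount;
}

function isSignificantCorrected(
  pValue: number,
  alpha: number,
  metricCount: number
): boolean {
  return pValue < bonferroniThreshold(alpha, metricCount);
}
```

A p-value of 0.02 passes an uncorrected 0.05 threshold, but fails once five metrics share the same experiment (threshold 0.01).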

Technical Debt: Flag Cleanup Strategies

Feature flags create technical debt if not removed after serving their purpose. Old flags accumulate in codebases, creating complexity, slowing down new development, and increasing the risk of bugs when forgotten flags interact with new code. Systematic cleanup processes prevent flag debt from becoming unmanageable.

Flag Lifecycle Management

Every flag should have a defined lifecycle with clear criteria for retirement. Document the flag's purpose, expected lifetime, and removal conditions when creating it. This metadata guides cleanup decisions.

@Entity
@Table(name = "feature_flags")
public class FeatureFlag {
    @Id
    private String name;

    private boolean enabled;

    @Enumerated(EnumType.STRING)
    private FlagType type;

    @Column(name = "created_date")
    private LocalDateTime createdDate;

    @Column(name = "expected_removal_date")
    private LocalDateTime expectedRemovalDate;

    @Column(name = "purpose")
    private String purpose;

    @Column(name = "jira_ticket")
    private String jiraTicket; // Link to cleanup task

    @Column(name = "last_evaluated")
    private LocalDateTime lastEvaluated;

    @Column(name = "evaluation_count")
    private long evaluationCount;

    /**
     * Determine if flag is eligible for removal.
     * Release flags ready for cleanup after stable period.
     */
    public boolean isEligibleForRemoval() {
        // Release flags should be removed after full rollout;
        // 90 days matches the cleanup query and review cadence
        if (type == FlagType.RELEASE) {
            return enabled &&
                Duration.between(createdDate, LocalDateTime.now()).toDays() > 90;
        }

        // Experiment flags removed after conclusion
        if (type == FlagType.EXPERIMENT) {
            return expectedRemovalDate != null &&
                LocalDateTime.now().isAfter(expectedRemovalDate);
        }

        // Ops and permission flags are long-lived
        return false;
    }

    /**
     * Check if flag appears to be abandoned.
     * No evaluations in 30 days suggests flag no longer used.
     */
    public boolean isAbandoned() {
        return lastEvaluated != null &&
            Duration.between(lastEvaluated, LocalDateTime.now()).toDays() > 30;
    }

    public enum FlagType {
        RELEASE,    // Temporary: remove after full rollout
        EXPERIMENT, // Temporary: remove after conclusion
        OPS,        // Long-lived: operational control
        PERMISSION  // Long-lived: business logic
    }
}

Automated cleanup workflows can identify flags ready for removal and create tasks. Schedule monthly reviews of flag inventory to identify candidates for cleanup.
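A monthly sweep can be as simple as the following sketch; the `FlagRecord` shape and ticket fields are assumptions for illustration, not the entity schema above.

```typescript
interface FlagRecord {
  name: string;
  owner: string;
  eligibleForRemoval: boolean;
}

interface CleanupTask {
  title: string;
  assignee: string;
}

// Turn removable flags into one tracker task each, assigned to the
// team that owns the flag.
function planCleanupTasks(flags: FlagRecord[]): CleanupTask[] {
  return flags
    .filter((f) => f.eligibleForRemoval)
    .map((f) => ({
      title: `Remove feature flag '${f.name}'`,
      assignee: f.owner,
    }));
}
```

Feeding the output into your issue tracker's API closes the loop between detection and assignment.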

Monitoring Flag Usage

Track flag evaluation frequency to identify flags that are no longer being checked. A flag not evaluated in 30 days is likely dead code that can be safely removed.

class FeatureFlagMiddleware {
  constructor(
    private featureFlags: FeatureFlagService,
    private metrics: MetricsService,
    private flagAudit: FlagAuditService
  ) {}

  /**
   * Intercept flag evaluations to track usage.
   * Enables identification of unused or stale flags.
   */
  async isEnabled(
    flagName: string,
    userId: string,
    context?: Record<string, any>
  ): Promise<boolean> {
    const startTime = Date.now();

    try {
      const enabled = await this.featureFlags.isEnabled(flagName, userId, context);

      // Track evaluation metrics
      this.metrics.incrementCounter('feature_flag.evaluation', {
        flag_name: flagName,
        enabled: enabled.toString(),
        user_id: userId
      });

      // Record evaluation for lifecycle tracking
      await this.flagAudit.recordEvaluation(flagName, userId, enabled);

      // Track evaluation latency
      const duration = Date.now() - startTime;
      this.metrics.recordHistogram('feature_flag.evaluation_duration_ms', duration, {
        flag_name: flagName
      });

      return enabled;
    } catch (error) {
      // Track flag evaluation errors
      this.metrics.incrementCounter('feature_flag.evaluation_error', {
        flag_name: flagName,
        error_type: error.constructor.name
      });

      // Fail open or closed based on flag type
      return this.getDefaultValue(flagName);
    }
  }

  /**
   * Generate flag usage report for cleanup candidates.
   * Identifies flags with zero evaluations in specified period.
   */
  async getUnusedFlags(daysSinceLastEvaluation: number): Promise<FlagReport[]> {
    const cutoff = new Date();
    cutoff.setDate(cutoff.getDate() - daysSinceLastEvaluation);

    return await this.flagAudit.findFlagsNotEvaluatedSince(cutoff);
  }
}

Dashboard visualization of flag age and usage helps teams prioritize cleanup. Show flags by type, age, last evaluation, and evaluation frequency to identify cleanup candidates.

Automated Flag Removal Process

The safest flag removal process runs in four stages: set the flag permanently on, remove the flag check, delete the old code path, and finally delete the flag configuration. Deploy each stage separately with monitoring before moving to the next.

/**
 * Stage 1: Flag is at 100%, code paths coexist
 * Both old and new code exist, flag controls which runs
 */
public PaymentResult processPayment(PaymentRequest request) {
    if (featureFlags.isEnabled("new-payment-processor", request.getUserId())) {
        return newPaymentProcessor.process(request);
    }
    return legacyPaymentProcessor.process(request);
}

/**
 * Stage 2: Remove flag check, always use new code path
 * Deploy this change and monitor. Old code still exists but unreachable.
 */
public PaymentResult processPayment(PaymentRequest request) {
    return newPaymentProcessor.process(request);
}

/**
 * Stage 3: Remove old code path entirely
 * After stable period with stage 2, remove legacy implementation
 */
public PaymentResult processPayment(PaymentRequest request) {
    return newPaymentProcessor.process(request);
}
// legacyPaymentProcessor deleted

/**
 * Stage 4: Remove flag configuration
 * Delete feature flag from database/config after code cleanup
 */

This staged approach allows rollback at each step. If stage 2 causes issues, you can reintroduce the flag check and disable the flag. If stage 3 causes issues, you can restore the legacy code path.

Static analysis tools can identify flag removal candidates by finding flags that are always true or always false:

/**
* ESLint custom rule to detect flags always enabled.
* Suggests removal when flag check is redundant.
*/
const flagAlwaysEnabledRule = {
  create(context: eslint.RuleContext) {
    return {
      IfStatement(node: any) {
        // Detect: if (featureFlags.isEnabled('flag-name'))
        if (
          node.test.type === 'CallExpression' &&
          node.test.callee.property?.name === 'isEnabled' &&
          node.alternate === null // No else branch
        ) {
          const flagName = node.test.arguments[0]?.value;

          // Check if flag is permanently enabled
          if (isKnownPermanentlyEnabled(flagName)) {
            context.report({
              node,
              message: `Feature flag '${flagName}' is permanently enabled. Consider removing the flag check.`,
              suggest: [{
                desc: 'Remove flag check and always execute code',
                fix: (fixer) => fixer.remove(node)
              }]
            });
          }
        }
      }
    };
  }
};

Flag Debt Tracking

Maintain a flag inventory with metadata about each flag's purpose, creation date, and expected removal timeline. Review this inventory regularly to prevent flag accumulation.

-- Query to find flags ready for cleanup
SELECT
    name,
    type,
    created_date,
    expected_removal_date,
    last_evaluated,
    DATEDIFF(NOW(), last_evaluated) as days_since_evaluation,
    purpose,
    jira_ticket
FROM feature_flags
WHERE
    -- Release flags older than 90 days
    (type = 'RELEASE' AND enabled = true AND DATEDIFF(NOW(), created_date) > 90)
    OR
    -- Experiment flags past removal date
    (type = 'EXPERIMENT' AND expected_removal_date < NOW())
    OR
    -- Any flag not evaluated in 60 days
    (last_evaluated IS NOT NULL AND DATEDIFF(NOW(), last_evaluated) > 60)
ORDER BY days_since_evaluation DESC;

Create automated tasks for flag cleanup in your issue tracker (Jira, Linear, GitHub Issues) when flags reach removal criteria. Assign these tasks to the team that created the flag.

Testing With Feature Flags

Feature flags increase test complexity because code now has multiple paths through conditional logic. Comprehensive testing requires validating behavior with flags enabled and disabled, as well as handling flag evaluation failures gracefully.

Testing All Flag States

Unit tests should cover both enabled and disabled states for each flag. This ensures the application functions correctly regardless of flag configuration and prevents regressions when removing flags.

@Test
public void processPayment_withNewProcessorEnabled_usesNewImplementation() {
    // Arrange
    when(featureFlags.isEnabled("new-payment-processor", "user123"))
        .thenReturn(true);
    when(newPaymentProcessor.process(any()))
        .thenReturn(PaymentResult.success("txn-456"));

    PaymentRequest request = new PaymentRequest("user123", Money.of(100, "USD"));

    // Act
    PaymentResult result = paymentService.processPayment(request);

    // Assert
    assertThat(result.isSuccess()).isTrue();
    verify(newPaymentProcessor).process(request);
    verify(legacyPaymentProcessor, never()).process(any());
}

@Test
public void processPayment_withNewProcessorDisabled_usesLegacyImplementation() {
    // Arrange
    when(featureFlags.isEnabled("new-payment-processor", "user123"))
        .thenReturn(false);
    when(legacyPaymentProcessor.process(any()))
        .thenReturn(PaymentResult.success("txn-789"));

    PaymentRequest request = new PaymentRequest("user123", Money.of(100, "USD"));

    // Act
    PaymentResult result = paymentService.processPayment(request);

    // Assert
    assertThat(result.isSuccess()).isTrue();
    verify(legacyPaymentProcessor).process(request);
    verify(newPaymentProcessor, never()).process(any());
}

@Test
public void processPayment_whenFlagServiceFails_usesDefaultBehavior() {
    // Arrange: Flag service throws exception
    when(featureFlags.isEnabled("new-payment-processor", "user123"))
        .thenThrow(new FeatureFlagServiceException("Connection timeout"));
    when(legacyPaymentProcessor.process(any()))
        .thenReturn(PaymentResult.success("txn-999"));

    PaymentRequest request = new PaymentRequest("user123", Money.of(100, "USD"));

    // Act
    PaymentResult result = paymentService.processPayment(request);

    // Assert: Falls back to legacy processor (fail-safe default)
    assertThat(result.isSuccess()).isTrue();
    verify(legacyPaymentProcessor).process(request);
}

Test flag service failures to ensure graceful degradation. Feature flag evaluation can fail due to network issues, timeouts, or service outages. Your application should define a safe default for each flag - typically failing closed to the stable code path, though an operational kill switch may need to fail open - and continue functioning.
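The fail-safe default can be derived from the flag's type. The mapping below is one reasonable policy under the taxonomy used earlier in this guide, not a universal rule.

```typescript
// One reasonable fail-safe policy keyed by flag type. The types
// mirror the taxonomy above; the mapping itself is a suggestion.
type FlagKind = "release" | "experiment" | "ops" | "permission";

function defaultOnEvaluationFailure(kind: FlagKind): boolean {
  switch (kind) {
    case "ops":
      // Fail open so operational toggles keep working during an outage
      return true;
    default:
      // Release, experiment, and permission flags fail closed to the
      // stable code path - never grant access or new behavior by accident
      return false;
  }
}
```

Whatever policy you choose, encode it in one place so every call site degrades the same way.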

Integration Tests With Flag Contexts

Integration tests should validate critical paths with flags in expected production states. Use test data management strategies to configure flag states for different test scenarios.

@SpringBootTest
@TestContainersConfiguration
class PaymentIntegrationTest {

    @Autowired
    private lateinit var paymentController: PaymentController

    @Autowired
    private lateinit var featureFlagService: TestFeatureFlagService

    /**
     * Test payment processing with new processor enabled.
     * Integration test validates full stack with flag configuration.
     */
    @Test
    fun `should process payment successfully with new processor`() {
        // Configure flag state for test
        featureFlagService.enableForTest("new-payment-processor", enabled = true)

        val request = PaymentRequest(
            userId = "test-user-123",
            amount = Money.of(50.00, "USD"),
            paymentMethod = "card_visa_4242"
        )

        val response = paymentController.processPayment(request)

        assertThat(response.statusCode).isEqualTo(HttpStatus.OK)
        assertThat(response.body?.processor).isEqualTo("new-processor")
        assertThat(response.body?.status).isEqualTo("SUCCESS")
    }

    /**
     * Test backwards compatibility with legacy processor.
     * Ensures flag-disabled path still functions correctly.
     */
    @Test
    fun `should process payment successfully with legacy processor`() {
        // Configure flag state for test
        featureFlagService.enableForTest("new-payment-processor", enabled = false)

        val request = PaymentRequest(
            userId = "test-user-456",
            amount = Money.of(75.00, "USD"),
            paymentMethod = "card_visa_4242"
        )

        val response = paymentController.processPayment(request)

        assertThat(response.statusCode).isEqualTo(HttpStatus.OK)
        assertThat(response.body?.processor).isEqualTo("legacy-processor")
        assertThat(response.body?.status).isEqualTo("SUCCESS")
    }
}

/**
 * Test utility for controlling feature flags in integration tests.
 * Provides simple API for configuring flag states per test.
 */
@TestComponent
class TestFeatureFlagService : FeatureFlagService {
    private val flagStates = mutableMapOf<String, Boolean>()

    fun enableForTest(flagName: String, enabled: Boolean) {
        flagStates[flagName] = enabled
    }

    override fun isEnabled(flagName: String, userId: String): Boolean {
        return flagStates.getOrDefault(flagName, false)
    }

    // JUnit lifecycle annotations only run on test classes, so call
    // this from an @AfterEach method in the test class itself
    fun resetFlags() {
        flagStates.clear()
    }
}

For end-to-end tests, consider whether to test with flags enabled or disabled based on your deployment strategy. If you typically deploy with flags disabled then gradually enable, test the disabled state in E2E tests. If you deploy with internal users enabled, test that state. See E2E Testing for comprehensive E2E testing patterns.

Contract Testing With Flags

When feature flags change API contracts, use contract testing to ensure backwards compatibility. Different flag states should maintain contract compatibility or version the API explicitly.

// Consumer-driven contract test for payment API
describe('Payment API Contract', () => {
  const provider = new Pact({
    consumer: 'mobile-app',
    provider: 'payment-service'
  });

  /**
   * Contract test for new payment response format.
   * Validates API contract with new features enabled.
   */
  it('returns enhanced payment response when new processor enabled', async () => {
    await provider.addInteraction({
      state: 'new-payment-processor is enabled',
      uponReceiving: 'a payment request',
      withRequest: {
        method: 'POST',
        path: '/api/payments',
        headers: { 'Content-Type': 'application/json' },
        body: {
          userId: 'user-123',
          amount: 100.00,
          currency: 'USD'
        }
      },
      willRespondWith: {
        status: 200,
        headers: { 'Content-Type': 'application/json' },
        body: {
          transactionId: like('txn-abc123'),
          status: 'SUCCESS',
          processor: 'new-processor',
          processingTimeMs: like(150),
          // New fields added by new processor
          riskScore: like(0.12),
          fraudChecksPassed: like(true)
        }
      }
    });

    const response = await paymentClient.createPayment({
      userId: 'user-123',
      amount: 100.00,
      currency: 'USD'
    });

    expect(response.processor).toBe('new-processor');
    expect(response.riskScore).toBeDefined();
  });

  /**
   * Contract test for legacy response format.
   * Ensures backwards compatibility when flag disabled.
   */
  it('returns standard payment response when new processor disabled', async () => {
    await provider.addInteraction({
      state: 'new-payment-processor is disabled',
      uponReceiving: 'a payment request',
      withRequest: {
        method: 'POST',
        path: '/api/payments',
        headers: { 'Content-Type': 'application/json' },
        body: {
          userId: 'user-456',
          amount: 50.00,
          currency: 'USD'
        }
      },
      willRespondWith: {
        status: 200,
        headers: { 'Content-Type': 'application/json' },
        body: {
          transactionId: like('txn-xyz789'),
          status: 'SUCCESS',
          processor: 'legacy-processor'
          // No enhanced fields
        }
      }
    });

    const response = await paymentClient.createPayment({
      userId: 'user-456',
      amount: 50.00,
      currency: 'USD'
    });

    expect(response.processor).toBe('legacy-processor');
    expect(response.riskScore).toBeUndefined();
  });
});

If feature flags introduce breaking changes, consider versioning your API (e.g., /v2/payments) rather than using flags to change existing endpoint behavior. This makes the change explicit and allows clients to migrate deliberately.

Flag-Aware E2E Tests

E2E tests for critical flows should account for feature flags. Either parameterize tests to run with different flag configurations or create separate test suites for each configuration.

// Parameterized E2E test for checkout flow
describe.each([
  { flagState: 'enabled', processor: 'new-processor' },
  { flagState: 'disabled', processor: 'legacy-processor' }
])(
  'Checkout flow with new payment processor $flagState',
  ({ flagState, processor }) => {
    beforeEach(async () => {
      // Configure feature flag for E2E environment
      await featureFlagAdmin.setFlag(
        'new-payment-processor',
        flagState === 'enabled'
      );
    });

    it('completes checkout successfully', async () => {
      await page.goto('/products/premium-widget');
      await page.click('[data-testid="add-to-cart"]');
      await page.click('[data-testid="checkout"]');

      // Fill payment details
      await page.fill('[data-testid="card-number"]', '4242424242424242');
      await page.fill('[data-testid="card-expiry"]', '12/25');
      await page.fill('[data-testid="card-cvc"]', '123');

      await page.click('[data-testid="complete-purchase"]');

      // Wait for success page
      await page.waitForSelector('[data-testid="order-confirmation"]');

      // Verify correct processor was used
      const orderDetails = await page.textContent('[data-testid="order-summary"]');
      expect(orderDetails).toContain(processor);
    });

    it('handles payment failure gracefully', async () => {
      await page.goto('/cart');
      await page.click('[data-testid="checkout"]');

      // Use test card that triggers failure
      await page.fill('[data-testid="card-number"]', '4000000000000002');
      await page.fill('[data-testid="card-expiry"]', '12/25');
      await page.fill('[data-testid="card-cvc"]', '123');

      await page.click('[data-testid="complete-purchase"]');

      // Verify error handling
      await page.waitForSelector('[data-testid="payment-error"]');
      const errorMessage = await page.textContent('[data-testid="payment-error"]');
      expect(errorMessage).toContain('declined');
    });
  }
);

This parameterized approach ensures both code paths receive E2E coverage. However, if E2E tests are slow or expensive, prioritize testing the production configuration (flags in expected state for most users).

Monitoring Flag Usage and Feature Adoption

Monitoring feature flag metrics provides visibility into flag health, feature adoption, and potential issues. Track flag evaluation counts, error rates, user adoption percentages, and business metrics tied to flagged features.

Flag Evaluation Metrics

Track flag evaluations to monitor system health and identify issues. High evaluation latency indicates flag service performance problems that could affect application performance.

@Aspect
@Component
public class FeatureFlagMetricsAspect {
    private final MeterRegistry metrics;

    /**
     * Record metrics for every feature flag evaluation.
     * Tracks evaluation count, latency, and results.
     */
    @Around("execution(* com.example.FeatureFlagService.isEnabled(..))")
    public Object recordFlagEvaluation(ProceedingJoinPoint joinPoint) throws Throwable {
        String flagName = (String) joinPoint.getArgs()[0];
        Timer.Sample sample = Timer.start(metrics);

        try {
            Boolean result = (Boolean) joinPoint.proceed();

            // Record successful evaluation
            sample.stop(Timer.builder("feature_flag.evaluation.duration")
                .tag("flag_name", flagName)
                .tag("result", result.toString())
                .tag("success", "true")
                .register(metrics));

            // Count evaluations by result
            metrics.counter("feature_flag.evaluation.count",
                "flag_name", flagName,
                "result", result.toString()
            ).increment();

            return result;
        } catch (Exception e) {
            // Record failed evaluation
            sample.stop(Timer.builder("feature_flag.evaluation.duration")
                .tag("flag_name", flagName)
                .tag("success", "false")
                .tag("error_type", e.getClass().getSimpleName())
                .register(metrics));

            metrics.counter("feature_flag.evaluation.error",
                "flag_name", flagName,
                "error_type", e.getClass().getSimpleName()
            ).increment();

            throw e;
        }
    }
}

Create dashboards showing flag evaluation metrics in your observability platform:

  • Evaluations per second by flag
  • Flag evaluation latency (p50, p95, p99)
  • Error rate per flag
  • Cache hit rate for flag values

Alert on anomalies like sudden drops in evaluation count (suggests flag was removed from code but not configuration) or spikes in evaluation latency (flag service performance degradation).
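An evaluation-count drop alert can be as simple as comparing the latest interval to the trailing average. The sketch below is a starting point; the 50% drop ratio is a placeholder to tune per flag.

```typescript
// Trailing-average anomaly check over per-interval evaluation counts.
// Returns true when the latest interval drops below dropRatio of the
// average of all earlier intervals.
function evaluationDropDetected(
  countsPerInterval: number[],
  dropRatio: number = 0.5
): boolean {
  if (countsPerInterval.length < 2) return false;
  const history = countsPerInterval.slice(0, -1);
  const average = history.reduce((sum, c) => sum + c, 0) / history.length;
  const latest = countsPerInterval[countsPerInterval.length - 1];
  return average > 0 && latest < average * dropRatio;
}
```

In practice you would feed this from the evaluation counter above and page only after the condition holds for several consecutive intervals.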

Feature Adoption Tracking

Track what percentage of users have each feature enabled to monitor rollout progress and ensure gradual rollouts proceed as expected.

class FeatureAdoptionMetrics {
constructor(
private featureFlags: FeatureFlagService,
private metrics: MetricsService
) {}

/**
* Record feature adoption metrics for rollout monitoring.
* Tracks percentage of users with feature enabled over time.
*/
async recordAdoptionMetrics(flagName: string): Promise<void> {
const stats = await this.featureFlags.getAdoptionStats(flagName);

// Record current adoption percentage
this.metrics.gauge('feature_flag.adoption_percentage', stats.enabledPercentage, {
flag_name: flagName
});

// Record absolute counts
this.metrics.gauge('feature_flag.enabled_users', stats.enabledUserCount, {
flag_name: flagName
});

this.metrics.gauge('feature_flag.total_users', stats.totalUserCount, {
flag_name: flagName
});

// Record by user segment
for (const [segment, percentage] of Object.entries(stats.bySegment)) {
this.metrics.gauge('feature_flag.adoption_percentage_by_segment', percentage, {
flag_name: flagName,
segment
});
}
}

/**
* Calculate adoption velocity for rollout tracking.
* Measures how quickly feature is being adopted.
*/
async calculateAdoptionVelocity(
flagName: string,
timeWindowHours: number
): Promise<AdoptionVelocity> {
const now = new Date();
const windowStart = new Date(now.getTime() - timeWindowHours * 60 * 60 * 1000);

const currentStats = await this.featureFlags.getAdoptionStats(flagName);
const previousStats = await this.featureFlags.getAdoptionStatsAt(
flagName,
windowStart
);

const percentageChange =
currentStats.enabledPercentage - previousStats.enabledPercentage;
const userCountChange =
currentStats.enabledUserCount - previousStats.enabledUserCount;

return {
flagName,
timeWindowHours,
percentageChange,
userCountChange,
projectedFullRolloutHours: this.projectFullRollout(
currentStats.enabledPercentage,
percentageChange,
timeWindowHours
)
};
}

private projectFullRollout(
currentPercentage: number,
percentageChange: number,
windowHours: number
): number | null {
if (percentageChange <= 0) {
return null; // Rollout not progressing
}

const remainingPercentage = 100 - currentPercentage;
const hoursPerPercent = windowHours / percentageChange;
return remainingPercentage * hoursPerPercent;
}
}

Adoption velocity tracking helps you detect stalled rollouts or unexpected adoption patterns. If you planned to reach 50% in 48 hours but adoption is slower, investigate whether targeting rules are too restrictive or users aren't encountering the feature.

Business Metrics by Flag State

Correlate business metrics with flag states to measure feature impact. Compare conversion rates, revenue, or engagement between users with features enabled versus disabled.

-- Query to compare business metrics by feature flag state
WITH user_flag_state AS (
    SELECT
        user_id,
        flag_name,
        enabled
    FROM feature_flag_evaluations
    WHERE flag_name = 'streamlined-checkout'
      AND evaluated_at >= NOW() - INTERVAL '7 days'
    GROUP BY user_id, flag_name, enabled
),
user_conversions AS (
    SELECT
        user_id,
        COUNT(*) as purchase_count,
        SUM(amount) as total_revenue,
        AVG(amount) as avg_order_value
    FROM purchases
    WHERE created_at >= NOW() - INTERVAL '7 days'
    GROUP BY user_id
)
SELECT
    ufs.enabled as feature_enabled,
    COUNT(DISTINCT ufs.user_id) as user_count,
    COUNT(uc.user_id) as converted_users,
    ROUND(COUNT(uc.user_id)::NUMERIC / COUNT(DISTINCT ufs.user_id) * 100, 2) as conversion_rate,
    ROUND(AVG(uc.avg_order_value), 2) as avg_order_value,
    ROUND(SUM(uc.total_revenue), 2) as total_revenue
FROM user_flag_state ufs
LEFT JOIN user_conversions uc ON ufs.user_id = uc.user_id
GROUP BY ufs.enabled
ORDER BY ufs.enabled;

This analysis reveals feature impact on key business metrics. Compare control (flag disabled) versus treatment (flag enabled) to measure effectiveness. Integration with analytics platforms enables automated dashboards showing feature performance in real time.

Best Practices Summary

Feature flags are powerful tools for safe deployments and progressive delivery, but require discipline to avoid creating technical debt and complexity. Follow these practices to maximize benefits while minimizing risks:

Flag Lifecycle

  • Classify flags by type (release, experiment, ops, permission) with appropriate lifecycles
  • Set expected removal dates for temporary flags when creating them
  • Review flag inventory monthly to identify cleanup candidates
  • Remove release flags within 90 days of full rollout completion

Implementation

  • Use consistent hashing for percentage rollouts to ensure stable user bucketing
  • Implement caching to minimize performance impact of flag evaluations
  • Define safe defaults for flag evaluation failures (fail open or closed based on risk)
  • Abstract flag service implementation to enable evolution from simple to sophisticated systems

Testing

  • Test both enabled and disabled states for every feature flag in unit tests
  • Validate graceful degradation when flag service is unavailable
  • Use contract testing to ensure flag-controlled API changes maintain backwards compatibility
  • Include flag state configuration in integration test setup

Gradual Rollouts

  • Start rollouts with internal users (ring 0) before external users
  • Progress through defined percentages (1% → 5% → 25% → 50% → 100%) with monitoring at each stage
  • Define clear promotion criteria (time threshold, quality metrics) for advancing rollout stages
  • Implement automated rollback triggers based on error rates or key metrics
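The promotion criteria above can be encoded directly. This sketch assumes a 24-hour soak and a 1% error budget per stage; both numbers are placeholders your team would tune.

```typescript
// Staged rollout promotion: advance only when the current stage has
// soaked long enough and the error rate stays under threshold.
const ROLLOUT_STAGES = [1, 5, 25, 50, 100];

function nextRolloutStage(
  currentPercent: number,
  hoursAtStage: number,
  errorRate: number
): number {
  const index = ROLLOUT_STAGES.indexOf(currentPercent);
  // Unknown stage or already at 100%: stay put
  if (index < 0 || index === ROLLOUT_STAGES.length - 1) return currentPercent;
  const promotionCriteriaMet = hoursAtStage >= 24 && errorRate < 0.01;
  return promotionCriteriaMet ? ROLLOUT_STAGES[index + 1] : currentPercent;
}
```

Pairing this with an automated rollback trigger (drop to 0% when the error rate crosses a higher threshold) gives the rollout a full control loop.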

Monitoring and Metrics

  • Track flag evaluation counts and latency to detect performance issues
  • Monitor feature adoption percentage to validate rollout progress
  • Correlate business metrics with flag states to measure feature impact
  • Alert on anomalies (evaluation drops, latency spikes, error rate increases)

A/B Testing

  • Define success metrics and minimum detectable effect before starting experiments
  • Ensure consistent variant assignment throughout experiment duration
  • Wait for statistical significance before concluding experiments
  • Track exposure (user saw variant) separately from assignment for accurate conversion rates

Technical Debt Prevention

  • Use automated tools to identify flags always enabled/disabled
  • Create cleanup tasks automatically when flags reach removal criteria
  • Perform staged removals (toggle always-on → remove flag check → remove old code → remove config)
  • Monitor flag evaluation to identify abandoned flags

Cross-References