Data Privacy and GDPR Compliance
Data privacy is a fundamental requirement in modern software systems, particularly for financial applications handling sensitive customer information. This guide covers the engineering practices, technical implementations, and compliance requirements for building privacy-respecting systems that meet GDPR and other data protection regulations.
PII (Personally Identifiable Information) Handling
Personally Identifiable Information (PII) is any data that can identify a specific individual. Proper identification, classification, and protection of PII are the foundation of privacy compliance.
Identifying and Classifying PII
PII classification determines how data must be protected, stored, and processed. Different types of PII require different levels of protection.
Direct identifiers can identify a person without additional information:
- Full name
- National identification numbers (SSN, passport number, driver's license)
- Email addresses
- Phone numbers
- Physical addresses
- Biometric data (fingerprints, facial recognition)
- Financial account numbers
Quasi-identifiers can identify someone when combined with other data:
- Date of birth
- ZIP code
- Gender
- Job title
- IP addresses
- Device identifiers
Sensitive PII requires enhanced protection:
- Financial information (account balances, transaction history)
- Health information
- Biometric data
- Authentication credentials
- Children's data (under 16 by default in the EU, though member states may lower the threshold to 13; under 13 in the US under COPPA)
Implementation Patterns
Implement PII detection and classification at the data model level to enforce protection policies consistently:
/**
* Annotation-based PII classification for automatic handling.
*
* This approach allows centralized enforcement of encryption,
* masking, and access control policies based on data sensitivity.
*/
@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
public @interface PersonalData {
PIICategory category();
boolean encrypted() default true;
boolean masked() default true;
}
public enum PIICategory {
DIRECT_IDENTIFIER,
QUASI_IDENTIFIER,
SENSITIVE_PII
}
// Usage in entity classes
@Entity
public class Customer {
@Id
private UUID id;
@PersonalData(category = PIICategory.DIRECT_IDENTIFIER)
private String email;
@PersonalData(category = PIICategory.SENSITIVE_PII)
private String accountNumber;
@PersonalData(category = PIICategory.QUASI_IDENTIFIER)
private LocalDate dateOfBirth;
// Non-PII data
private CustomerStatus status;
}
This annotation-driven approach enables aspect-oriented programming (AOP) to automatically apply encryption, audit logging, and access controls. The @PersonalData annotation signals that the field contains PII and specifies its sensitivity level, allowing framework code to intercept access and apply appropriate protections without cluttering business logic.
Implement automatic PII masking for logging and monitoring to prevent accidental exposure:
/**
* Aspect that intercepts logging calls to mask PII before output.
*
* Uses reflection to identify PII-annotated fields and replaces
* their values with masked versions (e.g., "j***@example.com").
*/
@Aspect
@Component
public class PIIMaskingAspect {
@Around("execution(* org.slf4j.Logger.*(..)) && args(message,..)")
public Object maskPIIInLogs(ProceedingJoinPoint joinPoint, Object message) throws Throwable {
if (message == null) {
return joinPoint.proceed();
}
String masked = maskPII(message.toString());
Object[] args = joinPoint.getArgs();
args[0] = masked;
return joinPoint.proceed(args);
}
private String maskPII(String text) {
// Email masking: [email protected] -> j***@example.com
// (keep only the first character of the local part)
text = text.replaceAll(
"([a-zA-Z0-9._%+-])[a-zA-Z0-9._%+-]*@([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})",
"$1***@$2"
);
// Phone number masking: +1-555-123-4567 -> ***-***-****
text = text.replaceAll(
"\\+?\\d{1,3}[-.]?\\(?\\d{3}\\)?[-.]?\\d{3}[-.]?\\d{4}",
"***-***-****"
);
// Account number masking: show last 4 digits only.
// String.replaceAll cannot take a lambda, so use java.util.regex.Pattern
// with Matcher.replaceAll(Function) (Java 9+)
text = Pattern.compile("\\b\\d{4,16}\\b").matcher(text).replaceAll(match -> {
String digits = match.group();
return "*".repeat(digits.length() - 4) + digits.substring(digits.length() - 4);
});
return text;
}
}
This aspect uses AspectJ pointcuts to intercept logging calls system-wide. Note that plain Spring AOP only advises Spring-managed beans, so intercepting calls on org.slf4j.Logger directly requires full AspectJ compile-time or load-time weaving. When code attempts to log a message, the aspect examines the content and applies masking rules based on pattern matching. This provides defense-in-depth: even if developers accidentally log PII, the aspect prevents it from reaching log files. The masking preserves enough information for debugging (e.g., the email domain, the last 4 digits of an account number) while protecting the sensitive parts.
Access Control for PII
Implement role-based access control (RBAC) to restrict PII access to authorized users only. Not all system users need access to all customer data.
/**
* Service that enforces PII access controls based on user roles.
*
* Different roles have different levels of access:
* - CUSTOMER_SERVICE: Can view masked PII for support purposes
* - COMPLIANCE_OFFICER: Can view full PII for investigations
* - DEVELOPER: Cannot access production PII
*/
@Service
public class PIIAccessControlService {
public CustomerData getCustomerData(UUID customerId, User requestingUser) {
Customer customer = customerRepository.findById(customerId)
.orElseThrow(() -> new CustomerNotFoundException(customerId));
// Audit all PII access attempts
auditService.logPIIAccess(
requestingUser.getId(),
customerId,
requestingUser.getRoles(),
LocalDateTime.now()
);
// Apply role-based filtering
return switch (requestingUser.getHighestRole()) {
case COMPLIANCE_OFFICER -> mapToFullData(customer);
case CUSTOMER_SERVICE -> mapToMaskedData(customer);
case DEVELOPER -> throw new AccessDeniedException("Developers cannot access production PII");
default -> throw new AccessDeniedException("Insufficient permissions");
};
}
private CustomerData mapToMaskedData(Customer customer) {
return CustomerData.builder()
.id(customer.getId())
.email(maskEmail(customer.getEmail()))
.phoneNumber(maskPhone(customer.getPhoneNumber()))
.accountNumber(maskAccountNumber(customer.getAccountNumber()))
.dateOfBirth(null) // Completely hidden
.build();
}
private String maskEmail(String email) {
String[] parts = email.split("@");
if (parts.length != 2) return "***@***";
String localPart = parts[0];
String maskedLocal = localPart.charAt(0) + "***";
return maskedLocal + "@" + parts[1];
}
}
This service centralizes PII access control decisions. Every request for customer data passes through this service, which checks the user's role and applies appropriate data filtering. The audit logging creates an immutable record of who accessed what data and when, which is crucial for compliance investigations and detecting unauthorized access patterns. The role-based filtering ensures that users only see the minimum data necessary for their job function, implementing the principle of least privilege.
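The mapToMaskedData method above calls maskPhone and maskAccountNumber without showing them. A minimal sketch of those helpers, with the exact masking rules assumed (chosen to mirror the maskEmail style: keep just enough for recognition):

```java
/**
 * Sketch of the masking helpers referenced by mapToMaskedData.
 * The rules for what stays visible are illustrative assumptions.
 */
final class FieldMasking {

    // Phone: hide everything except the last two digits, e.g. "+1-555-0123" -> "***23"
    static String maskPhone(String phone) {
        if (phone == null || phone.length() < 2) {
            return "***";
        }
        return "***" + phone.substring(phone.length() - 2);
    }

    // Account number: keep only the last four digits, e.g. "1234567890" -> "******7890"
    static String maskAccountNumber(String account) {
        if (account == null || account.length() <= 4) {
            return "****";
        }
        return "*".repeat(account.length() - 4) + account.substring(account.length() - 4);
    }
}
```

Keeping these helpers pure (no framework dependencies) makes them trivial to unit-test independently of the access-control service.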
Data Minimization Principles
Data minimization means collecting and retaining only the data necessary for specific, legitimate purposes. This reduces privacy risk and simplifies compliance.
Collection Strategy
Before collecting any data, ask three questions:
- Purpose: Why do we need this data? What specific business function does it support?
- Necessity: Can we achieve the same purpose without this data or with less granular data?
- Duration: How long do we need to keep this data?
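These three questions can be made concrete by keeping a machine-readable register of purpose and retention for every field collected, so over-collection is detectable in code review. A minimal sketch (the record shape, entries, and retention periods are illustrative assumptions):

```java
import java.time.Period;
import java.util.List;

// Sketch: one entry per collected field, answering purpose, necessity, duration
record FieldJustification(String field, String purpose, Period retention) {}

class CollectionRegister {

    static final List<FieldJustification> REGISTER = List.of(
        new FieldJustification("email", "Login identifier and transactional notices", Period.ofYears(1)),
        new FieldJustification("dateOfBirth", "Age verification (18+ requirement)", Period.ofYears(1)),
        new FieldJustification("phoneNumber", "Optional: 2FA and support contact", Period.ofYears(1))
    );

    // A field with no registered justification should not be collected
    static boolean isJustified(String field) {
        return REGISTER.stream().anyMatch(f -> f.field().equals(field));
    }
}
```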
Implement validation at the API layer to reject unnecessary data:
/**
* Schema validation that enforces data minimization.
*
* Only accept fields that are strictly necessary for the operation.
* Reject additional fields to prevent over-collection.
*/
const customerRegistrationSchema = z.object({
// Required fields only
email: z.string().email(),
firstName: z.string().min(1).max(50),
lastName: z.string().min(1).max(50),
dateOfBirth: z.date().refine(
(date) => differenceInYears(new Date(), date) >= 18,
{ message: "Must be 18 or older" }
),
// Optional fields with clear purpose
phoneNumber: z.string().optional(), // For 2FA and support contact
// Explicitly forbidden fields
ssn: z.never().optional(), // Not collected during registration
driversLicense: z.never().optional() // Not collected during registration
}).strict(); // Reject unknown properties
// In the registration endpoint
app.post('/api/customers/register', async (req, res) => {
try {
// Validation will fail if extra fields are present
const validatedData = customerRegistrationSchema.parse(req.body);
const customer = await customerService.createCustomer(validatedData);
res.status(201).json({ customerId: customer.id });
} catch (error) {
if (error instanceof z.ZodError) {
// Log rejected fields for monitoring over-collection attempts
const extraFields = error.errors
.filter(e => e.code === 'unrecognized_keys')
.flatMap(e => e.keys);
if (extraFields.length > 0) {
logger.warn('Registration attempted with unnecessary fields', {
fields: extraFields,
ip: req.ip
});
}
}
res.status(400).json({ error: 'Invalid registration data' });
}
});
The .strict() modifier on the Zod schema ensures that any extra fields in the request are rejected. This prevents clients from sending more data than necessary, either accidentally or maliciously. The logging of rejected fields helps identify if client applications are attempting to over-collect data, which could indicate a privacy issue in the client code.
Storage Optimization
Store aggregated or anonymized data for analytics instead of raw PII:
/**
* Aggregated metrics that provide analytics value without storing PII.
*
* Instead of storing individual customer transactions with PII,
* we aggregate data into daily summaries by demographic segments.
*/
@Entity
public class DailyTransactionMetrics {
@Id
private UUID id;
private LocalDate date;
// Aggregated data - no individual identification possible
private String ageGroup; // "18-25", "26-35", etc.
private String region; // City or state level, not full address
private BigDecimal totalTransactionVolume;
private Long transactionCount;
private BigDecimal averageTransactionAmount;
// No customer IDs, names, or direct identifiers
}
/**
* Service that creates aggregated metrics from transaction data.
*
* This batch job runs nightly to aggregate transaction data,
* then the raw transaction PII can be archived or deleted per
* retention policies while preserving analytics capability.
*/
@Service
public class MetricsAggregationService {
@Scheduled(cron = "0 0 1 * * *") // Run at 1 AM daily
public void aggregateTransactions() {
LocalDate yesterday = LocalDate.now().minusDays(1);
List<Transaction> transactions = transactionRepository
.findByDateBetween(
yesterday.atStartOfDay(),
yesterday.plusDays(1).atStartOfDay()
);
// Group by demographics without storing individual data
Map<AgeGroup, Map<String, List<Transaction>>> grouped = transactions.stream()
.collect(Collectors.groupingBy(
t -> calculateAgeGroup(t.getCustomerDateOfBirth()),
Collectors.groupingBy(t -> t.getCustomerRegion())
));
// Create aggregate records
grouped.forEach((ageGroup, regionMap) -> {
regionMap.forEach((region, txns) -> {
DailyTransactionMetrics metrics = new DailyTransactionMetrics();
metrics.setDate(yesterday);
metrics.setAgeGroup(ageGroup.toString());
metrics.setRegion(region);
metrics.setTransactionCount((long) txns.size());
metrics.setTotalTransactionVolume(
txns.stream()
.map(Transaction::getAmount)
.reduce(BigDecimal.ZERO, BigDecimal::add)
);
metrics.setAverageTransactionAmount(
metrics.getTotalTransactionVolume()
.divide(BigDecimal.valueOf(txns.size()), RoundingMode.HALF_UP)
);
metricsRepository.save(metrics);
});
});
}
}
This aggregation strategy provides valuable business intelligence without retaining individual-level data indefinitely. The daily aggregation job processes transactions to create summary statistics grouped by age ranges and regions. Once aggregated, the detailed transaction records containing PII can be archived or deleted according to retention policies, while the anonymized metrics remain available for long-term trend analysis. This reduces both privacy risk and storage costs.
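One caveat the aggregation above does not handle: a cell with very few members (say, a single customer in an age-group/region combination) can still identify an individual. A common mitigation is to suppress cells below a minimum size before publishing, a simple k-anonymity-style threshold. A sketch under that assumption:

```java
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: drop aggregate cells with fewer than k members before publishing. */
final class SmallCellSuppression {

    static Map<String, Long> suppress(Map<String, Long> countsByCell, long k) {
        return countsByCell.entrySet().stream()
            .filter(e -> e.getValue() >= k)   // keep only cells with at least k members
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

The threshold k is a policy choice; higher values reduce re-identification risk at the cost of dropping more data from reports.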
Right to Erasure (GDPR Article 17)
The right to erasure, also known as the "right to be forgotten," requires organizations to delete personal data upon request under certain conditions. Implementation must handle data distributed across multiple systems and backups.
Deletion Workflow
The deletion process must be idempotent, auditable, and handle failures gracefully. Some data may need to be retained for legal or contractual obligations.
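Idempotency in particular is easy to overlook: a customer may submit the same erasure request twice, or a retry may re-enqueue a job. One way to make initiation idempotent is to reuse any job already active for that customer; an in-memory sketch (the map stands in for an assumed findActiveJobByCustomerId repository query):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: idempotent erasure initiation - repeated requests return the same job ID. */
class ErasureInitiation {

    // Stand-in for a repository query looking up the active deletion job per customer
    private final Map<UUID, UUID> activeJobByCustomer = new ConcurrentHashMap<>();

    UUID initiate(UUID customerId) {
        // computeIfAbsent is atomic: concurrent duplicate requests still get one job
        return activeJobByCustomer.computeIfAbsent(customerId, id -> UUID.randomUUID());
    }
}
```

With this guard in front of the workflow below, a duplicate request returns the existing job ID instead of spawning a second deletion run.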
/**
* Service implementing GDPR right to erasure.
*
* Handles deletion requests by:
* 1. Validating the request can be fulfilled
* 2. Checking for legal holds or retention requirements
* 3. Orchestrating deletion across all systems
* 4. Creating an audit trail of the deletion
*/
@Service
public class DataErasureService {
private final CustomerRepository customerRepository;
private final TransactionRepository transactionRepository;
private final DeletionJobRepository deletionJobRepository;
private final DeletionQueue deletionQueue;
private final AnalyticsService analyticsService;
private final BackupManagementService backupService;
private final AuditService auditService;
private final LegalHoldService legalHoldService;
// Plus the search, cache, alerting, and external-system clients used below
/**
* Initiates a data erasure request.
*
* Returns a deletion job ID that can be used to track progress.
* The actual deletion happens asynchronously to handle distributed systems.
*/
public UUID initiateErasure(UUID customerId, ErasureRequest request) {
// Verify customer exists
Customer customer = customerRepository.findById(customerId)
.orElseThrow(() -> new CustomerNotFoundException(customerId));
// Check for legal holds (litigation, investigations, regulatory requirements)
if (legalHoldService.hasActiveLegalHold(customerId)) {
throw new DeletionBlockedException(
"Cannot delete data - subject to legal hold"
);
}
// Check regulatory retention requirements
RetentionPolicy policy = determineRetentionPolicy(customer);
if (!policy.canDelete()) {
throw new DeletionBlockedException(
String.format("Data must be retained until %s for regulatory compliance",
policy.getRetentionEndDate())
);
}
// Create deletion job
DeletionJob job = DeletionJob.builder()
.id(UUID.randomUUID())
.customerId(customerId)
.requestedAt(Instant.now())
.requestedBy(request.getRequesterId())
.status(DeletionStatus.PENDING)
.build();
deletionJobRepository.save(job);
// Audit the deletion request
auditService.logDeletionRequest(
customerId,
request.getReason(),
request.getRequesterId(),
job.getId()
);
// Queue for asynchronous processing
deletionQueue.enqueue(job.getId());
return job.getId();
}
/**
* Processes a deletion job asynchronously.
*
* This method is called by a background worker and handles
* orchestrating deletion across all systems that store customer data.
*/
@Async
@Transactional
public void processDeletion(UUID jobId) {
DeletionJob job = deletionJobRepository.findById(jobId)
.orElseThrow(() -> new JobNotFoundException(jobId));
try {
job.setStatus(DeletionStatus.IN_PROGRESS);
deletionJobRepository.save(job);
UUID customerId = job.getCustomerId();
// 1. Delete from primary database
deleteCustomerData(customerId);
// 2. Delete/anonymize analytics data
analyticsService.anonymizeCustomerData(customerId);
// 3. Remove from search indexes
searchService.removeCustomer(customerId);
// 4. Delete from cache
cacheService.evictCustomerData(customerId);
// 5. Schedule backup purge
backupService.schedulePurge(customerId);
// 6. Notify external systems (payment processors, KYC providers)
notifyExternalSystems(customerId);
job.setStatus(DeletionStatus.COMPLETED);
job.setCompletedAt(Instant.now());
deletionJobRepository.save(job);
// Audit successful deletion
auditService.logDeletionCompleted(customerId, jobId);
} catch (Exception e) {
job.setStatus(DeletionStatus.FAILED);
job.setErrorMessage(e.getMessage());
deletionJobRepository.save(job);
auditService.logDeletionFailed(job.getCustomerId(), jobId, e);
// Alert operations team for manual intervention
alertService.sendDeletionFailureAlert(job);
}
}
private void deleteCustomerData(UUID customerId) {
// Delete related data first (foreign key constraints)
transactionRepository.deleteByCustomerId(customerId);
documentRepository.deleteByCustomerId(customerId);
addressRepository.deleteByCustomerId(customerId);
// Finally delete the customer record
customerRepository.deleteById(customerId);
}
private void notifyExternalSystems(UUID customerId) {
// Notify payment processors to delete stored payment methods
paymentProcessorClient.deleteCustomerData(customerId);
// Notify identity verification providers
kycProviderClient.deleteVerificationData(customerId);
}
private RetentionPolicy determineRetentionPolicy(Customer customer) {
// Financial regulations often require retaining transaction records
// for 5-7 years after account closure
if (customer.getClosedAt() == null) {
throw new DeletionBlockedException("Account must be closed before data can be erased");
}
LocalDate accountClosedDate = customer.getClosedAt().toLocalDate();
LocalDate retentionEndDate = accountClosedDate.plusYears(7);
if (LocalDate.now().isBefore(retentionEndDate)) {
return RetentionPolicy.mustRetainUntil(retentionEndDate);
}
return RetentionPolicy.canDelete();
}
}
This implementation handles the complexity of distributed data deletion. The asynchronous processing pattern allows the API to respond quickly while deletion propagates through multiple systems. The determineRetentionPolicy method implements business logic for regulatory requirements - for example, financial regulations often require retaining transaction records for 5-7 years even if the customer requests deletion. The legal hold check prevents deletion of data involved in ongoing litigation or regulatory investigations.
Backup Considerations
Backups present a special challenge for the right to erasure because they are immutable snapshots. Rather than modifying backups (which would compromise their integrity), implement a purge list:
/**
* Tracks deleted customer IDs to filter them from backup restores.
*
* When restoring from backup, check each customer ID against this list
* and exclude any customers who have been deleted since the backup was created.
*/
@Entity
public class BackupPurgeList {
@Id
private UUID customerId;
private Instant deletedAt;
private String deletionJobId;
// This record is never deleted - permanent audit trail
}
/**
* Service that handles backup restoration while respecting deletions.
*/
@Service
public class BackupRestoreService {
public void restoreFromBackup(Backup backup) {
// Load all customer IDs that were deleted after this backup was created
Set<UUID> deletedCustomers = backupPurgeListRepository
.findByDeletedAtAfter(backup.getCreatedAt())
.stream()
.map(BackupPurgeList::getCustomerId)
.collect(Collectors.toSet());
// Restore data, filtering out deleted customers
backup.getCustomers().stream()
.filter(customer -> !deletedCustomers.contains(customer.getId()))
.forEach(customerRepository::save);
}
}
The backup purge list acts as a tombstone registry. When a customer is deleted, their ID is added to this list with a timestamp. During backup restoration (e.g., disaster recovery), the restore process filters out any customer IDs in the purge list with deletion dates after the backup creation date. This preserves backup integrity while honoring deletion requests. The purge list itself is never deleted, maintaining an audit trail of all erasure requests.
Data Portability (GDPR Article 20)
Data portability allows individuals to obtain and reuse their personal data across different services. The data must be provided in a structured, commonly used, and machine-readable format.
Export Implementation
/**
* Service that generates machine-readable customer data exports.
*
* Supports multiple formats (JSON, CSV, XML) and includes all
* personal data the customer has provided or generated.
*/
@Service
public class DataPortabilityService {
/**
* Creates a complete export of customer data.
*
* Returns a signed URL for download that expires after 24 hours.
*/
public DataExportJob initiateExport(UUID customerId, ExportFormat format) {
// Verify customer exists and requesting user has permission
verifyAccess(customerId);
DataExportJob job = DataExportJob.builder()
.id(UUID.randomUUID())
.customerId(customerId)
.format(format)
.status(ExportStatus.PENDING)
.requestedAt(Instant.now())
.build();
exportJobRepository.save(job);
// Process asynchronously for large data sets
exportQueue.enqueue(job.getId());
return job;
}
@Async
public void processExport(UUID jobId) {
DataExportJob job = exportJobRepository.findById(jobId)
.orElseThrow(() -> new JobNotFoundException(jobId));
try {
job.setStatus(ExportStatus.IN_PROGRESS);
exportJobRepository.save(job);
// Collect all personal data
CustomerDataExport export = collectCustomerData(job.getCustomerId());
// Serialize to requested format
byte[] serialized = serialize(export, job.getFormat());
// Upload to secure storage with temporary access
String fileKey = String.format("exports/%s/%s.%s",
job.getCustomerId(),
job.getId(),
job.getFormat().getFileExtension()
);
storageService.upload(fileKey, serialized);
// Generate signed URL valid for 24 hours
String downloadUrl = storageService.generatePresignedUrl(
fileKey,
Duration.ofHours(24)
);
job.setStatus(ExportStatus.COMPLETED);
job.setCompletedAt(Instant.now());
job.setDownloadUrl(downloadUrl);
exportJobRepository.save(job);
// Notify customer via email
notificationService.sendExportReadyEmail(
export.getCustomer().getEmail(),
downloadUrl
);
// Audit the export
auditService.logDataExport(job.getCustomerId(), jobId);
} catch (Exception e) {
job.setStatus(ExportStatus.FAILED);
job.setErrorMessage(e.getMessage());
exportJobRepository.save(job);
}
}
private CustomerDataExport collectCustomerData(UUID customerId) {
Customer customer = customerRepository.findById(customerId)
.orElseThrow(() -> new CustomerNotFoundException(customerId));
return CustomerDataExport.builder()
.customer(mapCustomerProfile(customer))
.transactions(transactionRepository.findByCustomerId(customerId))
.documents(documentRepository.findByCustomerId(customerId))
.addresses(addressRepository.findByCustomerId(customerId))
.preferences(preferenceRepository.findByCustomerId(customerId))
.consentRecords(consentRepository.findByCustomerId(customerId))
.loginHistory(loginHistoryRepository.findByCustomerId(customerId))
.exportedAt(Instant.now())
.build();
}
private byte[] serialize(CustomerDataExport export, ExportFormat format) throws IOException {
return switch (format) {
case JSON -> objectMapper.writeValueAsBytes(export);
case XML -> xmlMapper.writeValueAsBytes(export);
case CSV -> csvWriter.write(export);
};
}
}
The data export includes all personal data the system has collected about the customer, organized in a logical structure. The asynchronous processing pattern handles large exports without blocking API responses. The presigned URL provides secure, time-limited access to the export file without requiring authentication for the download. The 24-hour expiration balances convenience with security - long enough for the customer to download but short enough to limit exposure risk.
Example JSON export structure:
{
"exportedAt": "2024-01-15T10:30:00Z",
"customer": {
"id": "123e4567-e89b-12d3-a456-426614174000",
"email": "[email protected]",
"firstName": "John",
"lastName": "Doe",
"dateOfBirth": "1990-05-15",
"phoneNumber": "+1-555-0123",
"createdAt": "2020-03-10T08:00:00Z"
},
"addresses": [
{
"type": "HOME",
"street": "123 Main St",
"city": "Springfield",
"state": "IL",
"postalCode": "62701",
"country": "US"
}
],
"transactions": [
{
"id": "txn_123",
"date": "2024-01-10T14:23:00Z",
"amount": 125.50,
"currency": "USD",
"description": "Online purchase",
"merchant": "Example Store"
}
],
"preferences": {
"language": "en-US",
"notifications": {
"email": true,
"sms": false,
"push": true
},
"marketingConsent": false
},
"consentRecords": [
{
"consentType": "TERMS_OF_SERVICE",
"granted": true,
"timestamp": "2020-03-10T08:00:00Z",
"version": "1.2"
},
{
"consentType": "MARKETING",
"granted": false,
"timestamp": "2020-03-10T08:00:00Z",
"version": "1.0"
}
]
}
The export format is designed for portability - another service should be able to import this data with minimal transformation. The inclusion of consent records and timestamps provides transparency into how the customer's data has been used.
Consent Management
Consent is the legal basis for processing personal data under GDPR. Systems must track what consent was given, when, for what purpose, and provide mechanisms for withdrawal.
Consent Model
/**
* Entity representing a consent record.
*
* Each consent is versioned, timestamped, and tied to a specific purpose.
* Consent can be granular - users can opt in to some purposes and not others.
*/
@Entity
public class ConsentRecord {
@Id
private UUID id;
private UUID customerId;
@Enumerated(EnumType.STRING)
private ConsentPurpose purpose;
private boolean granted;
private String version; // Version of terms/policy
private Instant timestamp;
private String ipAddress; // Evidence of consent
private String userAgent; // Additional evidence
// Method of consent (checkbox, verbal, written)
@Enumerated(EnumType.STRING)
private ConsentMethod method;
// Consent can be withdrawn - never delete consent records for audit
private Instant withdrawnAt;
}
public enum ConsentPurpose {
TERMS_OF_SERVICE, // Required - account cannot exist without this
PRIVACY_POLICY, // Required
MARKETING_EMAIL, // Optional
MARKETING_SMS, // Optional
DATA_SHARING_PARTNERS, // Optional
ANALYTICS_TRACKING, // Optional, but may affect functionality
PERSONALIZATION // Optional
}
public enum ConsentMethod {
CHECKBOX, // Web form checkbox
API_CALL, // Mobile app or API integration
VERBAL, // Phone consent (must be recorded)
WRITTEN, // Signed document
IMPLIED // Pre-ticked boxes - NOT valid under GDPR!
}
This consent model captures the evidence needed to demonstrate GDPR compliance. Consent records are never deleted: when consent is withdrawn, the record's withdrawnAt timestamp is set rather than removing the record, preserving a complete audit trail. The version field tracks which version of the terms or policy the user consented to, which matters when terms change and re-consent is required.
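With this model, checking whether processing is currently permitted reduces to three conditions on the latest record for a purpose; a minimal sketch (field access simplified to plain parameters):

```java
import java.time.Instant;
import java.util.Objects;

/** Sketch: is a stored consent record still a valid basis for processing? */
final class ConsentCheck {

    static boolean isValid(boolean granted, Instant withdrawnAt,
                           String consentedVersion, String currentVersion) {
        return granted
            && withdrawnAt == null                               // not withdrawn
            && Objects.equals(consentedVersion, currentVersion); // terms unchanged
    }
}
```

A version mismatch means re-consent is needed before processing can continue under that purpose.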
Consent Collection
/**
* Frontend consent collection component.
*
* Presents granular consent options with clear descriptions
* of what each consent means. Required consents are clearly marked.
*/
interface ConsentFormProps {
onSubmit: (consents: ConsentSelection[]) => Promise<void>;
}
export function ConsentForm({ onSubmit }: ConsentFormProps) {
const [consents, setConsents] = useState<Record<ConsentPurpose, boolean>>({
TERMS_OF_SERVICE: false,
PRIVACY_OLICY: false,
MARKETING_EMAIL: false,
MARKETING_SMS: false,
DATA_SHARING_PARTNERS: false,
ANALYTICS_TRACKING: false, // Pre-ticked optional consents are not valid under GDPR
PERSONALIZATION: false,
});
const consentDescriptions: Record<ConsentPurpose, ConsentDescription> = {
TERMS_OF_SERVICE: {
label: 'Terms of Service',
description: 'I agree to the terms of service governing use of this platform.',
required: true,
learnMoreUrl: '/legal/terms',
},
PRIVACY_POLICY: {
label: 'Privacy Policy',
description: 'I acknowledge the privacy policy explaining how my data is processed.',
required: true,
learnMoreUrl: '/legal/privacy',
},
MARKETING_EMAIL: {
label: 'Marketing Emails',
description: 'I want to receive promotional offers and updates via email.',
required: false,
learnMoreUrl: '/legal/marketing',
},
MARKETING_SMS: {
label: 'Marketing SMS',
description: 'I want to receive promotional offers via SMS.',
required: false,
learnMoreUrl: '/legal/marketing',
},
DATA_SHARING_PARTNERS: {
label: 'Data Sharing with Partners',
description: 'I consent to sharing my data with trusted partners for enhanced services.',
required: false,
learnMoreUrl: '/legal/partners',
},
ANALYTICS_TRACKING: {
label: 'Analytics and Usage Tracking',
description: 'Allow collection of usage data to improve the service.',
required: false,
learnMoreUrl: '/legal/analytics',
},
PERSONALIZATION: {
label: 'Personalized Experience',
description: 'Use my preferences and history to personalize my experience.',
required: false,
learnMoreUrl: '/legal/personalization',
},
};
const canSubmit = consents.TERMS_OF_SERVICE && consents.PRIVACY_POLICY;
const handleSubmit = async (e: React.FormEvent) => {
e.preventDefault();
if (!canSubmit) {
toast.error('You must accept the required terms to continue');
return;
}
const selections: ConsentSelection[] = Object.entries(consents).map(
([purpose, granted]) => ({
purpose: purpose as ConsentPurpose,
granted,
timestamp: new Date().toISOString(),
})
);
await onSubmit(selections);
};
return (
<form onSubmit={handleSubmit} className="consent-form">
<h2>Privacy Preferences</h2>
<p className="consent-intro">
We respect your privacy. Please review and select your preferences below.
Required items are marked with an asterisk (*).
</p>
{Object.entries(consentDescriptions).map(([purpose, desc]) => (
<div key={purpose} className="consent-item">
<label>
<input
type="checkbox"
checked={consents[purpose as ConsentPurpose]}
onChange={(e) =>
setConsents({
...consents,
[purpose]: e.target.checked,
})
}
required={desc.required}
/>
<span className="consent-label">
{desc.label}
{desc.required && <span className="required">*</span>}
</span>
</label>
<p className="consent-description">{desc.description}</p>
<a href={desc.learnMoreUrl} target="_blank" rel="noopener">
Learn more
</a>
</div>
))}
<button type="submit" disabled={!canSubmit}>
Continue
</button>
</form>
);
}
This consent form implements GDPR requirements: consents are unbundled (each purpose has its own checkbox), required consents are clearly marked, descriptions are plain language rather than legal jargon, and links to full policy documents are provided. The form cannot be submitted without required consents, but optional consents can be freely enabled or disabled. Pre-checked boxes for required consents are not used - users must actively check them, demonstrating informed consent.
Consent Withdrawal
/**
* Service handling consent withdrawal requests.
*
* When consent is withdrawn, systems must stop processing data
* for that purpose and potentially delete data collected under that consent.
*/
@Service
public class ConsentManagementService {
public void withdrawConsent(UUID customerId, ConsentPurpose purpose) {
// Find the active consent record
ConsentRecord activeConsent = consentRepository
.findActiveConsent(customerId, purpose)
.orElseThrow(() -> new ConsentNotFoundException());
// Mark as withdrawn (do not delete - maintain audit trail)
activeConsent.setWithdrawnAt(Instant.now());
consentRepository.save(activeConsent);
// Audit the withdrawal
auditService.logConsentWithdrawal(customerId, purpose);
// Take purpose-specific actions
handleConsentWithdrawal(customerId, purpose);
}
private void handleConsentWithdrawal(UUID customerId, ConsentPurpose purpose) {
switch (purpose) {
case MARKETING_EMAIL -> {
// Unsubscribe from all marketing lists
emailService.unsubscribeFromMarketing(customerId);
}
case MARKETING_SMS -> {
// Remove phone number from SMS marketing lists
smsService.unsubscribeFromMarketing(customerId);
}
case DATA_SHARING_PARTNERS -> {
// Notify partners to stop using customer data
partnerIntegrationService.revokeDataSharing(customerId);
}
case ANALYTICS_TRACKING -> {
// Anonymize or delete analytics data
analyticsService.stopTracking(customerId);
}
case PERSONALIZATION -> {
// Delete personalization profile
personalizationService.deleteProfile(customerId);
}
case TERMS_OF_SERVICE, PRIVACY_POLICY -> {
// Cannot withdraw required consents - must close account
throw new IllegalOperationException(
"Required consent cannot be withdrawn. Account closure required."
);
}
}
}
}
Consent withdrawal must be as easy as granting consent. The system immediately stops processing data for the withdrawn purpose and takes purpose-specific cleanup actions. For marketing consents, this means unsubscribing from lists; for data sharing, it means notifying partners to stop using the data. Required consents (terms of service, privacy policy) cannot be withdrawn individually - if a customer wants to revoke these, they must close their account entirely, which triggers the full erasure workflow.
Data Retention Policies
Data retention policies define how long different types of data must be kept and when it should be deleted. These policies balance legal requirements, business needs, and privacy principles.
Retention Policy Framework
/**
* Configuration defining retention periods for different data types.
*
* Policies are based on legal requirements, regulatory guidance,
* and business necessity. Retention periods vary by data type and jurisdiction.
*/
@Configuration
public class RetentionPolicyConfig {
/**
* Defines retention periods for various data categories.
*
* After the retention period expires, data is eligible for deletion
* unless there's a legal hold or active business need.
*/
@Bean
public Map<DataCategory, RetentionPeriod> retentionPolicies() {
return Map.of(
// Financial data - regulatory requirements
DataCategory.TRANSACTION_RECORDS,
RetentionPeriod.years(7), // SOX, tax regulations
DataCategory.ACCOUNT_STATEMENTS,
RetentionPeriod.years(7),
// Customer data - business need
DataCategory.CUSTOMER_PROFILE,
RetentionPeriod.afterAccountClosure(30, ChronoUnit.DAYS),
// Audit and security
DataCategory.AUDIT_LOGS,
RetentionPeriod.years(3), // Security incident investigation
DataCategory.LOGIN_HISTORY,
RetentionPeriod.years(1), // Fraud detection and investigation
// Analytics
DataCategory.ANALYTICS_EVENTS,
RetentionPeriod.months(6), // Business intelligence
// Communications
DataCategory.CUSTOMER_SUPPORT_TICKETS,
RetentionPeriod.years(2), // Quality assurance, training
// Marketing
DataCategory.MARKETING_CONSENT,
RetentionPeriod.indefinite(), // Must maintain consent record
// Temporary data
DataCategory.SESSION_DATA,
RetentionPeriod.hours(24),
DataCategory.PASSWORD_RESET_TOKENS,
RetentionPeriod.hours(1)
);
}
}
/**
* Represents a retention period with various calculation strategies.
*/
public class RetentionPeriod {
    private final long amount;
    private final TemporalUnit unit;
    private final RetentionStrategy strategy;

    private RetentionPeriod(long amount, TemporalUnit unit, RetentionStrategy strategy) {
        this.amount = amount;
        this.unit = unit;
        this.strategy = strategy;
    }

    public static RetentionPeriod years(long years) {
        return new RetentionPeriod(years, ChronoUnit.YEARS, RetentionStrategy.FROM_CREATION);
    }

    public static RetentionPeriod months(long months) {
        return new RetentionPeriod(months, ChronoUnit.MONTHS, RetentionStrategy.FROM_CREATION);
    }

    public static RetentionPeriod hours(long hours) {
        return new RetentionPeriod(hours, ChronoUnit.HOURS, RetentionStrategy.FROM_CREATION);
    }

    public static RetentionPeriod afterAccountClosure(long amount, TemporalUnit unit) {
        return new RetentionPeriod(amount, unit, RetentionStrategy.AFTER_CLOSURE);
    }

    public static RetentionPeriod indefinite() {
        return new RetentionPeriod(Long.MAX_VALUE, ChronoUnit.YEARS, RetentionStrategy.INDEFINITE);
    }

    public boolean isIndefinite() {
        return strategy == RetentionStrategy.INDEFINITE;
    }

    public boolean shouldDelete(Instant createdAt, Instant closedAt) {
        if (isIndefinite()) {
            return false;
        }
        Instant baseTime = strategy == RetentionStrategy.AFTER_CLOSURE && closedAt != null
            ? closedAt
            : createdAt;
        // Instant cannot add estimated-duration units (YEARS, MONTHS) directly,
        // so compute the expiration through a UTC date-time
        Instant expirationTime = baseTime.atZone(ZoneOffset.UTC).plus(amount, unit).toInstant();
        return Instant.now().isAfter(expirationTime);
    }
}
This configuration centralizes retention policy definitions, making them easy to audit and update when regulations change. Different strategies handle different scenarios: FROM_CREATION means retention period starts when data is created, AFTER_CLOSURE means it starts when the account closes, and INDEFINITE means the data must be kept permanently (like consent records for audit purposes).
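A minimal stand-alone illustration of the retention check, simplified to the FROM_CREATION strategy (`shouldDelete` here is a local stand-in for the method above; note the expiration is computed via a UTC date-time because `Instant` cannot add year-based units directly):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalUnit;

public class RetentionDemo {
    // Mirrors RetentionPeriod.shouldDelete for the FROM_CREATION strategy,
    // with "now" passed in explicitly so the check is deterministic
    static boolean shouldDelete(Instant createdAt, long amount, TemporalUnit unit, Instant now) {
        Instant expiration = createdAt.atZone(ZoneOffset.UTC).plus(amount, unit).toInstant();
        return now.isAfter(expiration);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-06-01T00:00:00Z");
        Instant created2015 = Instant.parse("2015-01-01T00:00:00Z");
        Instant created2020 = Instant.parse("2020-01-01T00:00:00Z");
        // 7-year retention: the 2015 record has expired, the 2020 record has not
        System.out.println(shouldDelete(created2015, 7, ChronoUnit.YEARS, now)); // prints true
        System.out.println(shouldDelete(created2020, 7, ChronoUnit.YEARS, now)); // prints false
    }
}
```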
Automated Deletion
/**
* Scheduled job that enforces retention policies.
*
* Runs nightly to identify and delete data that has exceeded
* its retention period. Implements safety checks to prevent
* accidental deletion of data under legal hold.
*/
@Service
public class DataRetentionService {
    @Scheduled(cron = "0 0 2 * * *") // Run at 2 AM daily (Spring cron: sec min hour day month weekday)
@Transactional
public void enforceRetentionPolicies() {
logger.info("Starting retention policy enforcement");
Map<DataCategory, RetentionPeriod> policies = retentionPolicyConfig.retentionPolicies();
policies.forEach((category, period) -> {
if (period.isIndefinite()) {
return; // Skip indefinite retention
}
try {
deleteExpiredData(category, period);
} catch (Exception e) {
// Log but don't fail entire job if one category fails
logger.error("Failed to delete {} data", category, e);
alertService.sendRetentionJobFailure(category, e);
}
});
logger.info("Retention policy enforcement completed");
}
    private void deleteExpiredData(DataCategory category, RetentionPeriod period) {
        switch (category) {
            case TRANSACTION_RECORDS -> deleteExpiredTransactions(period);
            case AUDIT_LOGS -> deleteExpiredAuditLogs(period);
            case LOGIN_HISTORY -> deleteExpiredLoginHistory(period);
            case ANALYTICS_EVENTS -> deleteExpiredAnalytics(period);
            case CUSTOMER_SUPPORT_TICKETS -> deleteExpiredTickets(period);
            case SESSION_DATA -> deleteExpiredSessions(period);
            case PASSWORD_RESET_TOKENS -> deleteExpiredTokens(period);
            // Surface categories that have a policy but no deletion handler yet
            // (e.g. ACCOUNT_STATEMENTS, CUSTOMER_PROFILE) instead of silently skipping them
            default -> logger.warn("No deletion handler for category {}", category);
        }
    }
private void deleteExpiredTransactions(RetentionPeriod period) {
        // Find transactions that exceed the retention period
        // (production code should push this filter into a bounded repository query
        // rather than streaming the whole table into memory)
List<Transaction> expired = transactionRepository
.findAll()
.stream()
.filter(t -> period.shouldDelete(t.getCreatedAt(), t.getAccount().getClosedAt()))
.filter(t -> !legalHoldService.hasActiveLegalHold(t.getCustomerId()))
.toList();
if (expired.isEmpty()) {
return;
}
logger.info("Deleting {} expired transaction records", expired.size());
// Batch delete for efficiency
transactionRepository.deleteAllInBatch(expired);
// Audit the deletion
auditService.logBulkDeletion(
DataCategory.TRANSACTION_RECORDS,
expired.size(),
"Retention policy enforcement"
);
}
private void deleteExpiredAuditLogs(RetentionPeriod period) {
// Special handling for audit logs - archive before deletion
List<AuditLog> expired = auditLogRepository
.findAll()
.stream()
.filter(log -> period.shouldDelete(log.getCreatedAt(), null))
.toList();
if (expired.isEmpty()) {
return;
}
// Archive to cold storage before deletion
archiveService.archiveAuditLogs(expired);
// Then delete from active database
auditLogRepository.deleteAllInBatch(expired);
logger.info("Archived and deleted {} expired audit logs", expired.size());
}
}
The automated deletion job runs nightly during low-traffic hours. Each data category has specific deletion logic that accounts for its unique characteristics. For audit logs, data is archived to cold storage before deletion to maintain long-term compliance evidence while removing it from the active database. The legal hold check prevents deletion of data involved in litigation or investigations, even if it exceeds the normal retention period.
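The legal hold exclusion can be illustrated with a minimal stand-in for `legalHoldService.hasActiveLegalHold` (here just a set of held customer IDs; names and data are hypothetical):

```java
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.stream.Collectors;

public class LegalHoldDemo {
    // Records past retention are deletion candidates only if no legal hold applies
    static List<UUID> deletionCandidates(List<UUID> expiredCustomerIds, Set<UUID> legalHolds) {
        return expiredCustomerIds.stream()
            .filter(customerId -> !legalHolds.contains(customerId))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        UUID held = UUID.fromString("11111111-1111-1111-1111-111111111111");
        UUID free = UUID.fromString("22222222-2222-2222-2222-222222222222");
        // Both records exceed retention, but the customer under legal hold is excluded
        List<UUID> toDelete = deletionCandidates(List.of(held, free), Set.of(held));
        System.out.println(toDelete.size()); // prints 1
    }
}
```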
Encryption Requirements
Encryption protects data from unauthorized access, both when stored (at-rest) and when transmitted (in-transit). Different types of data require different encryption approaches.
Encryption at Rest
/**
* JPA attribute converter for transparent field-level encryption.
*
* Automatically encrypts sensitive fields when persisting to database
* and decrypts when reading. The encryption is transparent to business logic.
*/
@Converter
public class EncryptedStringConverter implements AttributeConverter<String, String> {
private final EncryptionService encryptionService;
@Autowired
public EncryptedStringConverter(EncryptionService encryptionService) {
this.encryptionService = encryptionService;
}
/**
* Encrypts the attribute value before storing in database.
*
* Uses AES-256-GCM with unique initialization vector per value
* to ensure semantic security (same plaintext produces different ciphertext).
*/
@Override
public String convertToDatabaseColumn(String attribute) {
if (attribute == null) {
return null;
}
return encryptionService.encrypt(attribute);
}
/**
* Decrypts the database value when loading entity.
*/
@Override
public String convertToEntityAttribute(String dbData) {
if (dbData == null) {
return null;
}
return encryptionService.decrypt(dbData);
}
}
/**
* Service handling encryption operations using envelope encryption pattern.
*
* Uses AWS KMS (or similar HSM) for key management, with local data encryption keys
* encrypted by a master key in the HSM. This provides key rotation capability
* and audit logging of key usage.
*/
@Service
public class EncryptionService {
    private final KmsClient kmsClient; // AWS SDK v2 client (software.amazon.awssdk.services.kms)
private final String masterKeyId;
/**
* Encrypts data using AES-256-GCM with envelope encryption.
*
* Process:
* 1. Generate unique data encryption key (DEK) for this value
* 2. Encrypt the plaintext with the DEK
* 3. Encrypt the DEK with the master key from KMS
* 4. Store both the encrypted DEK and encrypted data
*
* This allows key rotation without re-encrypting all data.
*/
public String encrypt(String plaintext) {
try {
// Generate data encryption key
GenerateDataKeyResponse dataKey = kmsClient.generateDataKey(
GenerateDataKeyRequest.builder()
.keyId(masterKeyId)
.keySpec(DataKeySpec.AES_256)
.build()
);
// Encrypt plaintext with DEK using AES-GCM
SecretKey secretKey = new SecretKeySpec(
dataKey.plaintext().asByteArray(),
"AES"
);
Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
byte[] iv = generateIV();
GCMParameterSpec gcmSpec = new GCMParameterSpec(128, iv);
cipher.init(Cipher.ENCRYPT_MODE, secretKey, gcmSpec);
byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
// Combine: encrypted DEK + IV + ciphertext
// Format: [encryptedDEK length (4 bytes)][encrypted DEK][IV (12 bytes)][ciphertext]
ByteBuffer buffer = ByteBuffer.allocate(
4 + dataKey.ciphertextBlob().asByteArray().length + 12 + ciphertext.length
);
buffer.putInt(dataKey.ciphertextBlob().asByteArray().length);
buffer.put(dataKey.ciphertextBlob().asByteArray());
buffer.put(iv);
buffer.put(ciphertext);
// Encode as base64 for storage
return Base64.getEncoder().encodeToString(buffer.array());
} catch (Exception e) {
throw new EncryptionException("Failed to encrypt data", e);
}
}
/**
* Decrypts data encrypted with envelope encryption.
*/
public String decrypt(String encrypted) {
try {
byte[] data = Base64.getDecoder().decode(encrypted);
ByteBuffer buffer = ByteBuffer.wrap(data);
// Extract components
int dekLength = buffer.getInt();
byte[] encryptedDEK = new byte[dekLength];
buffer.get(encryptedDEK);
byte[] iv = new byte[12];
buffer.get(iv);
byte[] ciphertext = new byte[buffer.remaining()];
buffer.get(ciphertext);
// Decrypt DEK using KMS
DecryptResponse dekResponse = kmsClient.decrypt(
DecryptRequest.builder()
.ciphertextBlob(SdkBytes.fromByteArray(encryptedDEK))
.keyId(masterKeyId)
.build()
);
// Decrypt data using DEK
SecretKey secretKey = new SecretKeySpec(
dekResponse.plaintext().asByteArray(),
"AES"
);
Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
GCMParameterSpec gcmSpec = new GCMParameterSpec(128, iv);
cipher.init(Cipher.DECRYPT_MODE, secretKey, gcmSpec);
byte[] plaintext = cipher.doFinal(ciphertext);
return new String(plaintext, StandardCharsets.UTF_8);
} catch (Exception e) {
throw new EncryptionException("Failed to decrypt data", e);
}
}
private byte[] generateIV() {
byte[] iv = new byte[12];
new SecureRandom().nextBytes(iv);
return iv;
}
}
// Usage in entity
@Entity
public class Customer {
@Id
private UUID id;
// Automatically encrypted/decrypted
@Convert(converter = EncryptedStringConverter.class)
@Column(length = 2000) // Encrypted data is larger than plaintext
private String socialSecurityNumber;
@Convert(converter = EncryptedStringConverter.class)
@Column(length = 2000)
private String accountNumber;
}
The envelope encryption pattern provides several advantages: the master key never leaves the Hardware Security Module (HSM), each value is encrypted with a unique DEK (preventing pattern analysis), and key rotation only requires re-encrypting the DEKs rather than all data. AES-GCM mode provides both confidentiality and authenticity - it detects if ciphertext has been tampered with. The unique initialization vector (IV) for each encryption operation ensures semantic security: encrypting the same plaintext twice produces different ciphertext.
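The envelope pattern can be exercised end to end without KMS by using a locally generated AES key as a stand-in for the master key. This is an illustration only: in production the master key never leaves the HSM, and the wrap/unwrap steps are KMS API calls.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class EnvelopeDemo {
    private static final SecureRandom RNG = new SecureRandom();

    private static byte[] crypt(int mode, SecretKey key, byte[] iv, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(data);
    }

    /** Encrypts with a fresh DEK, wraps the DEK under the master key, then reverses both steps. */
    static String roundTrip(String plaintext) {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(256);
            SecretKey master = kg.generateKey(); // stands in for the key held inside KMS/HSM
            SecretKey dek = kg.generateKey();    // fresh data encryption key per value

            // 1. Encrypt the plaintext with the DEK (unique IV per operation)
            byte[] dataIv = new byte[12];
            RNG.nextBytes(dataIv);
            byte[] ciphertext = crypt(Cipher.ENCRYPT_MODE, dek, dataIv,
                plaintext.getBytes(StandardCharsets.UTF_8));

            // 2. Wrap the DEK under the master key (KMS performs this step in production)
            byte[] dekIv = new byte[12];
            RNG.nextBytes(dekIv);
            byte[] wrappedDek = crypt(Cipher.ENCRYPT_MODE, master, dekIv, dek.getEncoded());

            // Decrypt path: unwrap the DEK first, then decrypt the data with it
            SecretKey unwrapped = new SecretKeySpec(
                crypt(Cipher.DECRYPT_MODE, master, dekIv, wrappedDek), "AES");
            return new String(crypt(Cipher.DECRYPT_MODE, unwrapped, dataIv, ciphertext),
                StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("123-45-6789")); // prints 123-45-6789
    }
}
```

Key rotation in this scheme touches only step 2: the DEKs are re-wrapped under the new master key while the bulk ciphertext stays untouched.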
Encryption in Transit
All data transmitted over networks must use TLS 1.2 or higher:
# Spring Boot application.yml configuration
server:
  ssl:
    enabled: true
    # Only allow TLS 1.2 and 1.3
    protocol: TLS
    enabled-protocols:
      - TLSv1.2
      - TLSv1.3
    # Strong cipher suites only
    ciphers:
      - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
      - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
      - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
    key-store: classpath:keystore.p12
    key-store-password: ${KEY_STORE_PASSWORD}
    key-store-type: PKCS12
    key-alias: server
  http2:
    enabled: true
  # Trust X-Forwarded-* headers when TLS terminates at a reverse proxy;
  # HTTP-to-HTTPS redirects are enforced in the security filter chain below
  forward-headers-strategy: native
Configure HSTS (HTTP Strict Transport Security) headers to prevent protocol downgrade attacks:
@Configuration
@EnableWebSecurity
public class SecurityConfig {
@Bean
public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
http
.requiresChannel(channel -> channel
.anyRequest().requiresSecure() // Require HTTPS
)
.headers(headers -> headers
.httpStrictTransportSecurity(hsts -> hsts
.maxAgeInSeconds(31536000) // 1 year
.includeSubDomains(true)
.preload(true) // Eligible for browser preload lists
)
);
return http.build();
}
}
The HSTS header tells browsers to only connect via HTTPS for the specified duration, even if the user types http:// or clicks an HTTP link. The includeSubDomains directive extends this protection to all subdomains. The preload flag makes the site eligible for inclusion in browsers' built-in HSTS preload lists, providing protection even on first visit.
Anonymization and Pseudonymization
Anonymization makes it impossible to identify individuals from data, while pseudonymization replaces identifying fields with pseudonyms. Pseudonymized data is still considered personal data under GDPR, but anonymized data is not.
Anonymization Techniques
True anonymization is difficult to achieve because seemingly non-identifying data can often be re-identified when combined with other datasets. The following techniques reduce re-identification risk:
/**
* Service that anonymizes customer data for analytics and research.
*
* Applies multiple anonymization techniques to reduce re-identification risk
* while preserving analytical utility.
*/
@Service
public class AnonymizationService {
/**
* Anonymizes customer data using k-anonymity and generalization.
*
* K-anonymity ensures each record is indistinguishable from at least
* k-1 other records based on quasi-identifiers.
*/
    public AnonymizedDataset anonymizeForAnalytics(List<Customer> customers, int k) {
        List<AnonymizedCustomer> records = customers.stream()
            .map(this::anonymizeCustomer)
            .collect(Collectors.groupingBy(this::getQuasiIdentifierGroup))
            .values()
            .stream()
            .filter(group -> group.size() >= k) // Enforce k-anonymity: suppress small groups
            .flatMap(List::stream)
            .collect(Collectors.toList());
        return new AnonymizedDataset(records);
    }
private AnonymizedCustomer anonymizeCustomer(Customer customer) {
return AnonymizedCustomer.builder()
// Remove direct identifiers completely
.customerId(null) // No customer ID
.name(null) // No name
.email(null) // No email
.phoneNumber(null) // No phone
.accountNumber(null) // No account number
// Generalize quasi-identifiers
.ageGroup(generalizeAge(customer.getDateOfBirth()))
.incomeRange(generalizeIncome(customer.getIncome()))
.region(generalizeLocation(customer.getAddress()))
// Add noise to numeric data
.accountBalance(addNoise(customer.getAccountBalance(), 0.05)) // 5% noise
// Preserve useful analytical attributes
.accountType(customer.getAccountType())
.customerSince(generalizeDate(customer.getCreatedAt()))
.transactionCount(customer.getTransactions().size())
.build();
}
private String generalizeAge(LocalDate dateOfBirth) {
int age = Period.between(dateOfBirth, LocalDate.now()).getYears();
// Generalize to age ranges
if (age < 25) return "18-24";
if (age < 35) return "25-34";
if (age < 45) return "35-44";
if (age < 55) return "45-54";
if (age < 65) return "55-64";
return "65+";
}
private String generalizeIncome(BigDecimal income) {
int incomeInt = income.intValue();
if (incomeInt < 25000) return "$0-$25k";
if (incomeInt < 50000) return "$25k-$50k";
if (incomeInt < 75000) return "$50k-$75k";
if (incomeInt < 100000) return "$75k-$100k";
if (incomeInt < 150000) return "$100k-$150k";
return "$150k+";
}
private String generalizeLocation(Address address) {
// Only keep state/region, remove city and street
return address.getState();
}
private String generalizeDate(Instant timestamp) {
// Round to year and quarter
LocalDate date = LocalDate.ofInstant(timestamp, ZoneOffset.UTC);
int quarter = (date.getMonthValue() - 1) / 3 + 1;
return date.getYear() + " Q" + quarter;
}
    private BigDecimal addNoise(BigDecimal value, double noiseRatio) {
        // Add multiplicative Gaussian noise scaled by noiseRatio
        double noise = ThreadLocalRandom.current().nextGaussian() * noiseRatio;
        BigDecimal noiseFactor = BigDecimal.valueOf(1 + noise);
        return value.multiply(noiseFactor).setScale(2, RoundingMode.HALF_UP);
    }
private String getQuasiIdentifierGroup(AnonymizedCustomer customer) {
// Group by quasi-identifiers for k-anonymity check
return customer.getAgeGroup() + "|" +
customer.getIncomeRange() + "|" +
customer.getRegion();
}
}
This anonymization approach combines several techniques: removal of direct identifiers, generalization of quasi-identifiers into ranges, addition of statistical noise, and k-anonymity enforcement. The k-anonymity check ensures that any combination of quasi-identifiers (age group, income range, region) appears for at least k individuals in the dataset, making it difficult to single out a specific person. The noise addition further protects against re-identification and differencing attacks while preserving statistical properties for analysis.
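The k-anonymity suppression step can be demonstrated in isolation; the quasi-identifier tuples below are hypothetical sample rows:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KAnonymityDemo {
    // Keep only records whose quasi-identifier tuple is shared by at least k rows
    static List<String[]> enforceK(List<String[]> rows, int k) {
        Map<String, List<String[]>> groups = rows.stream()
            .collect(Collectors.groupingBy(r -> String.join("|", r)));
        return groups.values().stream()
            .filter(group -> group.size() >= k) // suppress groups smaller than k
            .flatMap(List::stream)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[]{"25-34", "$50k-$75k", "CA"},
            new String[]{"25-34", "$50k-$75k", "CA"},
            new String[]{"55-64", "$150k+", "WY"}  // unique tuple -> suppressed at k=2
        );
        System.out.println(enforceK(rows, 2).size()); // prints 2
    }
}
```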
Pseudonymization
Pseudonymization replaces identifying data with pseudonyms that can be reversed if necessary (e.g., for support or legal requirements):
/**
* Service that pseudonymizes customer data while maintaining reversibility.
*
* Uses cryptographic tokens that can be mapped back to original identifiers
* with proper authorization. The mapping is stored separately with strict access controls.
*/
@Service
public class PseudonymizationService {
    private final PseudonymMappingRepository mappingRepository;
    private final AuditService auditService;
/**
* Creates a pseudonym for a customer ID.
*
* The pseudonym is deterministic for the same input (allowing correlation
* across datasets) but cannot be reversed without the mapping table.
*/
public String pseudonymize(UUID customerId) {
// Check if pseudonym already exists
Optional<PseudonymMapping> existing = mappingRepository
.findByCustomerId(customerId);
if (existing.isPresent()) {
return existing.get().getPseudonym();
}
// Create new pseudonym using HMAC
String pseudonym = generatePseudonym(customerId);
// Store mapping in separate, access-controlled table
PseudonymMapping mapping = new PseudonymMapping();
mapping.setCustomerId(customerId);
mapping.setPseudonym(pseudonym);
mapping.setCreatedAt(Instant.now());
mappingRepository.save(mapping);
return pseudonym;
}
/**
* Reverses a pseudonym to original customer ID.
*
* Requires special authorization and creates audit log entry.
*/
public UUID depseudonymize(String pseudonym, User requestingUser) {
// Check authorization
if (!hasDepseudonymizationPermission(requestingUser)) {
throw new AccessDeniedException(
"User does not have depseudonymization permission"
);
}
PseudonymMapping mapping = mappingRepository
.findByPseudonym(pseudonym)
.orElseThrow(() -> new PseudonymNotFoundException(pseudonym));
// Audit the depseudonymization
auditService.logDepseudonymization(
pseudonym,
mapping.getCustomerId(),
requestingUser.getId(),
Instant.now()
);
return mapping.getCustomerId();
}
private String generatePseudonym(UUID customerId) {
// Use HMAC-SHA256 with secret key
// This is deterministic: same input always produces same output
try {
Mac hmac = Mac.getInstance("HmacSHA256");
SecretKey key = getHMACKey(); // Retrieved from secure key storage
hmac.init(key);
            byte[] hash = hmac.doFinal(customerId.toString().getBytes(StandardCharsets.UTF_8));
// Convert to URL-safe base64
return Base64.getUrlEncoder()
.withoutPadding()
.encodeToString(hash);
} catch (Exception e) {
throw new PseudonymizationException("Failed to generate pseudonym", e);
}
}
}
/**
* Separate entity for storing pseudonym mappings.
*
* This table has strict access controls and is not accessible
* by normal application code - only by the pseudonymization service.
*/
@Entity
@Table(name = "pseudonym_mappings", schema = "security")
public class PseudonymMapping {
@Id
@GeneratedValue
private Long id;
@Column(unique = true, nullable = false)
private UUID customerId;
@Column(unique = true, nullable = false)
private String pseudonym;
private Instant createdAt;
}
Pseudonymization provides a middle ground between full anonymization and identifiable data. The HMAC-based pseudonym generation is deterministic, meaning the same customer ID always produces the same pseudonym. This allows correlating data across different datasets using pseudonyms while protecting the actual identity. The mapping table has strict access controls - most application code cannot access it, only the specialized depseudonymization function. Every depseudonymization is logged for audit purposes, creating accountability for re-identification.
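The determinism of HMAC-based pseudonyms is easy to verify in isolation. The hard-coded key below is a demo value only; in production the key comes from secure key storage:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.UUID;

public class PseudonymDemo {
    // HMAC-SHA256 pseudonym: same (id, key) pair always yields the same token
    static String pseudonym(UUID customerId, byte[] key) {
        try {
            Mac hmac = Mac.getInstance("HmacSHA256");
            hmac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] hash = hmac.doFinal(customerId.toString().getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(hash);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = "demo-secret-key-not-for-production".getBytes(StandardCharsets.UTF_8);
        UUID id = UUID.fromString("0f14d0ab-9605-4a62-a9e4-5ed26688389b");
        // Deterministic: repeated calls produce an identical pseudonym,
        // which is what allows correlation across datasets
        System.out.println(pseudonym(id, key).equals(pseudonym(id, key))); // prints true
    }
}
```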
Data Breach Notification Procedures
GDPR requires notification of data breaches to regulators within 72 hours if the breach poses a risk to individuals' rights and freedoms. Organizations must have procedures to detect, assess, and respond to breaches.
Breach Detection
/**
* Service that monitors for potential data breaches.
*
* Integrates with security systems to detect anomalous data access patterns
* that could indicate unauthorized access or data exfiltration.
*/
@Service
public class BreachDetectionService {
/**
* Analyzes PII access patterns to detect anomalies.
*
* Triggers alerts for suspicious activity:
* - Bulk data exports by unusual users
* - Access to customer data outside normal patterns
* - Failed authentication followed by successful PII access
* - Data access from unusual locations or times
*/
@EventListener
public void onPIIAccess(PIIAccessEvent event) {
// Check for anomalous access patterns
AccessPattern pattern = analyzeAccessPattern(event);
if (pattern.isAnomalous()) {
SecurityAlert alert = SecurityAlert.builder()
.type(AlertType.SUSPICIOUS_PII_ACCESS)
.userId(event.getUserId())
.customerId(event.getCustomerId())
.timestamp(event.getTimestamp())
.riskScore(pattern.getRiskScore())
.details(pattern.getAnomalyDetails())
.build();
securityAlertService.raiseAlert(alert);
// If risk is high, immediately investigate
if (pattern.getRiskScore() > 0.8) {
incidentResponseService.initiateInvestigation(alert);
}
}
}
private AccessPattern analyzeAccessPattern(PIIAccessEvent event) {
User user = userRepository.findById(event.getUserId())
.orElseThrow();
// Build user's normal access profile
List<PIIAccessEvent> historicalAccess = accessEventRepository
.findByUserIdAndTimestampAfter(
user.getId(),
Instant.now().minus(30, ChronoUnit.DAYS)
);
AccessProfile profile = buildAccessProfile(historicalAccess);
// Check for anomalies
List<Anomaly> anomalies = new ArrayList<>();
// Volume anomaly: accessing more customers than usual
if (event.getCustomerCount() > profile.getAverageCustomerAccess() * 3) {
anomalies.add(new Anomaly(
AnomalyType.VOLUME,
"Accessed " + event.getCustomerCount() + " customers, " +
"average is " + profile.getAverageCustomerAccess()
));
}
// Time anomaly: accessing data at unusual hours
if (isOutsideNormalHours(event.getTimestamp(), profile)) {
anomalies.add(new Anomaly(
AnomalyType.TIME,
"Access at " + event.getTimestamp() + " outside normal hours"
));
}
// Location anomaly: access from unusual location
if (!profile.getTypicalLocations().contains(event.getIpAddress())) {
anomalies.add(new Anomaly(
AnomalyType.LOCATION,
"Access from unusual IP: " + event.getIpAddress()
));
}
// Calculate overall risk score
double riskScore = calculateRiskScore(anomalies);
return AccessPattern.builder()
.anomalous(!anomalies.isEmpty())
.anomalies(anomalies)
.riskScore(riskScore)
.build();
}
}
This detection system learns normal behavior for each user and alerts on deviations. For example, if a customer service representative typically accesses 20-30 customer records per day during business hours, but suddenly accesses 500 records at 2 AM from a new location, this triggers a high-risk alert. The system provides early warning of potential breaches, enabling faster response.
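One simple way to combine detected anomalies into a risk score is a capped weighted sum. The weights below are hypothetical; a real system would tune them against incident history:

```java
import java.util.List;

public class RiskScoreDemo {
    // Hypothetical per-anomaly weights; the cap keeps the score in [0, 1]
    static double riskScore(List<String> anomalyTypes) {
        double score = 0;
        for (String type : anomalyTypes) {
            switch (type) {
                case "VOLUME" -> score += 0.4;
                case "TIME" -> score += 0.25;
                case "LOCATION" -> score += 0.35;
            }
        }
        return Math.min(score, 1.0);
    }

    public static void main(String[] args) {
        // All three anomalies together cross the 0.8 investigation threshold;
        // a single time anomaly alone does not
        System.out.println(riskScore(List.of("VOLUME", "TIME", "LOCATION")) > 0.8); // prints true
        System.out.println(riskScore(List.of("TIME")) > 0.8);                       // prints false
    }
}
```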
Breach Response Workflow
/**
* Service managing data breach incident response.
*
* Coordinates containment, investigation, notification, and remediation
* activities following GDPR breach notification requirements.
*/
@Service
public class BreachResponseService {
/**
* Initiates breach response procedure.
*
* Creates incident record and starts 72-hour countdown for regulatory notification.
*/
public BreachIncident reportBreach(BreachReport report) {
BreachIncident incident = BreachIncident.builder()
.id(UUID.randomUUID())
.reportedAt(Instant.now())
.reportedBy(report.getReporterId())
.description(report.getDescription())
.status(BreachStatus.DETECTED)
.regulatoryDeadline(Instant.now().plus(72, ChronoUnit.HOURS))
.build();
breachIncidentRepository.save(incident);
// Immediately notify security team
notificationService.sendBreachAlert(incident);
// Start automated containment if possible
if (report.isAutomatedContainmentPossible()) {
containBreach(incident);
}
return incident;
}
/**
* Assesses breach severity and determines notification requirements.
*/
public BreachAssessment assessBreach(UUID incidentId) {
BreachIncident incident = breachIncidentRepository.findById(incidentId)
.orElseThrow();
// Assess impact
ImpactAssessment impact = determineImpact(incident);
BreachAssessment assessment = BreachAssessment.builder()
.incidentId(incidentId)
.dataTypesAffected(impact.getDataTypes())
.estimatedAffectedIndividuals(impact.getAffectedCount())
.riskLevel(impact.getRiskLevel())
.requiresRegulatoryNotification(impact.getRiskLevel() != RiskLevel.LOW)
.requiresIndividualNotification(impact.getRiskLevel() == RiskLevel.HIGH)
.assessedAt(Instant.now())
.build();
assessmentRepository.save(assessment);
incident.setStatus(BreachStatus.ASSESSED);
incident.setAssessment(assessment);
breachIncidentRepository.save(incident);
// If regulatory notification required, prepare report
if (assessment.requiresRegulatoryNotification()) {
prepareRegulatoryNotification(incident);
}
return assessment;
}
/**
* Notifies data protection authority (DPA) of breach.
*
* Must be done within 72 hours of becoming aware of the breach.
*/
public void notifyRegulator(UUID incidentId) {
BreachIncident incident = breachIncidentRepository.findById(incidentId)
.orElseThrow();
if (Instant.now().isAfter(incident.getRegulatoryDeadline())) {
logger.error("Regulatory deadline missed for incident {}", incidentId);
}
RegulatoryNotification notification = RegulatoryNotification.builder()
.incidentId(incidentId)
.dataProtectionAuthority(determineJurisdiction(incident))
.notificationContent(buildRegulatoryReport(incident))
.sentAt(Instant.now())
.build();
// Send to appropriate DPA
dpaClient.submitBreachNotification(notification);
incident.setStatus(BreachStatus.REGULATORY_NOTIFIED);
incident.setRegulatoryNotification(notification);
breachIncidentRepository.save(incident);
}
/**
* Notifies affected individuals of the breach.
*
* Required if breach poses high risk to rights and freedoms.
*/
public void notifyAffectedIndividuals(UUID incidentId) {
BreachIncident incident = breachIncidentRepository.findById(incidentId)
.orElseThrow();
BreachAssessment assessment = incident.getAssessment();
if (!assessment.requiresIndividualNotification()) {
logger.info("Individual notification not required for incident {}", incidentId);
return;
}
List<UUID> affectedCustomers = identifyAffectedCustomers(incident);
for (UUID customerId : affectedCustomers) {
Customer customer = customerRepository.findById(customerId)
.orElseThrow();
// Send breach notification email
emailService.sendBreachNotification(
customer.getEmail(),
buildCustomerNotification(incident, customer)
);
}
incident.setStatus(BreachStatus.INDIVIDUALS_NOTIFIED);
incident.setIndividualsNotifiedAt(Instant.now());
breachIncidentRepository.save(incident);
}
private RegulatoryReport buildRegulatoryReport(BreachIncident incident) {
return RegulatoryReport.builder()
// Nature of breach
.description(incident.getDescription())
.breachType(incident.getType())
.dateDetected(incident.getReportedAt())
// Categories and numbers of affected individuals
.affectedIndividuals(incident.getAssessment().getEstimatedAffectedIndividuals())
// Categories of personal data affected
.dataCategories(incident.getAssessment().getDataTypesAffected())
// Likely consequences
.consequences(assessConsequences(incident))
// Measures taken or proposed
.containmentMeasures(incident.getContainmentActions())
.remediationPlan(incident.getRemediationPlan())
// Contact details of DPO
.dpoContact(getDPOContact())
.build();
}
}
This breach response workflow ensures GDPR compliance by tracking the 72-hour deadline, assessing whether notification is required, and coordinating notifications to regulators and affected individuals. The regulatory report includes all required information per GDPR Article 33: nature of the breach, categories and numbers of affected individuals, likely consequences, and remediation measures.
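The 72-hour deadline arithmetic that drives the workflow is worth making explicit. This stand-alone sketch mirrors the deadline fields of the service above but is not the real implementation:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class BreachDeadlineDemo {
    // GDPR Art. 33: the regulator must be notified within 72 hours of becoming aware
    static Instant regulatoryDeadline(Instant detectedAt) {
        return detectedAt.plus(72, ChronoUnit.HOURS);
    }

    static boolean deadlineMissed(Instant detectedAt, Instant notifiedAt) {
        return notifiedAt.isAfter(regulatoryDeadline(detectedAt));
    }

    public static void main(String[] args) {
        Instant detected = Instant.parse("2024-03-01T09:00:00Z");
        System.out.println(regulatoryDeadline(detected)); // prints 2024-03-04T09:00:00Z
        // Notifying two days in is on time; four days in is a missed deadline
        System.out.println(deadlineMissed(detected, Instant.parse("2024-03-03T12:00:00Z"))); // prints false
        System.out.println(deadlineMissed(detected, Instant.parse("2024-03-05T00:00:00Z"))); // prints true
    }
}
```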
Related Guidelines
- For database encryption and secure storage patterns, see Database Design
- For API security and authentication, see Security Overview
- For secure logging practices, see Observability - Logging
- For secrets management, see Secrets Management
- For regulatory audit requirements, see Transaction Ledgers