Swift Performance Optimization
Performance optimization ensures applications remain responsive under load, consume minimal battery, and provide smooth user experiences. Swift's value types with copy-on-write, whole module optimization, and protocol specialization enable high performance when used correctly. Understanding memory layout, minimizing allocations, and leveraging lazy evaluation are essential for optimal performance. Profiling with Instruments identifies actual bottlenecks before optimization.
Overview
This guide covers Swift-specific performance optimization techniques including value type performance characteristics, copy-on-write optimization, protocol witness tables and specialization, whole module optimization, memory layouts and alignment, lazy evaluation patterns, collection performance, and profiling with Instruments.
For iOS-specific performance patterns, see our iOS Performance guidelines. For general performance testing, see our Performance Testing guidelines.
Core Principles
- Measure First: Profile before optimizing, don't guess
- Value Types: Prefer structs for predictable performance
- Copy-on-Write: Leverage COW for efficient value semantics
- Whole Module: Enable WMO for cross-file optimization
- Minimize Allocations: Reduce heap allocations for speed
- Protocol Performance: Understand witness table overhead
- Lazy Evaluation: Defer expensive work until needed
- Collection Choice: Pick the right collection for access patterns
- Reference Counting: Minimize retain/release overhead
- Instruments: Use profiling tools to find real bottlenecks
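"Measure first" can start even before Instruments: Swift 5.7's ContinuousClock gives a monotonic timer for quick A/B checks. A minimal sketch (the workload function is hypothetical, purely for illustration):

```swift
// Hypothetical workload used only for illustration
func expensiveWork() -> Int {
    (1...1_000_000).reduce(0, &+)
}

// ContinuousClock (Swift 5.7+) provides monotonic time; measure(_:)
// runs the closure and returns the elapsed Duration
let clock = ContinuousClock()
let elapsed = clock.measure {
    _ = expensiveWork()
}
print("expensiveWork took \(elapsed)")
```

Quick timings like this are no substitute for a profiler, but they are useful for before/after comparisons of a single change.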
Value Type Performance
Value types (structs, enums) generally offer better performance than reference types (classes) because they're stack-allocated, avoid reference counting overhead, and enable compiler optimizations. However, naive use of value types can cause excessive copying.
The key insight: small value types (<16 bytes) are fast to copy. Larger value types benefit from copy-on-write semantics. The Swift standard library uses copy-on-write for Array, Dictionary, Set, and String - they appear to copy but actually share storage until mutation occurs.
Stack vs Heap Allocation
Stack allocation is faster than heap allocation because it's just pointer arithmetic, while heap allocation requires finding free memory, managing metadata, and thread synchronization:
// GOOD: Small structs - stack allocated, fast
struct Point {
let x: Double // 8 bytes
let y: Double // 8 bytes
// Total: 16 bytes - fits in 2 registers, very fast
}
func processPoints(_ points: [Point]) {
for point in points {
// Point copied, but it's just 16 bytes - negligible cost
transform(point)
}
}
// LARGE STRUCT: Consider copy-on-write
struct LargeData {
var values: [Double] // 8 bytes (pointer to heap buffer)
var metadata: [String: String] // 8 bytes (pointer to heap buffer)
var timestamp: Date // 8 bytes (wraps a Double time interval)
var id: UUID // 16 bytes (struct)
// Total: 40 bytes + heap allocations for the array and dictionary
}
// GOOD: Use copy-on-write for large value types
struct LargeDataCOW {
private final class Storage {
var values: [Double]
var metadata: [String: String]
init(values: [Double], metadata: [String: String]) {
self.values = values
self.metadata = metadata
}
}
private var storage: Storage
let timestamp: Date
let id: UUID
init(values: [Double], metadata: [String: String], timestamp: Date, id: UUID) {
self.storage = Storage(values: values, metadata: metadata)
self.timestamp = timestamp
self.id = id
}
private mutating func ensureUniqueStorage() {
if !isKnownUniquelyReferenced(&storage) {
storage = Storage(values: storage.values, metadata: storage.metadata)
}
}
mutating func addValue(_ value: Double) {
ensureUniqueStorage()
storage.values.append(value)
}
}
Copy-on-Write Optimization
Implement copy-on-write for large value types to get value semantics without copying cost:
// GOOD: Copy-on-write implementation
struct BigBuffer {
private final class Storage {
var data: [UInt8]
init(data: [UInt8]) {
self.data = data
}
func copy() -> Storage {
return Storage(data: data)
}
}
private var storage: Storage
init(data: [UInt8]) {
self.storage = Storage(data: data)
}
// Copy is shallow - just copies reference
// Only actual buffer copying happens on mutation if shared
private mutating func ensureUnique() {
if !isKnownUniquelyReferenced(&storage) {
storage = storage.copy()
}
}
mutating func append(_ byte: UInt8) {
ensureUnique() // Copy only if shared
storage.data.append(byte)
}
var count: Int {
return storage.data.count // No copy for read access
}
subscript(index: Int) -> UInt8 {
get {
return storage.data[index] // No copy for read
}
set {
ensureUnique() // Copy if shared
storage.data[index] = newValue
}
}
}
// Usage demonstrates efficiency
var buffer1 = BigBuffer(data: Array(repeating: 0, count: 1_000_000))
var buffer2 = buffer1 // Fast - just copies reference
// Both reference same underlying storage
print(buffer1.count) // No copy
print(buffer2.count) // No copy
// Now buffer2 needs unique storage
buffer2.append(42) // Triggers copy here
// buffer1 and buffer2 now have separate storage
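The standard library's Array behaves the same way, and you can observe it directly. A sketch (buffer identity is an implementation detail, but withUnsafeBufferPointer makes the sharing visible):

```swift
let a = Array(repeating: 0, count: 1_000)
var b = a // Shallow: both values reference the same buffer

// Compare buffer base addresses to check whether storage is shared
func sharesStorage(_ x: [Int], _ y: [Int]) -> Bool {
    x.withUnsafeBufferPointer { px in
        y.withUnsafeBufferPointer { py in
            px.baseAddress == py.baseAddress
        }
    }
}

print(sharesStorage(a, b)) // true: no copy has happened yet
b.append(1)                // First mutation of shared storage triggers the copy
print(sharesStorage(a, b)) // false: b now owns a separate buffer
```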
Protocol Performance
Protocols provide abstraction but have performance implications. When a protocol is used as an existential type (a variable of protocol type), method calls use witness tables - runtime lookups of implementations. This is slower than direct calls or statically dispatched calls.
Generic constraints allow the compiler to specialize code for concrete types, eliminating witness table overhead. Prefer generic constraints over existential types when performance matters.
Existential Types vs Generics
protocol Drawable {
func draw()
}
struct Circle: Drawable {
func draw() { /* ... */ }
}
struct Rectangle: Drawable {
func draw() { /* ... */ }
}
// SLOWER: Existential type (any Drawable) - uses witness table
func drawAll(_ shapes: [any Drawable]) {
for shape in shapes {
shape.draw() // Indirect call through witness table
}
}
// FASTER: Generic with constraint - compiler can specialize
func drawAllGeneric<T: Drawable>(_ shapes: [T]) {
for shape in shapes {
shape.draw() // Direct call or inlined
}
}
// FASTEST: Concrete type - direct call
func drawAllCircles(_ circles: [Circle]) {
for circle in circles {
circle.draw() // Direct method call, can inline
}
}
// Performance difference:
let circles = Array(repeating: Circle(), count: 10_000)
let rectangles = Array(repeating: Rectangle(), count: 10_000)
let mixed: [any Drawable] = circles + rectangles
// Slowest: witness table lookup for each call
drawAll(mixed)
// Faster: specialized for Circle type
drawAllGeneric(circles)
// Fastest: direct calls, likely inlined
drawAllCircles(circles)
Protocol Specialization
The compiler can specialize generic functions for concrete types, eliminating abstraction overhead:
protocol Repository {
associatedtype Entity
func save(_ entity: Entity) async throws
func findAll() async throws -> [Entity]
}
// GOOD: Generic function specialized per type
func processEntities<R: Repository>(_ repository: R, entities: [R.Entity]) async throws {
for entity in entities {
try await repository.save(entity)
// Compiler generates specialized version for each Repository type
// No protocol overhead
}
}
// When called with concrete type:
let paymentRepo = PaymentRepository()
let payments: [Payment] = [...]
// Compiler generates optimized version specifically for PaymentRepository
try await processEntities(paymentRepo, entities: payments)
Whole Module Optimization
Whole Module Optimization (WMO) analyzes your entire module together, enabling cross-file optimizations like inlining, devirtualization, and dead code elimination. Without WMO, files are compiled separately, limiting optimization scope.
Enable WMO in Release builds; the payoff varies by codebase, but code that benefits from cross-file inlining and generic specialization can speed up severalfold. The trade-off is longer compilation times, so typically only use WMO for release builds.
Enabling WMO
In Xcode build settings:
- Swift Compiler - Code Generation
- Compilation Mode: Whole Module
- Use for Release builds, not Debug (Debug uses "Incremental" for fast rebuilds)
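The same setting can be applied from the command line or CI; a sketch (scheme and target names are placeholders):

```shell
# Release: optimize with whole-module analysis
swiftc -O -whole-module-optimization Sources/*.swift -o PaymentApp

# Xcode projects: SWIFT_COMPILATION_MODE controls the same setting
xcodebuild -scheme PaymentApp -configuration Release \
  SWIFT_COMPILATION_MODE=wholemodule SWIFT_OPTIMIZATION_LEVEL=-O
```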
Inlining and Specialization
// File1.swift
struct PaymentValidator {
@inline(__always) // Force inline for critical path (underscored attribute, not officially supported)
func isValid(amount: Decimal) -> Bool {
return amount > 0 && amount < 1_000_000
}
@inline(never) // Prevent inlining (for debugging or code size)
func complexValidation(_ payment: Payment) -> Bool {
// Complex logic that shouldn't be inlined
return true
}
}
// File2.swift
func processPayment(_ payment: Payment) -> Bool {
let validator = PaymentValidator()
// With WMO, compiler can inline isValid() across files
guard validator.isValid(amount: payment.amount) else {
return false
}
// complexValidation marked @inline(never), won't be inlined
return validator.complexValidation(payment)
}
Access Control for Optimization
Using private and fileprivate helps the compiler optimize by limiting visibility:
// GOOD: Private enables optimization
final class PaymentCache {
private var storage: [String: Payment] = [:]
// Compiler knows storage is only accessed from this class
// Can optimize access, eliminate bounds checks, etc.
func get(_ id: String) -> Payment? {
return storage[id]
}
}
// LESS OPTIMAL: Internal limits optimization
final class PaymentCache {
internal var storage: [String: Payment] = [:]
// Compiler must assume other files might access storage
}
Memory Layout and Alignment
Understanding memory layout helps optimize for cache efficiency and reduce memory footprint. Swift uses automatic memory layout, but you can influence it with property ordering and strategic use of enums.
Struct Layout
Swift lays out struct properties in declaration order but may add padding for alignment:
// SUBOPTIMAL: padding between fields inflates the size
struct Payment {
let id: String // 16 bytes
let isProcessed: Bool // 1 byte
// 7 bytes padding so the next 8-byte-aligned field can start
let amount: Decimal // 16 bytes (illustrative - Decimal is actually larger)
}
// Total: ~40 bytes including internal padding
// BETTER: group the large fields together
struct PaymentOptimized {
let id: String // 16 bytes
let amount: Decimal // 16 bytes
let isProcessed: Bool // 1 byte
// Padding, if any, sits only at the end (counted in stride, not size)
}
// Total: ~33 bytes; no padding wasted between fields
// Rule: Place larger types first, smaller types last
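Rather than reasoning about padding by hand, you can ask the compiler. A sketch using MemoryLayout (the struct names here are illustrative; exact figures can vary by platform):

```swift
struct Padded {
    let flag: Bool    // 1 byte, then 7 bytes padding
    let value: Double // 8 bytes, must start at an 8-byte boundary
}

struct Packed {
    let value: Double // 8 bytes
    let flag: Bool    // 1 byte, only trailing padding remains
}

// size: bytes of actual data; stride: spacing between array elements;
// alignment: required address multiple
print(MemoryLayout<Padded>.size, MemoryLayout<Padded>.stride) // 16 16
print(MemoryLayout<Packed>.size, MemoryLayout<Packed>.stride) // 9 16
```

Note that both structs have the same stride, so array storage is identical; the smaller size matters when the struct is nested inside another type, where a following field can occupy the tail padding.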
Enum with Associated Values
Enums with associated values occupy roughly the size of their largest case plus a discriminator tag:
// Memory size = max(all cases) + tag byte
enum PaymentResult {
case success(transactionId: String) // 16 bytes
case failure(error: Error) // 8 bytes (any Error is stored as a single boxed reference)
case pending // 0 bytes
}
// Size: ~16-byte payload + tag, though Swift often packs the tag into
// spare bits of the payload - verify with MemoryLayout rather than by hand
// OPTIMIZATION: Box large cases so the enum stores only a pointer
enum PaymentResultBoxed {
case success(Details) // 8 bytes (pointer)
case failure(Error) // 8 bytes
case pending // 0 bytes
struct Details {
let transactionId: String
let timestamp: Date
let metadata: [String: String]
}
}
// Size: one pointer (8 bytes) plus tag - often just 8 bytes total,
// since Swift can store the tag in unused pointer bit patterns
// Large Details storage is only allocated for the success case
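The effect of boxing is easy to check with MemoryLayout. A sketch with hypothetical types (the inline enum carries the full payload; the boxed one carries a single reference):

```swift
import Foundation

struct Details {
    let transactionId: String
    let timestamp: Date
    let metadata: [String: String]
}

enum InlineResult {
    case success(Details) // payload stored inline in the enum
    case pending
}

enum BoxedResult {
    final class Box {
        let details: Details
        init(_ details: Details) { self.details = details }
    }
    case success(Box)     // payload lives behind one pointer
    case pending
}

print(MemoryLayout<InlineResult>.size) // size of the Details payload
print(MemoryLayout<BoxedResult>.size)  // one pointer: 8 bytes on 64-bit
```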
Lazy Evaluation
Lazy properties and lazy sequences defer computation until needed, improving perceived performance and avoiding wasted work:
Lazy Properties
// GOOD: Lazy property for expensive computation
class PaymentReport {
let payments: [Payment]
init(payments: [Payment]) {
self.payments = payments
// statistics NOT computed yet
}
lazy var statistics: PaymentStatistics = {
// Expensive computation only when accessed
return calculateStatistics(payments)
}()
lazy var formattedReport: String = {
return generateReport(payments, statistics: statistics)
}()
}
// Usage
let report = PaymentReport(payments: payments)
// Fast - no statistics calculated
// Later, if statistics needed:
print(report.statistics.average) // Computed now, cached for future use
// If statistics never accessed, computation never happens
Lazy Sequences
// GOOD: Lazy sequence for large datasets
let payments: [Payment] = // ... millions of payments
// EAGER: Processes all payments even if you only need first 10
let largeAmounts = payments
.filter { $0.amount > 10_000 }
.map { $0.amount }
.sorted()
// All operations execute on entire array
// LAZY: Processes only what's needed
let largeAmountsLazy = payments
.lazy
.filter { $0.amount > 10_000 }
.map { $0.amount }
// Operations don't execute yet
// Note: sorted() is eager - it would pull in and sort the entire
// sequence, so laziness cannot shortcut a "top ten" computation
let firstTen = Array(largeAmountsLazy.prefix(10))
// filter and map run only until 10 matching amounts are found
// GOOD: Lazy for pipelines where not all data is needed
func findFirstMatch(_ payments: [Payment]) -> Payment? {
return payments
.lazy
.filter { $0.status == .pending }
.map { validatePayment($0) }
.first { $0.isValid }
// Stops processing as soon as first valid payment found
}
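You can verify the short-circuiting by counting predicate evaluations. A small self-contained sketch (plain integers stand in for payments):

```swift
var evaluations = 0

let numbers = Array(1...1_000)
let firstThreeEvens = Array(
    numbers.lazy
        .filter { n in
            evaluations += 1 // Count how often the predicate runs
            return n % 2 == 0
        }
        .prefix(3)
)

print(firstThreeEvens) // [2, 4, 6]
print(evaluations)     // 6 - stopped after the third match, not 1_000
```

Without .lazy, the filter would evaluate all 1,000 elements before prefix could take effect.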
Collection Performance
Different collections have different performance characteristics. Choose based on your access patterns:
Collection Performance Characteristics
// Array: O(1) indexed access, O(n) search, O(1) append (amortized), O(n) insert/remove
var payments: [Payment] = [...]
let first = payments[0] // O(1)
let found = payments.first { $0.id == "123" } // O(n)
payments.append(newPayment) // O(1) amortized
// Set: O(1) membership test, O(1) insert/remove, no ordering
var processedIds: Set<String> = []
let isProcessed = processedIds.contains("123") // O(1)
processedIds.insert("123") // O(1)
// Dictionary: O(1) key lookup, O(1) insert/remove by key
var paymentCache: [String: Payment] = [:]
let payment = paymentCache["123"] // O(1)
paymentCache["123"] = newPayment // O(1)
// GOOD: Choose right collection
func processPayments(_ payments: [Payment]) {
// Need fast duplicate detection - use Set
var processedIds = Set<String>()
for payment in payments {
if processedIds.contains(payment.id) {
continue // O(1) check
}
process(payment)
processedIds.insert(payment.id) // O(1) insert
}
}
// BAD: Wrong collection for task
func processPaymentsSlow(_ payments: [Payment]) {
var processedIds: [String] = [] // Array
for payment in payments {
if processedIds.contains(payment.id) { // O(n) - slow!
continue
}
process(payment)
processedIds.append(payment.id)
}
}
ContiguousArray for Performance
When you don't need bridging to Objective-C, ContiguousArray guarantees contiguous storage:
// Array<T> may use bridged NSArray storage when T is an Objective-C-compatible class
let payments: [Payment] = [...]
// ContiguousArray<T> always uses native storage - slightly faster
let paymentsContiguous: ContiguousArray<Payment> = [...]
// Use ContiguousArray when:
// 1. Elements are structs or non-ObjC classes
// 2. No need to bridge to NSArray
// 3. Micro-optimization matters (difference is small)
Reducing Reference Counting Overhead
Reference counting has a performance cost: each retain and release is an atomic operation. Minimize it by using value types where possible and being mindful of retain cycles.
Unowned References for Performance
When you know an object will outlive a reference, unowned avoids reference counting:
// GOOD: Unowned avoids reference counting
class PaymentProcessor {
let configuration: Configuration // Owned
lazy var validator: PaymentValidator = {
PaymentValidator(configuration: self.configuration)
}()
}
class PaymentValidator {
unowned let configuration: Configuration
init(configuration: Configuration) {
self.configuration = configuration
}
}
// Avoids strong retain/release traffic when accessing configuration
// (a plain unowned reference still performs a liveness check on access;
// unowned(unsafe) skips even that, trading safety for speed)
// Faster than weak (which is optional and checks validity)
// Safe because PaymentProcessor owns configuration, outlives validator
Minimize Closure Captures
Closures capture references, increasing retain count. Minimize captures:
// CAPTURES TOO MUCH: Captures entire self
class PaymentService {
var payments: [Payment] = []
func processAll() {
DispatchQueue.global().async {
self.payments.forEach { payment in
// Captures self, keeps entire object alive
self.process(payment)
}
}
}
}
// BETTER: Capture only what's needed
class PaymentService {
var payments: [Payment] = []
func processAll() {
let paymentsToProcess = self.payments // Copy array
DispatchQueue.global().async {
paymentsToProcess.forEach { payment in
// Only captures array, not entire self
process(payment)
}
}
}
private func process(_ payment: Payment) {
// Processing logic
}
}
Profiling with Instruments
Always profile before optimizing. Instruments provides detailed performance analysis:
Time Profiler
Identifies CPU-intensive functions:
- Product → Profile (⌘I) in Xcode
- Select "Time Profiler"
- Record while using app
- Analyze call tree - find hot paths
- Optimize functions consuming most CPU time
// Example: Time Profiler shows this function uses 40% of CPU
func processPayments(_ payments: [Payment]) {
for payment in payments {
// Time Profiler reveals this validation is slow
let isValid = validator.complexValidation(payment)
if isValid {
process(payment)
}
}
}
// Optimization: Cache validation results
func processPaymentsOptimized(_ payments: [Payment]) {
var validationCache: [String: Bool] = [:]
for payment in payments {
let isValid = validationCache[payment.id] ?? {
let result = validator.complexValidation(payment)
validationCache[payment.id] = result
return result
}()
if isValid {
process(payment)
}
}
}
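To make hot paths easy to locate in Instruments, you can annotate regions of interest with signposts (os_signpost, available since iOS 12 / macOS 10.14). A sketch; the subsystem string and the helper function are placeholders of our own:

```swift
import os.signpost

let log = OSLog(subsystem: "com.example.payments", category: "Processing")

// Wrap any block of work in a named signpost interval
func measureRegion<T>(_ name: StaticString, _ work: () -> T) -> T {
    let spid = OSSignpostID(log: log)
    os_signpost(.begin, log: log, name: name, signpostID: spid)
    defer { os_signpost(.end, log: log, name: name, signpostID: spid) }
    return work()
}

// The "processPayments" interval then appears in the os_signpost
// instrument, aligned with Time Profiler samples
let sum = measureRegion("processPayments") {
    (1...1_000).reduce(0, +)
}
```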
Allocations Instrument
Tracks memory allocations to find excessive allocation:
- Select "Allocations" instrument
- Record while using app
- Look for:
- Excessive allocations (high count)
- Large allocations (high size)
- Leaked memory (not freed)
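A common fix the Allocations instrument motivates is pre-sizing collections, so a growing array does not repeatedly reallocate and copy its buffer. A sketch:

```swift
// Without reserveCapacity, appending n elements triggers O(log n)
// buffer reallocations, each copying the existing contents
func buildIds(count: Int) -> [String] {
    var ids: [String] = []
    ids.reserveCapacity(count) // One allocation up front
    for i in 0..<count {
        ids.append("payment-\(i)")
    }
    return ids
}

let ids = buildIds(count: 10_000)
print(ids.count) // 10000
```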
Memory Graph Debugger
Find retain cycles and memory leaks:
- Run app in Debug mode
- Debug → View Memory Graph
- Look for cycles in object graph
- Fix retain cycles with weak/unowned references
Further Reading
General Performance Concepts
- Performance Overview - Performance strategy and principles
- Performance Optimization - Cross-language optimization techniques
- Performance Testing - Load testing strategies
Internal Documentation
- Swift General - Value types and memory management
- Swift Concurrency - Concurrent performance patterns
- iOS Performance - iOS-specific optimizations
External Resources
- Swift Performance Documentation
- WWDC: Understanding Swift Performance
- Instruments Help
- Swift Memory Layout
Summary
Key Takeaways
- Measure first - Profile before optimizing
- Value types - Generally faster than reference types
- Copy-on-write - Efficient large value types
- Generics - Faster than existential types
- Whole Module Optimization - Enable for release builds
- Memory layout - Order properties by size
- Lazy evaluation - Defer expensive work
- Right collection - Choose based on access patterns
- Minimize allocations - Reuse, avoid temporary objects
- Instruments - Use profiling tools
Next Steps: Review iOS Performance for platform-specific optimization and Performance Testing for measuring improvements.