Swift Performance Optimization
Performance optimization ensures applications remain responsive under load, consume minimal battery, and provide smooth user experiences. Swift's value types with copy-on-write, whole module optimization, and protocol specialization enable high performance when used correctly. Understanding memory layout, minimizing allocations, and leveraging lazy evaluation are essential for optimal performance. Profiling with Instruments identifies actual bottlenecks before optimization.
Overview
This guide covers Swift-specific performance optimization techniques including value type performance characteristics, copy-on-write optimization, protocol witness tables and specialization, whole module optimization, memory layouts and alignment, lazy evaluation patterns, collection performance, and profiling with Instruments.
For iOS-specific performance patterns, see our iOS Performance guidelines. For general performance testing, see our Performance Testing guidelines.
Core Principles
- Measure First: Profile before optimizing, don't guess
- Value Types: Prefer structs for predictable performance
- Copy-on-Write: Leverage COW for efficient value semantics
- Whole Module: Enable WMO for cross-file optimization
- Minimize Allocations: Reduce heap allocations for speed
- Protocol Performance: Understand witness table overhead
- Lazy Evaluation: Defer expensive work until needed
- Collection Choice: Pick the right collection for access patterns
- Reference Counting: Minimize retain/release overhead
- Instruments: Use profiling tools to find real bottlenecks
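"Measure first" can start even before Instruments: Swift 5.7's ContinuousClock gives a monotonic timer for quick A/B checks. A minimal sketch (the workload function is hypothetical, purely for illustration):

```swift
// Hypothetical workload used only for illustration
func expensiveWork() -> Int {
    (1...1_000_000).reduce(0, &+)
}

// ContinuousClock (Swift 5.7+) provides monotonic time; measure(_:)
// runs the closure and returns the elapsed Duration
let clock = ContinuousClock()
let elapsed = clock.measure {
    _ = expensiveWork()
}
print("expensiveWork took \(elapsed)")
```

Quick timings like this are no substitute for a profiler, but they are useful for before/after comparisons of a single change.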
Value Type Performance
Value types (structs, enums) generally offer better performance than reference types (classes) because they're stack-allocated, avoid reference counting overhead, and enable compiler optimizations. However, naive use of value types can cause excessive copying.
The key insight: small value types (<16 bytes) are fast to copy. Larger value types benefit from copy-on-write semantics. The Swift standard library uses copy-on-write for Array, Dictionary, Set, and String - they appear to copy but actually share storage until mutation occurs.
Stack vs Heap Allocation
Stack allocation is faster than heap allocation because it's just pointer arithmetic, while heap allocation requires finding free memory, managing metadata, and thread synchronization:
// GOOD: Small structs - stack allocated, fast
struct Point {
let x: Double // 8 bytes
let y: Double // 8 bytes
// Total: 16 bytes - fits in 2 registers, very fast
}
func processPoints(_ points: [Point]) {
for point in points {
// Point copied, but it's just 16 bytes - negligible cost
transform(point)
}
}
// LARGE STRUCT: Consider copy-on-write
struct LargeData {
var values: [Double] // 8 bytes (pointer to heap buffer)
var metadata: [String: String] // 8 bytes (pointer to heap buffer)
var timestamp: Date // 8 bytes (wraps a Double time interval)
var id: UUID // 16 bytes (struct)
// Total: 40 bytes + heap allocations for the array and dictionary
}
// GOOD: Use copy-on-write for large value types
struct LargeDataCOW {
private final class Storage {
var values: [Double]
var metadata: [String: String]
init(values: [Double], metadata: [String: String]) {
self.values = values
self.metadata = metadata
}
}
private var storage: Storage
let timestamp: Date
let id: UUID
init(values: [Double], metadata: [String: String], timestamp: Date, id: UUID) {
self.storage = Storage(values: values, metadata: metadata)
self.timestamp = timestamp
self.id = id
}
private mutating func ensureUniqueStorage() {
if !isKnownUniquelyReferenced(&storage) {
storage = Storage(values: storage.values, metadata: storage.metadata)
}
}
mutating func addValue(_ value: Double) {
ensureUniqueStorage()
storage.values.append(value)
}
}
Copy-on-Write Optimization
Implement copy-on-write for large value types to get value semantics without copying cost:
// GOOD: Copy-on-write implementation
struct BigBuffer {
private final class Storage {
var data: [UInt8]
init(data: [UInt8]) {
self.data = data
}
func copy() -> Storage {
return Storage(data: data)
}
}
private var storage: Storage
init(data: [UInt8]) {
self.storage = Storage(data: data)
}
// Copy is shallow - just copies reference
// Only actual buffer copying happens on mutation if shared
private mutating func ensureUnique() {
if !isKnownUniquelyReferenced(&storage) {
storage = storage.copy()
}
}
mutating func append(_ byte: UInt8) {
ensureUnique() // Copy only if shared
storage.data.append(byte)
}
var count: Int {
return storage.data.count // No copy for read access
}
subscript(index: Int) -> UInt8 {
get {
return storage.data[index] // No copy for read
}
set {
ensureUnique() // Copy if shared
storage.data[index] = newValue
}
}
}
// Usage demonstrates efficiency
var buffer1 = BigBuffer(data: Array(repeating: 0, count: 1_000_000))
var buffer2 = buffer1 // Fast - just copies reference
// Both reference same underlying storage
print(buffer1.count) // No copy
print(buffer2.count) // No copy
// Now buffer2 needs unique storage
buffer2.append(42) // Triggers copy here
// buffer1 and buffer2 now have separate storage
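The standard library's Array behaves the same way, and you can observe it directly. A sketch (buffer identity is an implementation detail, but withUnsafeBufferPointer makes the sharing visible):

```swift
let a = Array(repeating: 0, count: 1_000)
var b = a // Shallow: both values reference the same buffer

// Compare buffer base addresses to check whether storage is shared
func sharesStorage(_ x: [Int], _ y: [Int]) -> Bool {
    x.withUnsafeBufferPointer { px in
        y.withUnsafeBufferPointer { py in
            px.baseAddress == py.baseAddress
        }
    }
}

print(sharesStorage(a, b)) // true: no copy has happened yet
b.append(1)                // First mutation of shared storage triggers the copy
print(sharesStorage(a, b)) // false: b now owns a separate buffer
```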
Protocol Performance
Protocols provide abstraction but have performance implications. When a protocol is used as an existential type (a variable of protocol type), method calls use witness tables - runtime lookups of implementations. This is slower than direct calls or statically dispatched calls.
Generic constraints allow the compiler to specialize code for concrete types, eliminating witness table overhead. Prefer generic constraints over existential types when performance matters.
Existential Types vs Generics
protocol Drawable {
func draw()
}
struct Circle: Drawable {
func draw() { /* ... */ }
}
struct Rectangle: Drawable {
func draw() { /* ... */ }
}
// SLOWER: Existential type (any Drawable) - uses witness table
func drawAll(_ shapes: [any Drawable]) {
for shape in shapes {
shape.draw() // Indirect call through witness table
}
}
// FASTER: Generic with constraint - compiler can specialize
func drawAllGeneric<T: Drawable>(_ shapes: [T]) {
for shape in shapes {
shape.draw() // Direct call or inlined
}
}
// FASTEST: Concrete type - direct call
func drawAllCircles(_ circles: [Circle]) {
for circle in circles {
circle.draw() // Direct method call, can inline
}
}
// Performance difference:
let circles = Array(repeating: Circle(), count: 10_000)
let rectangles = Array(repeating: Rectangle(), count: 10_000)
let mixed: [any Drawable] = circles + rectangles
// Slowest: witness table lookup for each call
drawAll(mixed)
// Faster: specialized for Circle type
drawAllGeneric(circles)
// Fastest: direct calls, likely inlined
drawAllCircles(circles)
Protocol Specialization
The compiler can specialize generic functions for concrete types, eliminating abstraction overhead:
protocol Repository {
associatedtype Entity
func save(_ entity: Entity) async throws
func findAll() async throws -> [Entity]
}
// GOOD: Generic function specialized per type
func processEntities<R: Repository>(_ repository: R, entities: [R.Entity]) async throws {
for entity in entities {
try await repository.save(entity)
// Compiler generates specialized version for each Repository type
// No protocol overhead
}
}
// When called with concrete type:
let paymentRepo = PaymentRepository()
let payments: [Payment] = [...]
// Compiler generates optimized version specifically for PaymentRepository
try await processEntities(paymentRepo, entities: payments)
Whole Module Optimization
Whole Module Optimization (WMO) analyzes your entire module together, enabling cross-file optimizations like inlining, devirtualization, and dead code elimination. Without WMO, files are compiled separately, limiting optimization scope.
Enable WMO in Release builds; the payoff varies by codebase, but code that benefits from cross-file inlining and generic specialization can speed up severalfold. The trade-off is longer compilation times, so typically only use WMO for release builds.
Enabling WMO
In Xcode build settings:
- Swift Compiler - Code Generation
- Compilation Mode: Whole Module
- Use for Release builds, not Debug (Debug uses "Incremental" for fast rebuilds)
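The same setting can be applied from the command line or CI; a sketch (scheme and target names are placeholders):

```shell
# Release: optimize with whole-module analysis
swiftc -O -whole-module-optimization Sources/*.swift -o PaymentApp

# Xcode projects: SWIFT_COMPILATION_MODE controls the same setting
xcodebuild -scheme PaymentApp -configuration Release \
  SWIFT_COMPILATION_MODE=wholemodule SWIFT_OPTIMIZATION_LEVEL=-O
```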
Inlining and Specialization
// File1.swift
struct PaymentValidator {
@inline(__always) // Force inline for critical path (underscored attribute, not officially supported)
func isValid(amount: Decimal) -> Bool {
return amount > 0 && amount < 1_000_000
}
@inline(never) // Prevent inlining (for debugging or code size)
func complexValidation(_ payment: Payment) -> Bool {
// Complex logic that shouldn't be inlined
return true
}
}
// File2.swift
func processPayment(_ payment: Payment) -> Bool {
let validator = PaymentValidator()
// With WMO, compiler can inline isValid() across files
guard validator.isValid(amount: payment.amount) else {
return false
}
// complexValidation marked @inline(never), won't be inlined
return validator.complexValidation(payment)
}
Access Control for Optimization
Using private and fileprivate helps the compiler optimize by limiting visibility:
// GOOD: Private enables optimization
final class PaymentCache {
private var storage: [String: Payment] = [:]
// Compiler knows storage is only accessed from this class
// Can optimize access, eliminate bounds checks, etc.
func get(_ id: String) -> Payment? {
return storage[id]
}
}
// LESS OPTIMAL: Internal limits optimization
final class PaymentCache {
internal var storage: [String: Payment] = [:]
// Compiler must assume other files might access storage
}
Memory Layout and Alignment
Understanding memory layout helps optimize for cache efficiency and reduce memory footprint. Swift uses automatic memory layout, but you can influence it with property ordering and strategic use of enums.
Struct Layout
Swift lays out struct properties in declaration order but may add padding for alignment:
// SUBOPTIMAL: padding between fields inflates the size
struct Payment {
let id: String // 16 bytes
let isProcessed: Bool // 1 byte
// 7 bytes padding so the next 8-byte-aligned field can start
let amount: Decimal // 16 bytes (illustrative - Decimal is actually larger)
}
// Total: ~40 bytes including internal padding
// BETTER: group the large fields together
struct PaymentOptimized {
let id: String // 16 bytes
let amount: Decimal // 16 bytes
let isProcessed: Bool // 1 byte
// Padding, if any, sits only at the end (counted in stride, not size)
}
// Total: ~33 bytes; no padding wasted between fields
// Rule: Place larger types first, smaller types last
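Rather than reasoning about padding by hand, you can ask the compiler. A sketch using MemoryLayout (the struct names here are illustrative; exact figures can vary by platform):

```swift
struct Padded {
    let flag: Bool    // 1 byte, then 7 bytes padding
    let value: Double // 8 bytes, must start at an 8-byte boundary
}

struct Packed {
    let value: Double // 8 bytes
    let flag: Bool    // 1 byte, only trailing padding remains
}

// size: bytes of actual data; stride: spacing between array elements;
// alignment: required address multiple
print(MemoryLayout<Padded>.size, MemoryLayout<Padded>.stride) // 16 16
print(MemoryLayout<Packed>.size, MemoryLayout<Packed>.stride) // 9 16
```

Note that both structs have the same stride, so array storage is identical; the smaller size matters when the struct is nested inside another type, where a following field can occupy the tail padding.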
Enum with Associated Values
Enums with associated values occupy roughly the size of their largest case plus a discriminator tag:
// Memory size = max(all cases) + tag byte
enum PaymentResult {
case success(transactionId: String) // 16 bytes
case failure(error: Error) // 8 bytes (any Error is stored as a single boxed reference)
case pending // 0 bytes
}
// Size: ~16-byte payload + tag, though Swift often packs the tag into
// spare bits of the payload - verify with MemoryLayout rather than by hand
// OPTIMIZATION: Box large cases so the enum stores only a pointer
enum PaymentResultBoxed {
case success(Details) // 8 bytes (pointer)
case failure(Error) // 8 bytes
case pending // 0 bytes
struct Details {
let transactionId: String
let timestamp: Date
let metadata: [String: String]
}
}
// Size: one pointer (8 bytes) plus tag - often just 8 bytes total,
// since Swift can store the tag in unused pointer bit patterns
// Large Details storage is only allocated for the success case
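The effect of boxing is easy to check with MemoryLayout. A sketch with hypothetical types (the inline enum carries the full payload; the boxed one carries a single reference):

```swift
import Foundation

struct Details {
    let transactionId: String
    let timestamp: Date
    let metadata: [String: String]
}

enum InlineResult {
    case success(Details) // payload stored inline in the enum
    case pending
}

enum BoxedResult {
    final class Box {
        let details: Details
        init(_ details: Details) { self.details = details }
    }
    case success(Box)     // payload lives behind one pointer
    case pending
}

print(MemoryLayout<InlineResult>.size) // size of the Details payload
print(MemoryLayout<BoxedResult>.size)  // one pointer: 8 bytes on 64-bit
```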
Lazy Evaluation
Lazy properties and lazy sequences defer computation until needed, improving perceived performance and avoiding wasted work:
Lazy Properties
// GOOD: Lazy property for expensive computation
class PaymentReport {
let payments: [Payment]
init(payments: [Payment]) {
self.payments = payments
// statistics NOT computed yet
}
lazy var statistics: PaymentStatistics = {
// Expensive computation only when accessed
return calculateStatistics(payments)
}()
lazy var formattedReport: String = {
return generateReport(payments, statistics: statistics)
}()
}
// Usage
let report = PaymentReport(payments: payments)
// Fast - no statistics calculated
// Later, if statistics needed:
print(report.statistics.average) // Computed now, cached for future use
// If statistics never accessed, computation never happens
Lazy Sequences
// GOOD: Lazy sequence for large datasets
let payments: [Payment] = // ... millions of payments
// EAGER: Processes all payments even if you only need first 10
let largeAmounts = payments
.filter { $0.amount > 10_000 }
.map { $0.amount }
.sorted()
// All operations execute on entire array
// LAZY: Processes only what's needed
let largeAmountsLazy = payments
.lazy
.filter { $0.amount > 10_000 }
.map { $0.amount }
// Operations don't execute yet
// Note: sorted() is eager - it would pull in and sort the entire
// sequence, so laziness cannot shortcut a "top ten" computation
let firstTen = Array(largeAmountsLazy.prefix(10))
// filter and map run only until 10 matching amounts are found
// GOOD: Lazy for pipelines where not all data is needed
func findFirstMatch(_ payments: [Payment]) -> Payment? {
return payments
.lazy
.filter { $0.status == .pending }
.map { validatePayment($0) }
.first { $0.isValid }
// Stops processing as soon as first valid payment found
}
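You can verify the short-circuiting by counting predicate evaluations. A small self-contained sketch (plain integers stand in for payments):

```swift
var evaluations = 0

let numbers = Array(1...1_000)
let firstThreeEvens = Array(
    numbers.lazy
        .filter { n in
            evaluations += 1 // Count how often the predicate runs
            return n % 2 == 0
        }
        .prefix(3)
)

print(firstThreeEvens) // [2, 4, 6]
print(evaluations)     // 6 - stopped after the third match, not 1_000
```

Without .lazy, the filter would evaluate all 1,000 elements before prefix could take effect.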
Collection Performance
Different collections have different performance characteristics. Choose based on your access patterns:
Collection Performance Characteristics
// Array: O(1) indexed access, O(n) search, O(1) append (amortized), O(n) insert/remove
var payments: [Payment] = [...]
let first = payments[0] // O(1)
let found = payments.first { $0.id == "123" } // O(n)
payments.append(newPayment) // O(1) amortized
// Set: O(1) membership test, O(1) insert/remove, no ordering
var processedIds: Set<String> = []
let isProcessed = processedIds.contains("123") // O(1)
processedIds.insert("123") // O(1)
// Dictionary: O(1) key lookup, O(1) insert/remove by key
var paymentCache: [String: Payment] = [:]
let payment = paymentCache["123"] // O(1)
paymentCache["123"] = newPayment // O(1)
// GOOD: Choose right collection
func processPayments(_ payments: [Payment]) {
// Need fast duplicate detection - use Set
var processedIds = Set<String>()
for payment in payments {
if processedIds.contains(payment.id) {
continue // O(1) check
}
process(payment)
processedIds.insert(payment.id) // O(1) insert
}
}
// BAD: Wrong collection for task
func processPaymentsSlow(_ payments: [Payment]) {
var processedIds: [String] = [] // Array
for payment in payments {
if processedIds.contains(payment.id) { // O(n) - slow!
continue
}
process(payment)
processedIds.append(payment.id)
}
}
ContiguousArray for Performance
When you don't need bridging to Objective-C, ContiguousArray guarantees contiguous storage:
// Array<T> may use bridged NSArray storage when T is an Objective-C-compatible class
let payments: [Payment] = [...]
// ContiguousArray<T> always uses native storage - slightly faster
let paymentsContiguous: ContiguousArray<Payment> = [...]
// Use ContiguousArray when:
// 1. Elements are structs or non-ObjC classes
// 2. No need to bridge to NSArray
// 3. Micro-optimization matters (difference is small)
Reducing Reference Counting Overhead
Reference counting has a performance cost: each retain and release is an atomic operation. Minimize it by using value types where possible and being mindful of retain cycles.
Unowned References for Performance
When you know an object will outlive a reference, unowned avoids reference counting:
// GOOD: Unowned avoids reference counting
class PaymentProcessor {
let configuration: Configuration // Owned
lazy var validator: PaymentValidator = {
PaymentValidator(configuration: self.configuration)
}()
}
class PaymentValidator {
unowned let configuration: Configuration
init(configuration: Configuration) {
self.configuration = configuration
}
}
// Avoids strong retain/release traffic when accessing configuration
// (a plain unowned reference still performs a liveness check on access;
// unowned(unsafe) skips even that, trading safety for speed)
// Faster than weak (which is optional and checks validity)
// Safe because PaymentProcessor owns configuration, outlives validator
Minimize Closure Captures
Closures capture references, increasing retain count. Minimize captures:
// CAPTURES TOO MUCH: Captures entire self
class PaymentService {
var payments: [Payment] = []
func processAll() {
DispatchQueue.global().async {
self.payments.forEach { payment in
// Captures self, keeps entire object alive
self.process(payment)
}
}
}
}
// BETTER: Capture only what's needed
class PaymentService {
var payments: [Payment] = []
func processAll() {
let paymentsToProcess = self.payments // Copy array
DispatchQueue.global().async {
paymentsToProcess.forEach { payment in
// Only captures array, not entire self
process(payment)
}
}
}
private func process(_ payment: Payment) {
// Processing logic
}
}
Profiling with Instruments
Always profile before optimizing. Instruments provides detailed performance analysis:
Time Profiler
Identifies CPU-intensive functions:
- Product → Profile (⌘I) in Xcode
- Select "Time Profiler"
- Record while using app
- Analyze call tree - find hot paths
- Optimize functions consuming most CPU time
// Example: Time Profiler shows this function uses 40% of CPU
func processPayments(_ payments: [Payment]) {
for payment in payments {
// Time Profiler reveals this validation is slow
let isValid = validator.complexValidation(payment)
if isValid {
process(payment)
}
}
}
// Optimization: Cache validation results
func processPaymentsOptimized(_ payments: [Payment]) {
var validationCache: [String: Bool] = [:]
for payment in payments {
let isValid = validationCache[payment.id] ?? {
let result = validator.complexValidation(payment)
validationCache[payment.id] = result
return result
}()
if isValid {
process(payment)
}
}
}
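To make hot paths easy to locate in Instruments, you can annotate regions of interest with signposts (os_signpost, available since iOS 12 / macOS 10.14). A sketch; the subsystem string and the helper function are placeholders of our own:

```swift
import os.signpost

let log = OSLog(subsystem: "com.example.payments", category: "Processing")

// Wrap any block of work in a named signpost interval
func measureRegion<T>(_ name: StaticString, _ work: () -> T) -> T {
    let spid = OSSignpostID(log: log)
    os_signpost(.begin, log: log, name: name, signpostID: spid)
    defer { os_signpost(.end, log: log, name: name, signpostID: spid) }
    return work()
}

// The "processPayments" interval then appears in the os_signpost
// instrument, aligned with Time Profiler samples
let sum = measureRegion("processPayments") {
    (1...1_000).reduce(0, +)
}
```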
Allocations Instrument
Tracks memory allocations to find excessive allocation:
- Select "Allocations" instrument
- Record while using app
- Look for:
- Excessive allocations (high count)
- Large allocations (high size)
- Leaked memory (not freed)
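A common fix the Allocations instrument motivates is pre-sizing collections, so a growing array does not repeatedly reallocate and copy its buffer. A sketch:

```swift
// Without reserveCapacity, appending n elements triggers O(log n)
// buffer reallocations, each copying the existing contents
func buildIds(count: Int) -> [String] {
    var ids: [String] = []
    ids.reserveCapacity(count) // One allocation up front
    for i in 0..<count {
        ids.append("payment-\(i)")
    }
    return ids
}

let ids = buildIds(count: 10_000)
print(ids.count) // 10000
```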
Memory Graph Debugger
Find retain cycles and memory leaks:
- Run app in Debug mode
- Debug → View Memory Graph
- Look for cycles in object graph
- Fix retain cycles with weak/unowned references
Further Reading
General Performance Concepts
- Performance Overview - Performance strategy and principles
- Performance Optimization - Cross-language optimization techniques
- Performance Testing - Load testing strategies
Internal Documentation
- Swift General - Value types and memory management
- Swift Concurrency - Concurrent performance patterns
- iOS Performance - iOS-specific optimizations
External Resources
- Swift Performance Documentation
- WWDC: Understanding Swift Performance
- Instruments Help
- Swift Memory Layout
Summary
Key Takeaways
- Measure first - Profile before optimizing
- Value types - Generally faster than reference types
- Copy-on-write - Efficient large value types
- Generics - Faster than existential types
- Whole Module Optimization - Enable for release builds
- Memory layout - Order properties by size
- Lazy evaluation - Defer expensive work
- Right collection - Choose based on access patterns
- Minimize allocations - Reuse, avoid temporary objects
- Instruments - Use profiling tools
Next Steps: Review iOS Performance for platform-specific optimization and Performance Testing for measuring improvements.