Swift Actor Reentrancy and Race Conditions

You already know the basics. You've written async functions, used await, defined an actor, called actor methods from outside. You may also believe — quietly, the way most of us do at first — that an actor method runs end-to-end without interruption. This article is the page where that belief breaks.

Mahmoud Albelbeisi
Published on

A note before section 1

The recurring example in this article is a small image-cache actor. Its job is simple: a single getImage(url:) method checks an in-memory cache, downloads on miss, stores the result, and returns the bytes. We will return to this actor in every section — first to expose the reentrancy bug it hides, then to evolve it through each fix pattern. By the end, the same actor is correct under load. One concrete piece of code, growing safer.


1. Hook — A Bug That Looks Impossible

Here is the actor. Nine lines. It looks correct.

actor ImageCache {
    private var store: [URL: Data] = [:]
    func getImage(url: URL) async throws -> Data {
        if let hit = store[url] { return hit }
        let data = try await download(url: url)   // ← await
        store[url] = data
        return data
    }
}

You ship this. In production, two callers ask for the same URL at the same time. The server log shows two downloads. You expected one. The cache hit ratio is wrong. The download(url:) cost line on the bill is double what it should be.

You stare at the code. It is an actor. Aren't actor methods serialised?

Yes — but await releases the lane. That single sentence is the whole story. The rest of this article is the unpacking of it.

Both callers correctly observed an empty cache. Both kicked off a download. Neither is wrong on its own. The actor's serialisation does not cover the gap across the await.

"actor solves data races. It does not solve all races."


2. What Actor Reentrancy Actually Is

An actor (a reference type whose state is serialised — only one task touches it at a time) serialises access between suspension points. Not across them.

A suspension point (a place inside an async function where it can pause and return the lane to the runtime) is marked by await. Whenever an actor method actually suspends at an await, the actor's lane is released. Other actor methods, including another invocation of the same one, can run while the first one is suspended.

This is actor reentrancy (the property that other work on the same actor can run between two awaits of one method). It is a deliberate design choice, not an accident. Without it:

  • A single slow actor method would block all other callers indefinitely.
  • A method awaiting another actor that wanted to call back would deadlock.

Reentrancy keeps actors responsive and deadlock-free. It also opens the door to logic bugs.

Think of it like… a shared notebook two people write into one at a time. While one writer puts down their pen to think (await), another writer can pick up the pen and add a line. When the first writer reaches for the pen again, the page is not the page they remember.

In software, this looks like… the image-cache method reads store[url] (page) before await download(...) (think). After the await, store[url] may have been written by another caller. The pre-await snapshot is stale.

Don't confuse with… a re-entrant lock lets the same thread re-acquire a held lock. Swift actor reentrancy means unrelated work runs between two awaits of the same method. Different mechanisms, different problems. A re-entrant lock prevents one specific deadlock; actor reentrancy prevents a wider class.

The invariant to internalise:

Anything you read before await can have changed by the time you wake up. If correctness depends on it, re-check.

The actor never runs two methods at once. What it can have is several methods suspended mid-flight while exactly one other method is running.
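To make the interleaving visible, here is a minimal sketch. The Notebook actor, its write(_:) method, and the 50 ms sleep are assumptions made up for this illustration; the sleep stands in for any real suspension.

```swift
actor Notebook {
    private(set) var events: [String] = []

    // Hypothetical method: records entry, suspends, then records exit.
    func write(_ name: String) async {
        events.append("\(name) start")
        // Suspension point: the actor's lane is released while this task sleeps.
        try? await Task.sleep(nanoseconds: 50_000_000)
        events.append("\(name) end")
    }
}

// Starts two writes concurrently and returns the recorded event order.
func demoInterleaving() async -> [String] {
    let notebook = Notebook()
    async let a: Void = notebook.write("A")
    async let b: Void = notebook.write("B")
    _ = await (a, b)
    return await notebook.events
}
```

On a typical run both "start" entries are recorded before either "end" entry, because the second call enters the actor while the first is suspended. The exact order of A and B is not guaranteed.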


3. The Three Reentrancy Bug Patterns

The same shape appears in many disguises. Three patterns cover almost all production cases.

Pattern A — Duplicate Work (the "two downloads" bug)

// ❌ Has a reentrancy bug — see below
actor ImageCache {
    private var store: [URL: Data] = [:]
    func getImage(url: URL) async throws -> Data {
        if let hit = store[url] { return hit }
        let data = try await download(url: url)
        store[url] = data
        return data
    }
}

Cache miss → start download (await) → write result. Two callers both miss the cache before either writes. Both download. The invariant violated: "at most one download in flight per URL."
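A hedged way to watch this happen locally: the sketch below swaps the network call for a fake download that counts its own invocations. DownloadCounter, BuggyImageCache, the example URL, and the 50 ms delay are all assumptions for the demonstration, not part of the original actor.

```swift
import Foundation

// Counts how many times the fake download actually runs.
actor DownloadCounter {
    private(set) var count = 0
    func increment() { count += 1 }
}

actor BuggyImageCache {
    private var store: [URL: Data] = [:]
    private let counter: DownloadCounter

    init(counter: DownloadCounter) { self.counter = counter }

    // Stand-in for a real network call (assumed for this sketch).
    private func download(url: URL) async throws -> Data {
        await counter.increment()
        try await Task.sleep(nanoseconds: 50_000_000)
        return Data([0xFF])
    }

    // Same check, await, write shape as the article's getImage(url:).
    func getImage(url: URL) async throws -> Data {
        if let hit = store[url] { return hit }
        let data = try await download(url: url)
        store[url] = data
        return data
    }
}

// Two concurrent callers for one URL; returns how many downloads ran.
func countDownloadsForTwoCallers() async throws -> Int {
    let counter = DownloadCounter()
    let cache = BuggyImageCache(counter: counter)
    let url = URL(string: "https://example.com/a.png")!
    async let first = cache.getImage(url: url)
    async let second = cache.getImage(url: url)
    _ = try await (first, second)
    return await counter.count
}
```

With the 50 ms delay, the second caller almost always arrives while the first is still suspended, so the function typically reports two downloads for one URL.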

Pattern B — Stale State (the TOCTOU bug)

Time-of-check / time-of-use (TOCTOU) (read state, await, write back; another method writes in the gap). The classic balance-updater bug.

// ❌ Has a reentrancy bug — see below
actor Wallet {
    private(set) var balance: Int = 0
    func add(_ amount: Int, slowAudit: () async -> Void) async {
        let current = balance              // read
        await slowAudit()                  // await
        balance = current + amount         // write back the OLD value + amount
    }
}

Two concurrent add(100, ...) calls each read balance == 0, then write 0 + 100. The final balance is 100, not 200. One update vanished. The invariant violated: "every add must be additive."
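The lost update can be reproduced with a small self-contained sketch. LossyWallet and the 20 ms sleep are assumptions standing in for the Wallet above and its slowAudit closure.

```swift
actor LossyWallet {
    private(set) var balance = 0

    func add(_ amount: Int) async {
        let current = balance                            // read
        try? await Task.sleep(nanoseconds: 20_000_000)   // stand-in for slowAudit
        balance = current + amount                       // write back the stale value + amount
    }
}

// Runs two concurrent add(100) calls and returns the final balance.
func lostUpdateBalance() async -> Int {
    let wallet = LossyWallet()
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<2 {
            group.addTask { await wallet.add(100) }
        }
    }
    return await wallet.balance
}
```

Both tasks usually read 0 before either writes, so the final balance is typically 100 rather than 200. The code compiles without concurrency warnings, which is the point of the next section.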

Pattern C — Out-of-Order Operations

Caller assumes two actor calls made in order will execute in order end-to-end. They start in order, but their awaits interleave, so visible effects can land out of order.

// ❌ Has a reentrancy bug — see below
actor Log {
    private var lines: [String] = []
    func write(_ line: String, slowFlush: () async -> Void) async {
        lines.append("[\(Date().formatted(.iso8601))] \(line)")
        await slowFlush()
    }
}
// Caller:
async let _ = log.write("A", slowFlush: slowA)
async let _ = log.write("B", slowFlush: slowB)

A enters first and appends. A awaits. B enters, appends. B's flush is faster, so B's line is visible on disk before A's.

The invariant violated: "order at the call site equals order on disk."
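One possible way to restore the ordering, sketched under assumptions: chain every flush behind the previous one by keeping a tail task inside the actor. OrderedLog, enqueue(_:slowFlush:), and pause(milliseconds:) are hypothetical names, and the flushed array stands in for the disk.

```swift
actor OrderedLog {
    private(set) var flushed: [String] = []   // stands in for "lines on disk"
    private var tail: Task<Void, Never>?

    // Synchronous on the actor: no await between reading `tail` and replacing it.
    func enqueue(_ line: String,
                 slowFlush: @escaping @Sendable () async -> Void) -> Task<Void, Never> {
        let previous = tail
        let task = Task {
            _ = await previous?.value      // wait until the earlier line has been flushed
            await slowFlush()
            self.flushed.append(line)      // runs on the actor; the Task inherits its isolation
        }
        tail = task
        return task
    }
}

// Hypothetical helper that simulates a flush of the given duration.
func pause(milliseconds: UInt64) async {
    try? await Task.sleep(nanoseconds: milliseconds * 1_000_000)
}

func demoOrderedFlush() async -> [String] {
    let log = OrderedLog()
    // Each enqueue returns only after its task is registered, so A is queued before B.
    let a = await log.enqueue("A") { await pause(milliseconds: 60) }
    let b = await log.enqueue("B") { await pause(milliseconds: 1) }
    await a.value
    await b.value
    return await log.flushed
}
```

Even though B's flush is much faster, B waits for A's task, so the lines land in call order. The cost is that flushes no longer overlap.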

"Three shapes, one cause: state read before await was assumed unchanged after await."


4. Why Strict Concurrency Doesn't Save You

Swift 6 complete concurrency checking (the strictest mode, on by default in the Swift 6 language mode) is about data races: concurrent unsynchronised access to the same memory where at least one access is a write. The compiler rejects code in which it cannot prove that such access is impossible.

Actor reentrancy bugs are logic races (wrong answers due to interleaving, not unsynchronised memory). The access is synchronised — only one method runs inside the actor at a time. The interleaving of operations across awaits is what produces the wrong answer.

The compiler cannot tell the difference between:

  • "I read this before await and I rely on it staying the same after."
  • "I read this before await and I'm fine if it changed."

It assumes the second, because the second is the correct default. Most of the time, code is reentrancy-safe by accident, and making that the default avoids friction.

Sendable, @MainActor, region-based isolation — none of these address reentrancy. They address a different problem: can this value safely move between domains? That is memory safety. Reentrancy is correctness.

"Strict concurrency is a memory-safety guarantee, not a correctness guarantee. Correctness is on you."

This is the load-bearing insight of the article. We will see it again in § 8.


5. Fix Patterns — How to Make It Right

Five patterns, ordered from simplest to most general. Pick the first one that fits.

Fix 1 — Don't await inside the critical section

If you can do the check and the "I'm starting work" mark atomically — no await between them — the bug disappears.

actor ImageCache {
    private var store: [URL: Data] = [:]
    private var inFlight: Set<URL> = []
    func getImage(url: URL) async throws -> Data {
        if let hit = store[url] { return hit }
        if inFlight.contains(url) {
            // someone else is already working — wait? merge? abort?
            // (this minimal sketch returns an error; Fix 3 shows the merge)
            throw CancellationError()
        }
        inFlight.insert(url)                // mark synchronously
        defer { inFlight.remove(url) }
        let data = try await download(url: url)
        store[url] = data
        return data
    }
}

The check (store[url]), the second check (inFlight.contains(url)), and the mark (inFlight.insert) all happen with no await between them. By the time another caller arrives, the mark is already set.

  • Use it when you can pull the slow operation out of the read-then-write pair.
  • Skip it when the slow operation must inform the write (e.g., the downloaded value is the new state).
  • You pay an extra synchronous data structure (inFlight).

Fix 2 — Re-validate after await

If you must await mid-operation, snapshot critical state before, re-check after. If state changed, restart, abort, or merge.

actor Wallet {
    private(set) var balance: Int = 0
    func add(_ amount: Int, audit: () async -> Void) async {
        let snapshot = balance
        await audit()
        if balance != snapshot {
            // someone else updated; restart, merge, or abort
            balance += amount       // additive merge — correct here
        } else {
            balance = snapshot + amount
        }
    }
}

The simpler form, when the operation is just additive: do the read-modify-write after the await, in one synchronous step.

actor Wallet {
    private(set) var balance: Int = 0
    func add(_ amount: Int, audit: () async -> Void) async {
        await audit()
        balance += amount       // sync; no read-then-write split
    }
}

  • Use it when the operation is commutative and associative (addition, set union, max/min), so the order in which updates land does not matter.
  • Skip it when the post-await write depends on the pre-await read (you need Fix 1 or Fix 3).
  • You pay the extra read; CPU is cheap.
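The Wallet examples show the "merge" outcome. For the "restart" outcome, one possible shape is a version counter that is compared after the await. Inventory, reserve(_:approve:), and the approve closure are hypothetical names for this sketch.

```swift
actor Inventory {
    private(set) var stock: Int
    private var version = 0            // bumped on every successful write

    init(stock: Int) { self.stock = stock }

    // Reserves `quantity` items if the stock allows and `approve` agrees.
    func reserve(_ quantity: Int,
                 approve: @Sendable (Int) async -> Bool) async -> Bool {
        while true {
            let seenVersion = version                      // snapshot before the await
            guard stock >= quantity else { return false }
            guard await approve(quantity) else { return false }
            if version == seenVersion {                    // nothing changed while suspended
                stock -= quantity
                version += 1
                return true
            }
            // State changed during the await: loop and re-validate from the top.
        }
    }
}

// Two callers race for the last item; returns how many succeeded and what is left.
func reserveLastItemTwice() async -> (granted: Int, remaining: Int) {
    let inventory = Inventory(stock: 1)
    let results = await withTaskGroup(of: Bool.self) { group -> [Bool] in
        for _ in 0..<2 {
            group.addTask {
                await inventory.reserve(1) { _ in
                    try? await Task.sleep(nanoseconds: 20_000_000)
                    return true
                }
            }
        }
        var collected: [Bool] = []
        for await granted in group { collected.append(granted) }
        return collected
    }
    let remaining = await inventory.stock
    return (results.filter { $0 }.count, remaining)
}
```

Whichever caller resumes second sees a changed version, restarts, and then fails the stock check, so exactly one reservation is granted and the stock never goes negative.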

Fix 3 — In-Flight Task Registry (request deduplication)

The canonical fix for the duplicate-work bug. Store a [Key: Task<Value, Error>] inside the actor. On request: check the cache (sync), check the registry (sync). If neither has it, register a Task and await its result. Other callers find the Task and await the same one.

actor ImageCache {
    private var store: [URL: Data] = [:]
    private var inFlight: [URL: Task<Data, Error>] = [:]

    func getImage(url: URL) async throws -> Data {
        if let hit = store[url] { return hit }                   // sync
        if let task = inFlight[url] { return try await task.value } // sync check, async await
        let task = Task<Data, Error> { try await download(url: url) }
        inFlight[url] = task
        defer { inFlight[url] = nil }                            // runs after the await below
        let data = try await task.value                          // errors propagate to the caller
        store[url] = data
        return data
    }
}

Two callers asking for the same URL at the same time both find the same Task and await it. One download. One write. Both callers get the same bytes.

Real situation: request coalescing of this shape appears in production image libraries such as Nuke and Kingfisher on iOS. Naming it makes the pattern recognisable in code review.

A trade-off mini-box:

Option | When it fits | What it costs
In-flight task registry | Duplicate-work is the bug | Extra dictionary; defer cleanup
"Just lock the actor harder" | Not a thing: Swift actors do not expose a "no reentrancy" mode | (n/a)
Eventual consistency | Two writes are fine because the value is identical | Bandwidth, server load

  • Use it when several callers may race on the same async operation and you want exactly one in flight per key.
  • Skip it when each call genuinely produces a different result.
  • You pay the registry dictionary plus the discipline of defer-removing entries.
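If several actors need the same registry, it can be written once as a generic type. This is a sketch under assumptions: Coalescer, value(for:), CallCounter, and the fake work closure are illustrative names, not an existing library API.

```swift
actor Coalescer<Key: Hashable & Sendable, Value: Sendable> {
    private var cache: [Key: Value] = [:]
    private var inFlight: [Key: Task<Value, Error>] = [:]
    private let work: @Sendable (Key) async throws -> Value

    init(work: @escaping @Sendable (Key) async throws -> Value) {
        self.work = work
    }

    func value(for key: Key) async throws -> Value {
        if let hit = cache[key] { return hit }                      // sync
        if let task = inFlight[key] { return try await task.value } // join the existing work
        let task = Task { [work] in try await work(key) }
        inFlight[key] = task                                        // registered before any await
        defer { inFlight[key] = nil }                               // also runs when the task throws
        let value = try await task.value
        cache[key] = value
        return value
    }
}

// Counts how many times the underlying work actually runs.
actor CallCounter {
    private(set) var count = 0
    func increment() { count += 1 }
}

// 50 concurrent requests for one key; returns how many times the work ran.
func coalescedCallCount() async throws -> Int {
    let counter = CallCounter()
    let coalescer = Coalescer<String, Int> { key in
        await counter.increment()
        try await Task.sleep(nanoseconds: 20_000_000)
        return key.count
    }
    try await withThrowingTaskGroup(of: Int.self) { group in
        for _ in 0..<50 {
            group.addTask { try await coalescer.value(for: "same-key") }
        }
        for try await _ in group {}
    }
    return await counter.count
}
```

Every caller either hits the cache, joins the registered task, or registers the task itself, and each of those decisions is made without an await in between. A failed task delivers its error to all waiters and is then removed, so a later call starts a fresh attempt.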

Fix 4 — Move the operation outside the actor

Sometimes the right answer is "this work doesn't belong in the actor at all." Keep the actor for the small synchronised state. Do the slow work outside.

actor ImageStore {
    private var store: [URL: Data] = [:]
    func read(_ url: URL) -> Data? { store[url] }
    func write(_ url: URL, _ data: Data) { store[url] = data }
}

func fetchImage(url: URL, store: ImageStore) async throws -> Data {
    if let hit = await store.read(url) { return hit }
    let data = try await download(url: url)        // outside the actor
    await store.write(url, data)
    return data
}

The actor only serialises the reads and writes. The download runs outside it. Two callers can still both download (Fix 3 still applies if you care), but the actor itself has no await inside any method, so it cannot be reentrant in a harmful way.

  • Use it when the actor is genuinely only a tiny synchronised store.
  • Skip it when the slow operation must atomically observe and update state.
  • You pay the loss of encapsulation — callers know more about the shape.

Fix 5 — Custom SerialExecutor (vocabulary mention)

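A SerialExecutor (a per-actor scheduler you can plug in for advanced cases) lets you control how the actor's work is queued. Almost no app code needs this. Note that a custom executor changes where and when the actor's jobs run; it does not switch off reentrancy, so the logic races above still apply. If the standard fixes do not cover your case, the documentation for SerialExecutor and TaskExecutor is the next stop.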

"Reach for the smallest fix that closes the bug. The registry is the workhorse; the others are scalpels."


6. The Other Side — When Reentrancy Helps You

It is tempting to read this article and conclude "reentrancy is bad; ban await inside actor methods." Don't.

Reentrancy is what keeps actors responsive. Without it, a single long actor method would block every other caller until it finished. A user-facing app would freeze.

Reentrancy also prevents a class of deadlocks. Imagine actor A calls actor B with await, and during B's work, B needs to call back into A. In a non-reentrant world, A still holds its lane (because A is "in the middle of a method"), and B's call into A would wait forever. In Swift's reentrant world, A's lane was released at the await, so B's call goes through.

In a hypothetical non-reentrant model, the callback from B into A would block forever. Reentrancy is the feature that makes the original call from A into B safe to write.
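The callback scenario can be sketched with two small actors. Waiter plays the role of A and Kitchen the role of B; both names and their methods are made up for this illustration.

```swift
actor Kitchen {
    // B: while handling the call from the waiter, it calls back into the waiter.
    func cook(order: String, for waiter: Waiter) async -> String {
        let table = await waiter.tableNumber(for: order)   // B calls back into A
        return "\(order) for table \(table)"
    }
}

actor Waiter {
    private let tables = ["soup": 4]

    func tableNumber(for order: String) -> Int {
        tables[order] ?? 0
    }

    // A: awaits B. The waiter's lane is released at this await,
    // so the kitchen's callback can run on the waiter.
    func place(order: String, with kitchen: Kitchen) async -> String {
        await kitchen.cook(order: order, for: self)
    }
}

func demoCallback() async -> String {
    await Waiter().place(order: "soup", with: Kitchen())
}
```

The call completes and returns "soup for table 4". In a non-reentrant design, tableNumber(for:) would wait for place(order:with:) to finish, which in turn waits for the kitchen.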

The takeaway: do not "fix" reentrancy by avoiding await inside actor methods. Design the state so the operations stay correct with reentrancy. The fixes in § 5 are how.

"Reentrancy is a feature you must design around, not a bug you can disable."


7. How to Detect Reentrancy Bugs in Your Code

A practical checklist.

  1. Read every actor method that contains await. Ask: "If another method ran here, would my logic still hold?" If unsure, the answer is no.
  2. Look for check-then-act with an await between. Cache lookups, "is this in flight?" checks, balance reads, list-membership tests.
  3. Look for monotonic-counter assumptions. If your actor assumes a counter only goes up, a read, an await, and then a write in the increment path can write back a stale, lower value.
  4. Tests do not catch these reliably. They reproduce only under contention. Add a stress test: call the actor concurrently 100× and assert the invariant.
  5. Code review heuristic: highlight every await inside an actor member. Each one is a place to ask "what could have changed?".

A concrete stress test for the image-cache fix:

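import Foundation
import Testing

// Assumes ImageCache exposes a test-only hook, replaceDownloader(with:), that swaps
// the download function. The actor shown earlier does not define it.
@Test func dedupesConcurrentRequests() async throws {
    let testURL = URL(string: "https://example.com/image.png")!
    let cache = ImageCache()
    actor Counter { var n = 0; func tick() { n += 1 } }
    let counter = Counter()
    @Sendable func instrumentedDownload(_ u: URL) async throws -> Data {
        await counter.tick()                               // count each real download
        try await Task.sleep(nanoseconds: 50_000_000)      // stay in flight so callers overlap
        return Data([0xFF])
    }
    await cache.replaceDownloader(with: instrumentedDownload)   // actor call, so it needs await
    await withTaskGroup(of: Void.self) { g in
        for _ in 0..<50 { g.addTask { _ = try? await cache.getImage(url: testURL) } }
    }
    let downloads = await counter.n
    #expect(downloads == 1)
}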

50 concurrent calls to the same URL should produce exactly one download.


8. Closing Takeaways

  1. An actor serialises between suspensions, not across them.
  2. Every await inside an actor is a re-validation point. Treat it like a hand-off.
  3. Strict concurrency catches data races. Reentrancy bugs are logic races. The compiler is silent on them — you must reason.
  4. The in-flight task registry is the canonical fix for "duplicate work" reentrancy bugs. Learn it.
  5. Snapshot state before await, re-check after when correctness depends on it.
  6. Reentrancy is not a bug — it is what makes actors responsive. Design with it, not against it.
  7. When in doubt, write a stress test that calls the actor concurrently and asserts the invariant.

The article ends here. Read your own actor methods with one question: "What if another method ran here?"