Kernel Synchronization I: Atomic Operations and Spinlocks

Why This Matters

The Linux kernel runs on multicore hardware and handles interrupts that can fire at almost any moment. Any piece of kernel data that more than one CPU—or a CPU and an interrupt handler—can touch simultaneously is a candidate for a race condition. Understanding synchronization is therefore not optional; a single missed lock can corrupt in-memory file-system state, crash the system, or silently produce wrong results that surface hours later.


1. Background: Why We Have Multiple Cores

Single-core performance improvements hit a wall around 2007:

Bottleneck Why it stopped scaling
Clock frequency Higher frequency → quadratic increase in power consumption
Wire delay A signal can only travel a limited physical distance in one clock cycle
ILP (Instruction-Level Parallelism) Superscalar, out-of-order execution, and branch prediction were already exploited heavily through the 1990s

The industry's answer was multi-core processors: pack more cores on a chip rather than make each core faster. Moore's Law (transistor count doubles ~every 2 years) continued, but the benefit shifted from single-thread speed to throughput across many threads.

For the kernel this means: code that once ran on one CPU with well-defined ordering now runs simultaneously on 4, 8, 64, or more CPUs at the same time.


2. The Synchronization Problem

Critical Section and Race Condition

A critical section (or critical region) is any code path that reads or writes shared data. It must appear atomic to outside observers—no other thread should be able to interleave inside it.

A race condition occurs when two threads of execution enter the same critical section concurrently and the outcome depends on the exact interleaving of their instructions. These bugs are notoriously hard to reproduce.

The Classic i++ Example

Consider:

int i = 7;
void foo(void) { i++; }

i++ looks like one operation in C, but the compiler emits something like:

LOAD  reg, [i]   ; read i from memory
ADD   reg, 1     ; increment
STORE [i], reg   ; write back

If two threads both execute LOAD before either executes STORE, both read 7, both write 8, and the final value is 8 instead of the correct 9. This is a lost update.

Sources of Concurrency in the Kernel

The kernel faces four distinct sources of concurrency:

Source Description
SMP (true concurrency) Two CPUs literally execute code at the same instant
Kernel preemption The scheduler can preempt a task mid-critical-section and run another task on the same CPU
Sleeping / user-space sync A task calls a blocking operation, the scheduler runs another task, which touches the same data
Interrupts / softirqs / tasklets Hardware or software interrupts fire asynchronously and can run while process-context code holds partial state

3. Atomic Operations

For simple shared counters or flags, the kernel provides atomic operations: single machine instructions that read-modify-write memory without any possibility of interleaving.

How It Works on x86

The LOCK XADD instruction (and similar) uses the system bus lock or cache coherency to prevent another CPU from accessing the same cache line for the duration of the instruction:

LOCK XADD DEST, SRC   ; atomic fetch-and-add

Linux Atomic Integer API

Linux wraps atomic integers in the atomic_t (32-bit) and atomic64_t (64-bit) types. Examples:

atomic_t v = ATOMIC_INIT(0);

atomic_inc(&v);            // v++
atomic_dec(&v);            // v--
atomic_add(5, &v);         // v += 5
int x = atomic_read(&v);  // read without data race
atomic_set(&v, 10);        // assign

The kernel also provides bitwise atomic operations (e.g., test_and_set_bit, clear_bit) for flag manipulation within a word.

Limitation of Atomic Operations

Atomic ops work for individual variables. They cannot protect a compound invariant spanning multiple fields (e.g., "the list has N nodes and the count field equals N"). For that you need a lock.


4. Spinlocks

What Is a Spinlock?

A spinlock is a busy-wait lock: a thread that finds the lock already held loops ("spins") in a tight loop polling the lock state until it is released, then acquires it.

spinlock_t lock = SPIN_LOCK_UNLOCKED;

spin_lock(&lock);
/* critical section */
spin_unlock(&lock);

Key properties:

When to Use Spinlocks

Deadlocks

Self-deadlock: a single thread acquires a spinlock it already holds → spins forever.

ABBA deadlock (deadly embrace):

Thread 1: lock(A) → lock(B)
Thread 2: lock(B) → lock(A)

Thread 1 holds A and waits for B; Thread 2 holds B and waits for A. Neither can proceed. Prevention: always acquire locks in a consistent global order.

Spinlocks and Interrupt Handlers

If a spinlock protects data accessed from both process context and an interrupt handler, you must also disable local interrupts while holding the lock:

unsigned long flags;

spin_lock_irqsave(&lock, flags);   // disable IRQs + save state
/* critical section */
spin_unlock_irqrestore(&lock, flags); // restore IRQ state

Without disabling interrupts, this sequence is possible:

  1. Process context acquires lock.
  2. An interrupt fires on the same CPU.
  3. The interrupt handler calls spin_lock() on the same lock.
  4. The handler spins forever—it is waiting for the lock holder, but the lock holder cannot run until the handler returns.

This is a double-acquire deadlock.

When you know interrupts are always disabled at the call site, you can use the simpler (unconditional) spin_lock_irq / spin_unlock_irq pair.


5. Concurrency Safety Levels

Term Meaning
SMP-safe Safe from true parallelism on multiple CPUs
Preemption-safe Safe even if the scheduler preempts the current task mid-section
Interrupt-safe Safe from asynchronous interrupt handlers running on the same CPU

6. What to Protect

Protect data, not code. Ask these questions about any shared object:

  1. Is the data global?
  2. Can another execution context (thread, interrupt handler, softirq) access it?
  3. Is it shared between process context and interrupt context?
  4. Is it shared between two different interrupt handlers?
  5. If the current process is preempted, can the newly scheduled task access the same data?
  6. If the code sleeps, what state is the shared data left in?
  7. What happens if this function runs concurrently on two CPUs?

Key Takeaways

Practice

  1. Why did single-core processor performance scaling stall around 2007?
  2. Two kernel threads concurrently execute i++ where i starts at 7. What is the range of possible final values?
  3. Which of the following is NOT a source of concurrency that the Linux kernel must protect against?
  4. What is the key hardware mechanism that makes atomic_inc safe on x86 multi-core systems?
  5. A kernel developer calls spin_lock(&my_lock) from a function that is already holding my_lock on the same CPU. What happens?
  6. A spinlock dev_lock protects a hardware device's ring buffer. An interrupt handler also acquires dev_lock. Process-context code holds dev_lock when an interrupt fires on the same CPU. What is the result if spin_lock (not spin_lock_irqsave) was used?
  7. Describe the ABBA (deadly embrace) deadlock scenario, give a concrete two-lock example, and explain how to prevent it.
  8. Why is it dangerous to sleep (e.g., call copy_from_user or kmalloc(GFP_KERNEL)) while holding a spinlock?
  9. When should you use spin_lock_irqsave instead of spin_lock_irq?
  10. A senior engineer tells you: 'Just add a spinlock around every function that touches shared data.' Why is this advice wrong, and what should you do instead?