Kernel Synchronization III: Seqlocks, Barriers, and RCU

Why this matters

By now you know the core primitives—spinlocks, mutexes, semaphores, and rwlocks. This module covers the tools the kernel uses when those primitives are either too slow or the wrong semantic fit. Seqlocks give writers priority without starving them. Memory barriers let you reason about hardware and compiler reordering. And RCU is the crown jewel: a lock-free read path used throughout the kernel for read-mostly data structures. Understanding these will help you both write efficient kernel code and reason about why the kernel is designed the way it is.


Completion Variables

A completion variable solves a simple but common problem: one kernel thread needs to wait until another thread (or interrupt handler) signals that something has happened. Semaphores can do this, but completion variables are the idiomatic way in Linux.

API Meaning
DECLARE_COMPLETION(x) Statically declare and initialize
init_completion(&x) Dynamic initialization
wait_for_completion(&x) Block until signaled
complete(&x) Wake up one waiter
complete_all(&x) Wake up all waiters

Think of a completion as a one-shot gate: the waiter sleeps at the gate; the producer opens it exactly once (or for all waiters with complete_all).


Sequential Lock (Seqlock)

The problem with rwlock

A standard reader-writer lock (rwlock) is reader-preferred: as long as readers hold the lock, a writer must wait. Under continuous read traffic, a writer can starve indefinitely. Seqlocks flip this preference.

How seqlock works

A seqlock maintains an integer sequence counter initialized to zero.

/* Reader side */
unsigned seq;
do {
    seq = read_seqbegin(&lock);
    /* read shared data */
} while (read_seqretry(&lock, seq));

/* Writer side */
write_seqlock(&lock);
/* modify shared data */
write_sequnlock(&lock);

When to use seqlock


Preemption Disabling

When a spinlock is held, kernel preemption is automatically disabled on that CPU. But sometimes you need to work with per-CPU data without a spinlock. If you are preempted mid-operation and rescheduled on a different CPU, your per-CPU data is suddenly the wrong CPU's data.

The fix is to explicitly disable preemption:

int cpu;
cpu = get_cpu();          /* disable preemption, return current CPU id */
/* safely access per_cpu_data[cpu] here */
put_cpu();                /* re-enable preemption */

After put_cpu(), the cpu variable is no longer valid—you may have been rescheduled.


Memory Ordering and Barriers

The reordering problem

Both the compiler (at compile time) and the CPU (at run time) may reorder memory operations for performance:

In concurrent kernel code, this means a sequence like:

a = 1;
b = 2;

might be observed by another CPU as b = 2 happening before a = 1.

Barrier types

Barrier What it prevents
barrier() Compiler reordering only (no CPU instruction emitted)
rmb() CPU from reordering reads across the barrier
wmb() CPU from reordering writes across the barrier
mb() CPU from reordering both reads and writes

Example

/* Initial: a = 1, b = 2 */
a = 3;
wmb();    /* ensures a is written to memory before b */
b = 4;

On the reader side:

b_val = b;
rmb();    /* ensures b is read before a */
a_val = a;

With these barriers, any CPU that sees b == 4 is guaranteed to also see a == 3.


Read-Copy-Update (RCU)

Motivation: the limits of rwlock and seqlock

Both rwlock and seqlock require readers to do some coordination with writers (acquire a lock or check a counter). For read-mostly data structures accessed on every CPU on every context switch or system call, even that small overhead adds up.

The RCU insight

RCU decouples reading from writing entirely:

The copy-modify-publish pattern

For a linked list:

  1. Writer copies the node to be updated.
  2. Writer modifies the copy.
  3. Writer atomically updates the next pointer of the preceding node to point to the new copy (new readers now see the updated node).
  4. Writer waits until all existing readers finish (a grace period), then frees the old node.

This is safe because a reader either sees the old pointer (and finishes before the free) or sees the new pointer (and never touches the old node).

Core RCU API

API Side Purpose
rcu_read_lock() Reader Marks entry into RCU critical section (disables preemption)
rcu_read_unlock() Reader Marks exit; if no more readers on this CPU, ends grace period
rcu_dereference(p) Reader Safely load an RCU-protected pointer (includes read barrier)
rcu_assign_pointer(p, v) Writer Atomically publish a new pointer (includes write barrier)
synchronize_rcu() Writer Block until all pre-existing readers finish (grace period)
call_rcu(&head, func) Writer Asynchronous alternative: call func after grace period

Replacing rwlock with RCU

/* Old rwlock reader */
read_lock(&list_lock);
list_for_each_entry(p, &list, node) { ... }
read_unlock(&list_lock);

/* New RCU reader */
rcu_read_lock();
list_for_each_entry_rcu(p, &list, node) { ... }
rcu_read_unlock();
/* Old rwlock writer */
write_lock(&list_lock);
list_del(&entry->node);
write_unlock(&list_lock);
free(entry);

/* New RCU writer */
spin_lock(&writer_lock);          /* RCU does NOT coordinate writers */
list_del_rcu(&entry->node);
spin_unlock(&writer_lock);
synchronize_rcu();                /* wait for readers */
free(entry);

RCU limitations

  1. Writers still need mutual exclusion among themselves. RCU only protects readers from writers; most RCU-based code uses a spinlock to serialize concurrent writes.
  2. Updates must be a single-pointer swap. Complex multi-field updates are hard to express as a single atomic pointer change. This constrains the data structures and algorithms that can use RCU effectively.

Where RCU is used in Linux

RCU is pervasive in the Linux kernel: the directory-entry cache (dcache), routing tables, network protocol handlers, and the process credential structure all rely on it. Paul McKenney (Meta/IBM) developed and championed RCU in Linux.


Key Takeaways

Practice

  1. What does a completion variable's complete() call do?
  2. In a seqlock, what does an odd sequence counter value mean to a reader?
  3. Why must get_cpu() disable kernel preemption, and what happens if you call put_cpu() and then use the CPU id it returned?
  4. Which barrier call prevents only the CPU from reordering memory reads across a specific point in the code?
  5. In the RCU copy-modify-publish pattern for a linked list, why is it safe for an ongoing reader to continue using the old node after the writer has published the new one?
  6. A kernel developer wants to use RCU to protect a linked list of network routes. Two background threads simultaneously discover that the same stale route must be deleted. What is the correct approach?
  7. In one or two sentences, explain the difference between barrier() and mb() in the Linux kernel.
  8. Describe the three steps a seqlock reader takes to safely read shared data, and explain what it must do if it detects a concurrent write.
  9. A colleague proposes replacing a global rwlock on a frequently-read, rarely-written configuration table with RCU. They realize the update changes three fields atomically. What is the fundamental challenge, and how should they address it?
  10. Explain why seqlock is a poor choice for protecting a data structure that contains pointers, even though it avoids reader starvation.