Kernel Synchronization II: Sleeping Locks, Seqlocks, and Memory Barriers

Why This Matters

Spinlocks are fast but wasteful: a spinning CPU burns cycles and cannot sleep. The Linux kernel needs a broader palette of synchronization primitives — ones that let threads sleep while waiting, allow concurrent readers, sequence data versioning, and control how the CPU and compiler reorder memory accesses. This module covers that palette. Choosing the wrong primitive is one of the most common sources of kernel bugs and performance problems, so understanding the trade-offs is essential.


Reader-Writer Spinlocks

The Problem They Solve

A plain spinlock gives one holder exclusive access even when multiple threads only want to read shared data. If reads dominate (e.g., searching a linked list) this serializes work unnecessarily.

A reader-writer spinlock (rwlock) allows:

API

#include <linux/spinlock.h>

DEFINE_RWLOCK(mr_rwlock);

/* Reader side */
read_lock(&mr_rwlock);
/* ... read-only critical section ... */
read_unlock(&mr_rwlock);

/* Writer side */
write_lock(&mr_rwlock);
/* ... read/write critical section ... */
write_unlock(&mr_rwlock);

There are also _irq, _irqsave, and _bh variants mirroring the plain spinlock family.

Fairness Warning

Linux's rwlock favors readers over writers. If the read lock is held and new readers keep arriving, a waiting writer can be starved indefinitely. This is fine when writers are rare; it becomes a bug when they aren't.


Semaphores — Sleeping Locks

Key Insight

When a task cannot acquire a semaphore, rather than spinning, it is placed on a wait queue and put to sleep. The CPU is free to run other work. When the semaphore is released, the kernel wakes one waiter.

This makes semaphores appropriate for locks held for a long time and non-interrupt contexts (interrupt handlers cannot sleep). The overhead of sleeping, maintaining the wait queue, and waking up makes semaphores a poor choice for short critical sections — use a spinlock there.

Counting vs. Binary

A semaphore has an internal counter:

Counter value Meaning
> 0 Available; acquisition decrements it
0 Unavailable; threads sleep

Setting the initial count to 1 gives a binary semaphore (mutex-like). The kernel uses binary semaphores most often.

API (include/linux/semaphore.h)

Function Effect
sema_init(&sem, count) Initialize counter
down(&sem) Acquire (sleep if unavailable); uninterruptible
down_interruptible(&sem) Acquire; returns -EINTR if interrupted by signal
down_trylock(&sem) Non-blocking; returns non-zero if unavailable
up(&sem) Release

Reader-Writer Semaphores

A rw_semaphore provides reader/writer semantics with sleeping:

#include <linux/rwsem.h>

init_rwsem(&my_sem);
down_read(&my_sem);   /* acquire read lock */
up_read(&my_sem);
down_write(&my_sem);  /* acquire write lock */
up_write(&my_sem);

downgrade_write() atomically converts a held write lock into a read lock, useful when the write phase is over but reads still need protection.


Mutex

A mutex (struct mutex) is a binary semaphore with stricter semantics enforced by the kernel:

  1. Only one task holds the mutex at a time.
  2. The same task that locked it must unlock it.
  3. No recursive locking — acquiring the same mutex twice causes deadlock.
  4. A task cannot exit while holding a mutex.
  5. A mutex cannot be acquired in interrupt context (sleeping is not allowed there).
  6. The mutex must be managed only through the official API.

These rules let the kernel add debugging checks (CONFIG_DEBUG_MUTEXES) that catch misuse.

API

#include <linux/mutex.h>

DEFINE_MUTEX(my_mutex);    /* static */
mutex_init(&my_mutex);     /* dynamic */

mutex_lock(&my_mutex);
/* critical section */
mutex_unlock(&my_mutex);

mutex_trylock(&my_mutex);  /* non-blocking; returns 1 on success */
mutex_is_locked(&my_mutex);

Semaphore vs. Mutex — Which to Choose?

Default to mutex. Move to semaphore only if you genuinely need a count > 1.

Mutexes are simpler, have richer debugging support, and the kernel can optimize their fast path.


Spinlock vs. Mutex — Quick Reference

Requirement Primitive
Must run in interrupt context Spinlock
Hold time is very short Spinlock
Task may sleep while waiting Mutex / Semaphore
Hold time is long Mutex / Semaphore
Need recursive acquire Neither — redesign
Multiple simultaneous holders Semaphore (count > 1)

Completion Variables

A completion variable (struct completion) signals that a particular event has happened. It is the idiomatic kernel replacement for the pattern "thread A waits until thread B finishes initialization."

#include <linux/completion.h>

DECLARE_COMPLETION(my_comp);    /* static */
init_completion(&my_comp);      /* dynamic */

/* Thread A — waits */
wait_for_completion(&my_comp);

/* Thread B — signals when done */
complete(&my_comp);
complete_all(&my_comp);   /* wake all waiters */

Completion variables avoid the race condition that would exist if you tried to implement the same pattern with a semaphore or a flag plus a condition variable.


Sequential Lock (Seqlock)

Concept

A seqlock maintains a sequence counter alongside the data. The protocol:

Counter even   → no write in progress
Counter odd    → write in progress
Counter changed between two reads → write occurred; retry

When to Use

Condition Seqlock good fit?
Many readers, few writers Yes
Writers must not be starved Yes — writers are never blocked by readers
Read data must be pointer-safe No — readers may see torn pointers; avoid seqlocks for pointer-heavy data

Seqlocks are used for the kernel's jiffies and wall-clock time, where reads vastly outnumber writes and write latency must be bounded.


Preemption Disabling

Spinlocks implicitly disable preemption on the local CPU. Sometimes you need preemption disabled without a lock — the canonical case is per-CPU data:

int cpu;
cpu = get_cpu();            /* disables preemption; returns current CPU id */
/* access per-cpu data safely */
put_cpu();                  /* re-enables preemption */

After get_cpu(), the current task will not be migrated to another CPU, so cpu remains valid until put_cpu().


Memory Ordering and Barriers

The Problem

Both the compiler (at compile time) and the CPU (at run time) may reorder memory reads and writes to improve performance. On a multi-core system, this can break assumptions about the order in which other CPUs observe stores.

Example:

/* Thread 1 */        /* Thread 2 */
a = 1;                while (b != 2) ;
mb();                 rmb();
b = 2;                assert(a == 1);  /* could fail without barriers */

Without mb() on the write side, the CPU might store b = 2 before a = 1. Without rmb() on the read side, the CPU might load a before observing the new value of b.

Barrier Primitives

Function What It Prevents
barrier() Compiler reordering only (no CPU instruction)
rmb() CPU reordering of reads across the barrier
wmb() CPU reordering of writes across the barrier
mb() CPU reordering of both reads and writes
read_barrier_depends() Data-dependent loads; a no-op on most architectures (much cheaper than rmb())

barrier() is cheap — it just tells the compiler to flush its register state. The CPU-level barriers (rmb, wmb, mb) emit actual fence instructions and are more expensive.


Key Takeaways

Practice

  1. Which statement best describes the behavior of Linux's reader-writer spinlock (rwlock) when both readers and a writer are competing for the lock?
  2. A semaphore is a better choice than a spinlock when:
  3. Which of the following is a restriction that applies to a mutex but NOT to a binary semaphore?
  4. A seqlock's sequence counter is observed to be 7 when a reader starts. What should the reader do?
  5. What is the primary purpose of get_cpu() / put_cpu() in the kernel?
  6. Consider this kernel code fragment: c a = 1; mb(); b = 2; And on another CPU: c while (b != 2) ; rmb(); if (a == 1) { /* safe? */ } What does the combination of mb() and rmb() guarantee?
  7. Explain why a semaphore cannot be used inside a hardware interrupt handler, and name an alternative synchronization primitive that can be.
  8. A kernel developer wants to protect a frequently-read, rarely-updated 64-bit timestamp. Two requirements: (1) reader latency must be minimal, and (2) writers must never be blocked by readers. Which synchronization primitive fits best and why? Describe how both the reader and writer sides would use it.
  9. What is a completion variable used for, and how does it differ from using a mutex or semaphore to achieve the same effect?
  10. A kernel module calls barrier() (compiler barrier) between two stores. On a multiprocessor system, which problem does this NOT solve?