Kernel Synchronization II: Sleeping Locks, Seqlocks, and Memory Barriers

Why This Matters

Spinlocks are fast but wasteful: a spinning CPU burns cycles and cannot sleep. The Linux kernel needs a broader palette of synchronization primitives — ones that let threads sleep while waiting, allow concurrent readers, sequence data versioning, and control how the CPU and compiler reorder memory accesses. This module covers that palette. Choosing the wrong primitive is one of the most common sources of kernel bugs and performance problems, so understanding the trade-offs is essential.

Reader-Writer Spinlocks

The Problem They Solve

A plain spinlock gives one holder exclusive access even when multiple threads only want to read shared data. If reads dominate (e.g., searching a linked list) this serializes work unnecessarily.

A reader-writer spinlock (rwlock) allows:

Multiple concurrent readers — as long as no writer holds the lock.
One exclusive writer — no readers or other writers allowed.

API

#include <linux/spinlock.h>

DEFINE_RWLOCK(mr_rwlock);

/* Reader side */
read_lock(&mr_rwlock);
/* ... read-only critical section ... */
read_unlock(&mr_rwlock);

/* Writer side */
write_lock(&mr_rwlock);
/* ... read/write critical section ... */
write_unlock(&mr_rwlock);

There are also _irq, _irqsave, and _bh variants mirroring the plain spinlock family.

Fairness Warning

Linux's rwlock favors readers over writers. If the read lock is held and new readers keep arriving, a waiting writer can be starved indefinitely. This is fine when writers are rare; it becomes a bug when they aren't.

Semaphores — Sleeping Locks

Key Insight

When a task cannot acquire a semaphore, rather than spinning, it is placed on a wait queue and put to sleep. The CPU is free to run other work. When the semaphore is released, the kernel wakes one waiter.

This makes semaphores appropriate for locks held for a long time and non-interrupt contexts (interrupt handlers cannot sleep). The overhead of sleeping, maintaining the wait queue, and waking up makes semaphores a poor choice for short critical sections — use a spinlock there.

Counting vs. Binary

A semaphore has an internal counter:

Counter value	Meaning
> 0	Available; acquisition decrements it
0	Unavailable; threads sleep

Setting the initial count to 1 gives a binary semaphore (mutex-like). The kernel uses binary semaphores most often.

API (include/linux/semaphore.h)

Function	Effect
`sema_init(&sem, count)`	Initialize counter
`down(&sem)`	Acquire (sleep if unavailable); uninterruptible
`down_interruptible(&sem)`	Acquire; returns `-EINTR` if interrupted by signal
`down_trylock(&sem)`	Non-blocking; returns non-zero if unavailable
`up(&sem)`	Release

Reader-Writer Semaphores

A rw_semaphore provides reader/writer semantics with sleeping:

#include <linux/rwsem.h>

init_rwsem(&my_sem);
down_read(&my_sem);   /* acquire read lock */
up_read(&my_sem);
down_write(&my_sem);  /* acquire write lock */
up_write(&my_sem);

downgrade_write() atomically converts a held write lock into a read lock, useful when the write phase is over but reads still need protection.

Mutex

A mutex (struct mutex) is a binary semaphore with stricter semantics enforced by the kernel:

Only one task holds the mutex at a time.
The same task that locked it must unlock it.
No recursive locking — acquiring the same mutex twice causes deadlock.
A task cannot exit while holding a mutex.
A mutex cannot be acquired in interrupt context (sleeping is not allowed there).
The mutex must be managed only through the official API.

These rules let the kernel add debugging checks (CONFIG_DEBUG_MUTEXES) that catch misuse.

API

#include <linux/mutex.h>

DEFINE_MUTEX(my_mutex);    /* static */
mutex_init(&my_mutex);     /* dynamic */

mutex_lock(&my_mutex);
/* critical section */
mutex_unlock(&my_mutex);

mutex_trylock(&my_mutex);  /* non-blocking; returns 1 on success */
mutex_is_locked(&my_mutex);

Semaphore vs. Mutex — Which to Choose?

Default to mutex. Move to semaphore only if you genuinely need a count > 1.

Mutexes are simpler, have richer debugging support, and the kernel can optimize their fast path.

Spinlock vs. Mutex — Quick Reference

Requirement	Primitive
Must run in interrupt context	Spinlock
Hold time is very short	Spinlock
Task may sleep while waiting	Mutex / Semaphore
Hold time is long	Mutex / Semaphore
Need recursive acquire	Neither — redesign
Multiple simultaneous holders	Semaphore (count > 1)

Completion Variables

A completion variable (struct completion) signals that a particular event has happened. It is the idiomatic kernel replacement for the pattern "thread A waits until thread B finishes initialization."

#include <linux/completion.h>

DECLARE_COMPLETION(my_comp);    /* static */
init_completion(&my_comp);      /* dynamic */

/* Thread A — waits */
wait_for_completion(&my_comp);

/* Thread B — signals when done */
complete(&my_comp);
complete_all(&my_comp);   /* wake all waiters */

Completion variables avoid the race condition that would exist if you tried to implement the same pattern with a semaphore or a flag plus a condition variable.

Sequential Lock (Seqlock)

Concept

A seqlock maintains a sequence counter alongside the data. The protocol:

Writer: increment counter (makes it odd) → write data → increment counter again (makes it even).
Reader: read counter → read data → read counter again. If both counter readings are equal and even, no write was in progress. Otherwise retry.

Counter even   → no write in progress
Counter odd    → write in progress
Counter changed between two reads → write occurred; retry

When to Use

Condition	Seqlock good fit?
Many readers, few writers	Yes
Writers must not be starved	Yes — writers are never blocked by readers
Read data must be pointer-safe	No — readers may see torn pointers; avoid seqlocks for pointer-heavy data

Seqlocks are used for the kernel's jiffies and wall-clock time, where reads vastly outnumber writes and write latency must be bounded.

Preemption Disabling

Spinlocks implicitly disable preemption on the local CPU. Sometimes you need preemption disabled without a lock — the canonical case is per-CPU data:

int cpu;
cpu = get_cpu();            /* disables preemption; returns current CPU id */
/* access per-cpu data safely */
put_cpu();                  /* re-enables preemption */

After get_cpu(), the current task will not be migrated to another CPU, so cpu remains valid until put_cpu().

Memory Ordering and Barriers

The Problem

Both the compiler (at compile time) and the CPU (at run time) may reorder memory reads and writes to improve performance. On a multi-core system, this can break assumptions about the order in which other CPUs observe stores.

Example:

/* Thread 1 */        /* Thread 2 */
a = 1;                while (b != 2) ;
mb();                 rmb();
b = 2;                assert(a == 1);  /* could fail without barriers */

Without mb() on the write side, the CPU might store b = 2 before a = 1. Without rmb() on the read side, the CPU might load a before observing the new value of b.

Barrier Primitives

Function	What It Prevents
`barrier()`	Compiler reordering only (no CPU instruction)
`rmb()`	CPU reordering of reads across the barrier
`wmb()`	CPU reordering of writes across the barrier
`mb()`	CPU reordering of both reads and writes
`read_barrier_depends()`	Data-dependent loads; a no-op on most architectures (much cheaper than `rmb()`)

barrier() is cheap — it just tells the compiler to flush its register state. The CPU-level barriers (rmb, wmb, mb) emit actual fence instructions and are more expensive.

Key Takeaways

rwlock lets parallel readers coexist; Linux's implementation favors readers, risking writer starvation.
Semaphores sleep instead of spin — great for long hold times, forbidden in interrupt context.
Mutex is the stricter, more debuggable binary semaphore; prefer it over a raw semaphore.
Completion variables cleanly signal "event done" between tasks without polling.
Seqlock gives writers priority and lets readers retry on conflict — ideal for frequently-read, rarely-written data (e.g., jiffies).
Preemption disabling (get_cpu/put_cpu) protects per-CPU data without a lock.
Memory barriers (rmb, wmb, mb, barrier) prevent compiler and CPU reordering from breaking concurrent code.
When in doubt about which primitive to choose, start conservative (mutex/spinlock) and relax only with a clear performance justification.

Kernel Synchronization II: Sleeping Locks, Seqlocks, and Memory Barriers

Why This Matters

Reader-Writer Spinlocks

The Problem They Solve

API

Fairness Warning

Semaphores — Sleeping Locks

Key Insight

Counting vs. Binary

API (include/linux/semaphore.h)

Reader-Writer Semaphores

Mutex

API

Semaphore vs. Mutex — Which to Choose?

Spinlock vs. Mutex — Quick Reference

Completion Variables

Sequential Lock (Seqlock)

Concept

When to Use

Preemption Disabling

Memory Ordering and Barriers

The Problem

Barrier Primitives

Key Takeaways

Practice

Model answer

Model answer

Model answer

Results