Kernel Synchronization II: Sleeping Locks, Seqlocks, and Memory Barriers
Why This Matters
Spinlocks are fast but wasteful: a spinning CPU burns cycles and cannot sleep. The Linux kernel needs a broader palette of synchronization primitives — ones that let threads sleep while waiting, allow concurrent readers, sequence data versioning, and control how the CPU and compiler reorder memory accesses. This module covers that palette. Choosing the wrong primitive is one of the most common sources of kernel bugs and performance problems, so understanding the trade-offs is essential.
Reader-Writer Spinlocks
The Problem They Solve
A plain spinlock gives one holder exclusive access even when multiple threads only want to read shared data. If reads dominate (e.g., searching a linked list) this serializes work unnecessarily.
A reader-writer spinlock (rwlock) allows:
- Multiple concurrent readers — as long as no writer holds the lock.
- One exclusive writer — no readers or other writers allowed.
API
#include <linux/spinlock.h>
DEFINE_RWLOCK(mr_rwlock);
/* Reader side */
read_lock(&mr_rwlock);
/* ... read-only critical section ... */
read_unlock(&mr_rwlock);
/* Writer side */
write_lock(&mr_rwlock);
/* ... read/write critical section ... */
write_unlock(&mr_rwlock);
There are also _irq, _irqsave, and _bh variants mirroring the plain spinlock family.
Fairness Warning
Linux's rwlock favors readers over writers. If the read lock is held and new readers keep arriving, a waiting writer can be starved indefinitely. This is fine when writers are rare; it becomes a bug when they aren't.
Semaphores — Sleeping Locks
Key Insight
When a task cannot acquire a semaphore, rather than spinning, it is placed on a wait queue and put to sleep. The CPU is free to run other work. When the semaphore is released, the kernel wakes one waiter.
This makes semaphores appropriate for locks held for a long time and non-interrupt contexts (interrupt handlers cannot sleep). The overhead of sleeping, maintaining the wait queue, and waking up makes semaphores a poor choice for short critical sections — use a spinlock there.
Counting vs. Binary
A semaphore has an internal counter:
| Counter value | Meaning |
|---|---|
| > 0 | Available; acquisition decrements it |
| 0 | Unavailable; threads sleep |
Setting the initial count to 1 gives a binary semaphore (mutex-like). The kernel uses binary semaphores most often.
API (include/linux/semaphore.h)
| Function | Effect |
|---|---|
sema_init(&sem, count) |
Initialize counter |
down(&sem) |
Acquire (sleep if unavailable); uninterruptible |
down_interruptible(&sem) |
Acquire; returns -EINTR if interrupted by signal |
down_trylock(&sem) |
Non-blocking; returns non-zero if unavailable |
up(&sem) |
Release |
Reader-Writer Semaphores
A rw_semaphore provides reader/writer semantics with sleeping:
#include <linux/rwsem.h>
init_rwsem(&my_sem);
down_read(&my_sem); /* acquire read lock */
up_read(&my_sem);
down_write(&my_sem); /* acquire write lock */
up_write(&my_sem);
downgrade_write() atomically converts a held write lock into a read lock, useful when the write phase is over but reads still need protection.
Mutex
A mutex (struct mutex) is a binary semaphore with stricter semantics enforced by the kernel:
- Only one task holds the mutex at a time.
- The same task that locked it must unlock it.
- No recursive locking — acquiring the same mutex twice causes deadlock.
- A task cannot exit while holding a mutex.
- A mutex cannot be acquired in interrupt context (sleeping is not allowed there).
- The mutex must be managed only through the official API.
These rules let the kernel add debugging checks (CONFIG_DEBUG_MUTEXES) that catch misuse.
API
#include <linux/mutex.h>
DEFINE_MUTEX(my_mutex); /* static */
mutex_init(&my_mutex); /* dynamic */
mutex_lock(&my_mutex);
/* critical section */
mutex_unlock(&my_mutex);
mutex_trylock(&my_mutex); /* non-blocking; returns 1 on success */
mutex_is_locked(&my_mutex);
Semaphore vs. Mutex — Which to Choose?
Default to mutex. Move to semaphore only if you genuinely need a count > 1.
Mutexes are simpler, have richer debugging support, and the kernel can optimize their fast path.
Spinlock vs. Mutex — Quick Reference
| Requirement | Primitive |
|---|---|
| Must run in interrupt context | Spinlock |
| Hold time is very short | Spinlock |
| Task may sleep while waiting | Mutex / Semaphore |
| Hold time is long | Mutex / Semaphore |
| Need recursive acquire | Neither — redesign |
| Multiple simultaneous holders | Semaphore (count > 1) |
Completion Variables
A completion variable (struct completion) signals that a particular event has happened. It is the idiomatic kernel replacement for the pattern "thread A waits until thread B finishes initialization."
#include <linux/completion.h>
DECLARE_COMPLETION(my_comp); /* static */
init_completion(&my_comp); /* dynamic */
/* Thread A — waits */
wait_for_completion(&my_comp);
/* Thread B — signals when done */
complete(&my_comp);
complete_all(&my_comp); /* wake all waiters */
Completion variables avoid the race condition that would exist if you tried to implement the same pattern with a semaphore or a flag plus a condition variable.
Sequential Lock (Seqlock)
Concept
A seqlock maintains a sequence counter alongside the data. The protocol:
- Writer: increment counter (makes it odd) → write data → increment counter again (makes it even).
- Reader: read counter → read data → read counter again. If both counter readings are equal and even, no write was in progress. Otherwise retry.
Counter even → no write in progress
Counter odd → write in progress
Counter changed between two reads → write occurred; retry
When to Use
| Condition | Seqlock good fit? |
|---|---|
| Many readers, few writers | Yes |
| Writers must not be starved | Yes — writers are never blocked by readers |
| Read data must be pointer-safe | No — readers may see torn pointers; avoid seqlocks for pointer-heavy data |
Seqlocks are used for the kernel's jiffies and wall-clock time, where reads vastly outnumber writes and write latency must be bounded.
Preemption Disabling
Spinlocks implicitly disable preemption on the local CPU. Sometimes you need preemption disabled without a lock — the canonical case is per-CPU data:
int cpu;
cpu = get_cpu(); /* disables preemption; returns current CPU id */
/* access per-cpu data safely */
put_cpu(); /* re-enables preemption */
After get_cpu(), the current task will not be migrated to another CPU, so cpu remains valid until put_cpu().
Memory Ordering and Barriers
The Problem
Both the compiler (at compile time) and the CPU (at run time) may reorder memory reads and writes to improve performance. On a multi-core system, this can break assumptions about the order in which other CPUs observe stores.
Example:
/* Thread 1 */ /* Thread 2 */
a = 1; while (b != 2) ;
mb(); rmb();
b = 2; assert(a == 1); /* could fail without barriers */
Without mb() on the write side, the CPU might store b = 2 before a = 1. Without rmb() on the read side, the CPU might load a before observing the new value of b.
Barrier Primitives
| Function | What It Prevents |
|---|---|
barrier() |
Compiler reordering only (no CPU instruction) |
rmb() |
CPU reordering of reads across the barrier |
wmb() |
CPU reordering of writes across the barrier |
mb() |
CPU reordering of both reads and writes |
read_barrier_depends() |
Data-dependent loads; a no-op on most architectures (much cheaper than rmb()) |
barrier() is cheap — it just tells the compiler to flush its register state. The CPU-level barriers (rmb, wmb, mb) emit actual fence instructions and are more expensive.
Key Takeaways
- rwlock lets parallel readers coexist; Linux's implementation favors readers, risking writer starvation.
- Semaphores sleep instead of spin — great for long hold times, forbidden in interrupt context.
- Mutex is the stricter, more debuggable binary semaphore; prefer it over a raw semaphore.
- Completion variables cleanly signal "event done" between tasks without polling.
- Seqlock gives writers priority and lets readers retry on conflict — ideal for frequently-read, rarely-written data (e.g., jiffies).
- Preemption disabling (
get_cpu/put_cpu) protects per-CPU data without a lock. - Memory barriers (
rmb,wmb,mb,barrier) prevent compiler and CPU reordering from breaking concurrent code. - When in doubt about which primitive to choose, start conservative (mutex/spinlock) and relax only with a clear performance justification.