Kernel Synchronization: Spinlocks, Sleeplocks, Deadlocks, and Multi-core Boot

Why This Matters

A single-core OS can get away with disabling interrupts to protect shared data. The moment a second CPU enters the picture, that trick fails — two processors can race on the same kernel data structure simultaneously. Kernel synchronization is what keeps that from corrupting state. Understanding it also unlocks everything that comes after: process scheduling, file systems, and device drivers all depend on locking done right.

1. Atomic Instructions — The Hardware Foundation

Locks are only possible because CPUs provide instructions that read and write memory atomically: no other CPU can observe a half-finished operation.

Two workhorses in x86:

Instruction	What it does
`LOCK XADD dst, src`	Atomically: `old = dst; dst = dst + src; return old`
`LOCK XCHG dst, src`	Atomically swaps `dst` and `src`

The LOCK prefix causes the CPU to assert a bus lock (or cache-line lock) for the duration of the instruction, ensuring the read-modify-write is indivisible.

xv6 wraps LOCK XCHG in a helper:

static inline uint xchg(volatile uint *addr, uint newval) {
    uint result;
    asm volatile("lock; xchgl %0, %1"
        : "+m" (*addr), "=a" (result)
        : "1" (newval)
        : "cc");
    return result;   // returns the *old* value of *addr
}

The +m constraint tells the compiler that *addr is both read and written ("read-modify-write operand"). The return value is the value *addr held before the swap.

2. xv6 Spinlock

A spinlock is a busy-wait lock: a thread trying to acquire it loops ("spins") until it succeeds. This is cheap when critical sections are very short (a few instructions), because the overhead of putting a thread to sleep and waking it up would cost more than just spinning.

Structure

struct spinlock {
    uint locked;   // 0 = free, 1 = held
    char *name;
    struct cpu *cpu;
    // ...
};

Acquire

void acquire(struct spinlock *lk) {
    pushcli();                         // disable interrupts
    if (holding(lk)) panic("acquire"); // detect self-deadlock

    while (xchg(&lk->locked, 1) != 0) // spin until we swap in 1
        ;

    __sync_synchronize();              // memory barrier
}

Key points:

Interrupts are disabled first. If an interrupt fired while the lock was held and the interrupt handler tried to acquire the same lock, the CPU would deadlock with itself.
xchg returns the old value. If the old value was 0, the lock was free and we just claimed it. If it was 1, someone else held it — loop again.
__sync_synchronize() is a full memory barrier: it prevents the compiler and CPU from reordering memory operations across this point (see Section 5).

Release

void release(struct spinlock *lk) {
    __sync_synchronize();              // barrier before clearing the lock

    asm volatile("movl $0, %0" : "+m" (lk->locked) : );

    popcli();                          // re-enable interrupts
}

The barrier ensures any writes inside the critical section are visible to other CPUs before the lock word is cleared. A plain lk->locked = 0 would be optimizable by the compiler in ways that break this guarantee.

Usage Pattern

struct {
    struct spinlock lock;
    struct proc proc[NPROC];
} ptable;

initlock(&ptable.lock, "ptable");

acquire(&ptable.lock);
for (p = ptable.proc; p < &ptable.proc[NPROC]; p++)
    // ... examine/modify process table
release(&ptable.lock);

When to Use a Lock — Checklist

Ask these questions about any piece of data:

Is the data global (shared across CPUs or contexts)?
Can another thread access it concurrently?
What happens if this function runs simultaneously on another CPU?
Is the data shared between process context and interrupt context?
If a process is preempted while accessing this data, can the newly scheduled process touch the same data?

If any answer is "yes", you need a lock.

3. xv6 Sleeplock

Spinlocks are wrong for long waits — spinning wastes a whole CPU. If a critical section involves disk I/O or another slow operation, the thread should sleep instead.

Structure

struct sleeplock {
    uint locked;         // Is the lock held?
    struct spinlock lk;  // protects this sleeplock's fields
    char *name;
    int pid;
};

A sleeplock embeds a spinlock to protect its own locked field during the brief moment of inspection and update.

Acquire

void acquiresleep(struct sleeplock *lk) {
    acquire(&lk->lk);           // protect the check below
    while (lk->locked)
        sleep(lk, &lk->lk);    // atomically releases lk->lk and sleeps
    lk->locked = 1;
    lk->pid = proc->pid;
    release(&lk->lk);
}

sleep(chan, lk) atomically releases the spinlock and puts the process to sleep on channel chan. This avoids the lost-wakeup race: the wakeup cannot arrive between checking lk->locked and calling sleep.

Release

void releasesleep(struct sleeplock *lk) {
    acquire(&lk->lk);
    lk->locked = 0;
    lk->pid = 0;
    wakeup(lk);          // wake all sleepers on this channel
    release(&lk->lk);
}

Usage (inode locking in `fs.c`)

for (i = 0; i < NINODE; i++)
    initsleeplock(&icache.inode[i].lock, "inode");

acquiresleep(&ip->lock);
// Read blocks from disk — this may block for milliseconds
releasesleep(&ip->lock);

Spinlock vs. Sleeplock

Property	Spinlock	Sleeplock
Waiting thread	Spins (wastes CPU)	Sleeps (yields CPU)
Critical section	Very short (no blocking)	Can be long (disk I/O OK)
Can hold across `sleep()`?	No	Yes
Interrupts	Disabled while held	Not disabled

4. Deadlocks

A deadlock occurs when a set of threads are each waiting for a resource held by another thread in the set — nobody can proceed.

Self-Deadlock

struct spinlock mylock;

void low_level_fn() {
    acquire(&mylock);     // (2) tries to acquire — DEADLOCK: it's already held
    // ...
    release(&mylock);
}

void high_level_fn() {
    acquire(&mylock);     // (1) acquires successfully
    low_level_fn();       // (2) calls into a function that also tries to acquire
    release(&mylock);
}

xv6 spinlocks are not reentrant. Calling acquire on a lock you already hold panics (the holding() check in acquire). Without that check, you'd spin forever.

Circular Deadlock (Two Locks, Two Threads)

Thread 1:           Thread 2:
acquire(lock1)      acquire(lock2)
acquire(lock2)      acquire(lock1)   ← each waits for the other

Thread 1 holds lock1 and waits for lock2. Thread 2 holds lock2 and waits for lock1. Neither can progress.

Prevention Strategies

Strategy	How it works
Fixed global lock ordering	All code acquires locks in the same global order (e.g., always `lock1` before `lock2`) — circular waits are impossible
Trylock	Attempt to acquire; if it fails, release all held locks and retry from scratch
Lock hierarchy	Assign a level number; only acquire locks at strictly increasing levels

Note: disabling interrupts globally before acquiring any lock does not prevent multi-CPU deadlocks; it only prevents interrupt-handler deadlocks on a single CPU.

5. Memory Ordering and Barriers

Modern compilers and CPUs reorder memory operations to improve performance. This is safe within a single thread (the CPU preserves the illusion of in-order execution for that thread), but it can break assumptions between CPUs.

Example: without barriers, a compiler might move the lk->locked = 0 in release() before the stores inside the critical section, making other CPUs observe a "released" lock while the critical section's writes haven't landed yet.

__sync_synchronize();

This GCC built-in emits a full memory barrier (on x86: mfence, or a lock orq $0, (%rsp)). It tells both the compiler and the CPU:

All loads and stores before the barrier complete before any loads/stores after it.

xv6 places barriers in both acquire (after spinning) and release (before clearing the lock) to sandwich the critical section.

6. Multi-core Boot in xv6

Booting additional CPUs requires careful synchronization.

Roles

BSP (Bootstrap Processor): The one CPU that the BIOS starts. Runs main() first.
AP (Application Processor): Every other CPU. Must be started explicitly by the BSP.

Boot Sequence

BSP:
  main()
    mpinit()         ← reads MP configuration table, discovers APs
    startothers()    ← wakes one AP at a time
      lapicstartap() ← sends INIT/SIPI inter-processor interrupt to AP
      while(c->started == 0) ;  ← busy-waits until AP is up
    mpmain()         ← BSP finishes its own per-CPU init, enters scheduler()

Each AP:
  entryother.S      ← trampoline at physical address 0x7000
  mpenter()         ← per-CPU C initialization
  mpmain()
    xchg(&cpu->started, 1)   ← signal BSP that this CPU is ready
    scheduler()              ← start scheduling

The xchg on cpu->started is both the signal to the BSP and an implicit memory barrier, ensuring the AP's initialization is visible before the flag is set.

Scalability Concern

With 64 CPUs all issuing system calls simultaneously:

Every system call traps into the kernel and touches global data structures (process table, file table, etc.).
Each of those accesses acquires a spinlock.
All CPUs contend on the same few locks → lock contention throttles throughput.

A single global lock means at most one CPU does useful kernel work at a time. Real kernels address this with fine-grained locking (per-CPU structures, per-inode locks) and lock-free data structures.

7. Read/Write Locks

Many data structures are read far more often than written. A plain spinlock serializes readers unnecessarily.

A read/write lock allows either:

One writer (exclusive), with no readers or other writers; or
Many readers simultaneously, with no writer.

xv6-style Implementation

struct rwlock { int n; };
// n == 0:   unlocked
// n == -1:  locked by one writer
// n > 0:    locked by n readers

Writer lock:

w_lock(l):
    while (1):
        if comp_xchg(&l->n, 0, -1)  // only succeeds if n == 0
            return

Reader lock:

r_lock(l):
    while (1):
        x = l->n
        if x < 0: continue           // writer holds it
        if comp_xchg(&l->n, x, x+1) // atomically increment reader count
            return

comp_xchg (compare-and-swap, CMPXCHG) atomically checks that the value equals an expected value, then replaces it — this is the same primitive as xchg but conditional.

Linux's rwlock is reader-preferred: readers can always enter as long as no writer holds the lock. A downside is that a continuous stream of readers can starve writers.

Seqlock (Brief)

A seqlock avoids writer starvation by letting writers proceed unconditionally. Readers detect a concurrent write by checking a sequence counter before and after reading — if the counter changed, the read is retried. Writers never block.

Key Takeaways

Atomic instructions (LOCK XCHG, CMPXCHG) are the hardware building blocks for all kernel locks.
Spinlocks are for short critical sections; they disable interrupts and busy-wait. They must not be held across operations that block.
Sleeplocks are for long critical sections (e.g., disk I/O); the waiting thread sleeps instead of spinning.
Deadlocks arise from cycles in the lock-wait graph. The most reliable prevention is a fixed global lock ordering.
Memory barriers (__sync_synchronize) are essential around lock acquire/release to prevent compilers and CPUs from reordering stores out of critical sections.
Multi-core boot uses startothers() with a flag polled via xchg to safely bring up each AP in sequence.
Lock contention is the primary scalability bottleneck in a kernel with many CPUs; fine-grained locking and per-CPU data structures mitigate it.
RW locks improve throughput for read-heavy workloads; seqlocks eliminate writer starvation at the cost of reader retries.

Kernel Synchronization: Spinlocks, Sleeplocks, Deadlocks, and Multi-core Boot

Why This Matters

1. Atomic Instructions — The Hardware Foundation

2. xv6 Spinlock

Structure

Acquire

Release

Usage Pattern

When to Use a Lock — Checklist

3. xv6 Sleeplock

Structure

Acquire

Release

Usage (inode locking in `fs.c`)

Spinlock vs. Sleeplock

4. Deadlocks

Self-Deadlock

Circular Deadlock (Two Locks, Two Threads)

Prevention Strategies

5. Memory Ordering and Barriers

6. Multi-core Boot in xv6

Roles

Boot Sequence

Scalability Concern

7. Read/Write Locks

xv6-style Implementation

Seqlock (Brief)

Key Takeaways

Practice

Model answer

Model answer

Model answer

Results

Kernel Synchronization: Spinlocks, Sleeplocks, Deadlocks, and Multi-core Boot

Why This Matters

1. Atomic Instructions — The Hardware Foundation

2. xv6 Spinlock

Structure

Acquire

Release

Usage Pattern

When to Use a Lock — Checklist

3. xv6 Sleeplock

Structure

Acquire

Release

Usage (inode locking in fs.c)

Spinlock vs. Sleeplock

4. Deadlocks

Self-Deadlock

Circular Deadlock (Two Locks, Two Threads)

Prevention Strategies

5. Memory Ordering and Barriers

6. Multi-core Boot in xv6

Roles

Boot Sequence

Scalability Concern

7. Read/Write Locks

xv6-style Implementation

Seqlock (Brief)

Key Takeaways

Practice

Model answer

Model answer

Model answer

Results

Usage (inode locking in `fs.c`)