Atomic Instructions and xv6 Locks
Why This Matters
Modern operating systems run multiple threads simultaneously. Whenever two threads touch the same data without coordination, the result can be wrong in subtle, hard-to-reproduce ways. This module builds the complete picture: why ordinary C statements aren't safe to share, what the CPU gives us to fix that, and how xv6 turns those CPU primitives into the spinlock you will use throughout the rest of the course.
The Race Condition Problem
Consider the simplest possible shared variable:
int i = 7;
void foo(void) {
i++;
}
If two threads call foo() concurrently, intuition says i should end up as 9. But it might end up as 8. Here's why.
A single C statement compiles to multiple instructions
i++ looks like one operation, but the compiler translates it into (at least) three machine instructions:
LOAD reg, [i] // read i from memory into a register
ADD reg, 1 // increment the register
STORE [i], reg // write the result back to memory
The CPU can switch contextβor, on a multicore machine, a second CPU can runβbetween any of those three steps.
The race in pictures
| Thread A | Thread B | Value of i in memory |
|---|---|---|
| LOAD reg_A β 7 | 7 | |
| LOAD reg_B β 7 | 7 | |
| ADD reg_A = 8 | 7 | |
| ADD reg_B = 8 | 7 | |
| STORE i β 8 | 8 | |
| STORE i β 8 | 8 β wrong! |
Both threads read the value 7 before either has written 8 back. Both write 8. One increment is silently lost. This is called a race condition or data race.
The Solution: Atomic Instructions
An atomic operation is one that the CPU guarantees to complete without any possibility of interleaving with other threads. No other thread can observe a half-finished atomic operation.
| Operation | Atomic? |
|---|---|
i++ in C |
No β compiles to 3 instructions |
atomic_inc(&i) (Linux) |
Yes β single indivisible instruction |
The hardware enforces atomicity physically: while one core executes an atomic instruction, the memory bus is locked so no other core can perform a conflicting access.
xchg(): The Key Primitive in xv6
xv6 builds its entire lock system on one atomic instruction: xchg (exchange). The function in x86.h is:
static inline uint xchg(volatile uint *addr, uint newval)
{
uint result;
asm volatile("lock; xchgl %0, %1" :
"+m" (*addr), "=a" (result) :
"1" (newval) :
"cc");
return result;
}
What it does
xchg(addr, newval) atomically:
- Reads the current value at
*addrintoresult. - Writes
newvalinto*addr. - Returns the old value (
result).
Example:
Before: *addr = 0, newval = 1
After: *addr = 1, result (returned) = 0
The lock; prefix on the x86 instruction is what tells the CPU to assert the bus lock, making the swap indivisible. No other core can read or write *addr between the read and the write.
xv6 Spinlock: acquire and release
A spinlock uses xchg to implement mutual exclusion. The lock is just a struct with a single integer field locked (0 = free, 1 = held).
acquire
void acquire(struct spinlock *lk)
{
pushcli(); // disable interrupts on this CPU
if (holding(lk)) panic("acquire");
// Spin until we can set lk->locked from 0 to 1.
while (xchg(&lk->locked, 1) != 0)
;
__sync_synchronize(); // memory barrier
}
Step by step:
pushcli()β Disables interrupts on the current CPU. This prevents a situation where the CPU holds the lock, then gets interrupted, and the interrupt handler also tries to acquire the same lock (deadlock).holding()check β Panics if this CPU already holds the lock (avoids self-deadlock in debug builds).- Spin loop β Calls
xchg(&lk->locked, 1)repeatedly. If the lock was already1(held by someone else),xchgreturns1and we keep spinning. The moment the lock is released (set to0), ourxchgwill atomically set it to1and return0, breaking out of the loop. __sync_synchronize()β A compiler/hardware memory barrier. It ensures that no memory read or write that occurs after this point in the code is reordered by the CPU or compiler to happen before we acquired the lock.
release
void release(struct spinlock *lk)
{
__sync_synchronize(); // memory barrier
asm volatile("movl $0, %0" : "+m" (lk->locked) : );
popcli(); // re-enable interrupts
}
Step by step:
__sync_synchronize()β Ensures all writes protected by the lock are visible to other CPUs before the lock is released.movl $0β Writes0tolk->locked, releasing the lock. This is done with inline assembly (rather thanlk->locked = 0) to prevent the compiler from reordering it past the barrier.popcli()β Restores the interrupt-enable state that was in effect before the matchingpushcli().
Why Interrupts Must Be Disabled
Disabling interrupts during lock hold prevents a specific deadlock scenario:
- Thread A acquires lock L on CPU 0.
- A timer interrupt fires on CPU 0.
- The interrupt handler tries to acquire lock L.
- Deadlock: the handler can never get L because A holds it, and A can never run to release L because the handler never returns.
On a uniprocessor, this is the only form of concurrency. On a multiprocessor, disabling interrupts is still necessary to handle this local-CPU scenario, but the xchg spin is still needed to exclude other CPUs.
Key Takeaways
i++is not atomic: it compiles to load/add/store, and concurrent threads can interleave these steps, causing lost updates.- Atomic instructions (like x86
xchgwith thelockprefix) are indivisible: the CPU ensures no other thread can observe a partial execution. - xv6's
xchg(addr, newval)atomically swaps*addrwithnewvaland returns the old value β the single primitive underpinning xv6 spinlocks. acquirespins callingxchguntil it wins the lock (reads 0), then sets a memory barrier; it also disables interrupts to prevent same-CPU deadlock.releasesets a memory barrier, then writes 0 to the lock word, then re-enables interrupts.- Memory barriers (
__sync_synchronize) prevent the compiler and CPU from reordering loads/stores across the lock boundary, preserving the correctness guarantee the lock is supposed to provide.