Atomic Instructions and xv6 Locks

Why This Matters

Modern operating systems run multiple threads simultaneously. Whenever two threads touch the same data without coordination, the result can be wrong in subtle, hard-to-reproduce ways. This module builds the complete picture: why ordinary C statements aren't safe to share, what the CPU gives us to fix that, and how xv6 turns those CPU primitives into the spinlock you will use throughout the rest of the course.


The Race Condition Problem

Consider the simplest possible shared variable:

int i = 7;

void foo(void) {
    i++;
}

If two threads call foo() concurrently, intuition says i should end up as 9. But it might end up as 8. Here's why.

A single C statement compiles to multiple instructions

i++ looks like one operation, but the compiler translates it into (at least) three machine instructions:

LOAD   reg, [i]   // read i from memory into a register
ADD    reg, 1     // increment the register
STORE  [i], reg   // write the result back to memory

The CPU can switch contextβ€”or, on a multicore machine, a second CPU can runβ€”between any of those three steps.

The race in pictures

Thread A Thread B Value of i in memory
LOAD reg_A ← 7 7
LOAD reg_B ← 7 7
ADD reg_A = 8 7
ADD reg_B = 8 7
STORE i ← 8 8
STORE i ← 8 8 ← wrong!

Both threads read the value 7 before either has written 8 back. Both write 8. One increment is silently lost. This is called a race condition or data race.


The Solution: Atomic Instructions

An atomic operation is one that the CPU guarantees to complete without any possibility of interleaving with other threads. No other thread can observe a half-finished atomic operation.

Operation Atomic?
i++ in C No – compiles to 3 instructions
atomic_inc(&i) (Linux) Yes – single indivisible instruction

The hardware enforces atomicity physically: while one core executes an atomic instruction, the memory bus is locked so no other core can perform a conflicting access.


xchg(): The Key Primitive in xv6

xv6 builds its entire lock system on one atomic instruction: xchg (exchange). The function in x86.h is:

static inline uint xchg(volatile uint *addr, uint newval)
{
    uint result;
    asm volatile("lock; xchgl %0, %1" :
        "+m" (*addr), "=a" (result) :
        "1" (newval) :
        "cc");
    return result;
}

What it does

xchg(addr, newval) atomically:

  1. Reads the current value at *addr into result.
  2. Writes newval into *addr.
  3. Returns the old value (result).

Example:

Before: *addr = 0, newval = 1
After:  *addr = 1, result (returned) = 0

The lock; prefix on the x86 instruction is what tells the CPU to assert the bus lock, making the swap indivisible. No other core can read or write *addr between the read and the write.


xv6 Spinlock: acquire and release

A spinlock uses xchg to implement mutual exclusion. The lock is just a struct with a single integer field locked (0 = free, 1 = held).

acquire

void acquire(struct spinlock *lk)
{
    pushcli();              // disable interrupts on this CPU
    if (holding(lk)) panic("acquire");

    // Spin until we can set lk->locked from 0 to 1.
    while (xchg(&lk->locked, 1) != 0)
        ;

    __sync_synchronize();   // memory barrier
}

Step by step:

  1. pushcli() – Disables interrupts on the current CPU. This prevents a situation where the CPU holds the lock, then gets interrupted, and the interrupt handler also tries to acquire the same lock (deadlock).
  2. holding() check – Panics if this CPU already holds the lock (avoids self-deadlock in debug builds).
  3. Spin loop – Calls xchg(&lk->locked, 1) repeatedly. If the lock was already 1 (held by someone else), xchg returns 1 and we keep spinning. The moment the lock is released (set to 0), our xchg will atomically set it to 1 and return 0, breaking out of the loop.
  4. __sync_synchronize() – A compiler/hardware memory barrier. It ensures that no memory read or write that occurs after this point in the code is reordered by the CPU or compiler to happen before we acquired the lock.

release

void release(struct spinlock *lk)
{
    __sync_synchronize();   // memory barrier

    asm volatile("movl $0, %0" : "+m" (lk->locked) : );
    popcli();               // re-enable interrupts
}

Step by step:

  1. __sync_synchronize() – Ensures all writes protected by the lock are visible to other CPUs before the lock is released.
  2. movl $0 – Writes 0 to lk->locked, releasing the lock. This is done with inline assembly (rather than lk->locked = 0) to prevent the compiler from reordering it past the barrier.
  3. popcli() – Restores the interrupt-enable state that was in effect before the matching pushcli().

Why Interrupts Must Be Disabled

Disabling interrupts during lock hold prevents a specific deadlock scenario:

  1. Thread A acquires lock L on CPU 0.
  2. A timer interrupt fires on CPU 0.
  3. The interrupt handler tries to acquire lock L.
  4. Deadlock: the handler can never get L because A holds it, and A can never run to release L because the handler never returns.

On a uniprocessor, this is the only form of concurrency. On a multiprocessor, disabling interrupts is still necessary to handle this local-CPU scenario, but the xchg spin is still needed to exclude other CPUs.


Key Takeaways

Practice

  1. Why can two threads executing i++ concurrently produce an incorrect result even on a single-core machine?
  2. What does xchg(&lk->locked, 1) return when the lock is already held (i.e., lk->locked == 1)?
  3. In xv6's acquire, why is pushcli() called before the spin loop rather than after it?
  4. What is the role of __sync_synchronize() in xv6's acquire and release?
  5. Suppose you remove pushcli()/popcli() from xv6's spinlock implementation but keep xchg. On a multi-core machine, which of the following statements is TRUE?
  6. In your own words, explain what makes the x86 lock; xchgl instruction atomic.
  7. Walk through the exact sequence of events when two CPUs (A and B) both call acquire on an unlocked spinlock simultaneously. Which one wins, and what does the other do?
  8. Suppose a developer rewrites xv6's release as lk->locked = 0; in plain C instead of using inline assembly. What could go wrong, and why?