xv6 Comprehensive Review

Why This Matters

xv6 is the teaching OS you have been living inside all semester. Every mechanism you touched — fork, exec, page faults, context switches, file writes — passes through code you can read in its entirety. This review ties those mechanisms into a single coherent picture so you can reason about how the pieces interact, not just recall isolated facts.


1. OS Architecture Recap

xv6 uses a monolithic kernel: all OS services (memory management, file system, device drivers) share one kernel address space. This contrasts with a microkernel, where each service lives in its own protected process and communicates via IPC.

Monolithic kernels are faster (no IPC overhead) but have a larger trusted computing base — a bug anywhere in the kernel can compromise the whole system.

Isolation mechanisms


2. Processes and the Process Lifecycle

struct proc

Every process is described by struct proc in proc.c. Key fields:

Field Purpose
pgdir Pointer to the process's page table
kstack Per-process kernel stack (used during syscalls/traps)
state UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE
ofile[] Open file descriptors
cwd Current working directory inode
tf Saved trapframe (registers at last trap)
context Saved context (registers for swtch)
chan Sleep channel (what the process is waiting on)
killed Set to 1 when a signal should terminate the process

Why does kill 2 2 not immediately kill the xv6 shell? kill() only sets p->killed = 1. The shell only checks that flag when it returns from a trap (system call or interrupt). If the shell is blocked in read(), it won't check killed until the next trap fires.

Process creation: fork()

  1. allocproc() — allocate a struct proc, set up the kernel stack with a fake return address pointing at forkret.
  2. Copy the parent's page table and all mapped pages into the child.
  3. Copy the parent's trapframe so the child returns from fork() at the same instruction — but with return value 0 (child) vs. child PID (parent).
  4. Copy open file descriptors and cwd.
  5. Set state to RUNNABLE; the scheduler will pick it up.

First process: userinit() and initcode

The very first user process cannot be created by fork() — there is no parent yet. Instead:

  1. main() calls userinit().
  2. userinit() calls allocproc(), then builds a minimal page table with setupkvm() + inituvm().
  3. The page table is loaded with initcode — a tiny hand-assembled binary (compiled from initcode.S) that is embedded directly in the kernel binary.
  4. initcode.S executes exec("/init", argv) as its very first action, replacing itself with the real init program.
# initcode.S (simplified)
start:
  mov  $init, %rdi      # argv[0] = "/init"
  mov  $argv, %rsi
  mov  $SYS_exec, %rax
  syscall

Zombie processes and resource reclaim


3. Scheduling and Context Switching

Round-robin scheduler

xv6 runs scheduler() in an infinite loop on each CPU core (called from mpmain() at boot). The loop scans ptable for the next RUNNABLE process and switches to it:

scheduler() loop:
  for each proc p in ptable:
    if p->state == RUNNABLE:
      switchuvm(p)        // install p's page table + TSS
      swtch(&cpu->scheduler, p->context)  // jump to p

switchuvm(p) installs the process's page table into CR3 and updates the TSS so that hardware knows where the kernel stack is on the next trap.

swtch() — saving and restoring context

swtch is written in assembly (swtch.S). It saves the callee-saved registers (%rbx, %rbp, %r12%r15, and %rip implicitly via push/ret) to the old context struct and restores them from the new context struct.

These registers are saved because the C calling convention only guarantees that callee-saved registers survive a function call — the scheduler must preserve whatever the kernel was doing before the switch.

swtch is called in three situations:

  1. Voluntary yield — a process calls sleep() or yield(), which calls sched()swtch back to the scheduler.
  2. Timer interrupt — the timer handler calls yield()sched()swtch.
  3. Exitexit() calls sched()swtch to switch away permanently.

CFS fairness (Linux contrast)

Linux's Completely Fair Scheduler tracks vruntime (virtual runtime weighted by priority) for each task. It picks the task with the smallest vruntime. CPU-bound tasks accumulate vruntime fast; I/O-bound tasks (mostly sleeping) accumulate it slowly and are prioritized when they wake up — achieving fairness without starvation.


4. Traps, Interrupts, and Device Drivers

A trap is any CPU-forced transfer of control into the kernel. xv6 handles three kinds:

Path 1 — System calls

User syscall instruction
  → trapasm.S: syscall_entry  (saves trapframe)
  → proc.c / syscall.c: syscall()
  → individual handler: sys_read(), sys_write(), …
  → iret back to user

The syscall entry point address is written into the MSR_LSTAR model-specific register at boot.

Path 2 — Hardware interrupts (e.g., timer, keyboard)

Hardware asserts interrupt line
  → APIC routes to CPU
  → vector.S → trapasm.S: alltraps  (saves trapframe, calls trap())
  → trap.c: trap(struct trapframe *tf)
    • timer: ticks++; wakeup(&ticks); yield()
    • keyboard: kbdintr()
  → iret back to interrupted code

Path 3 — Exceptions (e.g., page fault)

Handled similarly to hardware interrupts via alltrapstrap(). The trap number (T_PGFLT = 14) tells the handler what happened.

In hw3 (Copy-on-Write), you used page faults productively:


5. Kernel Synchronization

Why locks are needed

With multiple CPUs and preemptible kernel code, two threads of execution can interleave reads and writes to shared data structures (e.g., ptable, the file table). Locks enforce mutual exclusion.

xv6 lock types

Lock Use case Behavior while waiting
spinlock Short critical sections, interrupt handlers Spins (busy-wait)
sleeplock Long operations (disk I/O) Calls sleep(), yields CPU

When to use a lock — checklist

Ask: "Is this data structure accessed by more than one CPU or interrupt handler concurrently?"

Deadlock

Deadlock occurs when two threads each hold a lock the other needs. xv6 avoids it by enforcing a global lock ordering (e.g., always acquire ptable.lock before p->lock).

Advanced lock types (Linux)


6. File System

Disk layout

| boot | superblock | log | inodes | free bitmap | data blocks |

Inodes (in-memory struct inode)

An inode is the kernel's handle to a file. iget(dev, inum) looks up or allocates a cached inode. Inodes are reference-counted; iput() drops a reference and, when the count reaches zero, frees the inode.

Directories and path resolution

A directory is just a file whose data blocks hold dirent records:

struct dirent {
  ushort inum;      // inode number (0 = free slot)
  char   name[14];
};

Path resolution (namei) walks the directory tree, calling dirlookup at each component.

Virtual File System (VFS)

VFS is an abstraction layer that makes multiple concrete filesystem types (ext4, FAT, tmpfs, …) look identical to the kernel. Operations like read, write, open dispatch through function pointers in the inode operations table. xv6's design is a simplified version of this concept.

Buffer cache (bio.c)

The buffer cache keeps recently-used disk blocks in memory to avoid slow disk I/O.

Function Role
bget(dev, blockno) Find or allocate a cache slot (LRU replacement)
bread(dev, blockno) bget + read from disk if not cached
bwrite(buf) Write block to disk

Eviction uses LRU (least recently used): the least-recently-accessed buffer is recycled when the cache is full.

Journaling and crash recovery

xv6 uses a write-ahead log (journal) to provide atomic multi-block updates:

  1. begin_op() — start a transaction.
  2. log_write(buf) — mark a buffer as part of the current transaction (dirty, logged).
  3. end_op() — when all operations in the transaction complete, call commit(): a. Write all dirty blocks to the log region on disk. b. Write the log header (commit record). c. Install — copy log blocks to their real disk locations. d. Clear the log header.

If the system crashes between steps (a) and (c), recovery replays the log on the next boot. If it crashes before the commit record is written, the partial transaction is simply ignored — the filesystem remains consistent.


Key Takeaways

  1. struct proc is the central data structure — everything about a process lives here.
  2. userinit + initcode bootstrap the first user process by embedding a tiny binary directly in the kernel.
  3. swtch performs context switching by saving/restoring callee-saved registers; it is called from sched() in yield, sleep, and exit paths.
  4. Three trap paths: system call (via MSR entry), hardware interrupt (via APIC + vector table), and exception (e.g., page fault) — all funnel through alltraps / trap().
  5. Spinlocks are for short, interrupt-safe critical sections; sleeplocks are for long I/O waits.
  6. The filesystem layers build on each other: buffer cache → inode → directory → VFS.
  7. Journaling (write-ahead log) guarantees crash consistency by committing all-or-nothing multi-block updates.

Practice

  1. Which process state describes a process that has called exit() but whose parent has not yet called wait()?
  2. How is the first user process (PID 1) created in xv6? Walk through the key function calls.
  3. What does switchuvm(p) do when the scheduler is about to run process p?
  4. Which registers does swtch() save and restore?
  5. List the three ways execution can trap into the xv6 kernel and name one concrete trigger for each.
  6. In xv6's Copy-on-Write fork (hw3), what happens when a child process writes to a page that was marked read-only at fork time?
  7. Explain why you should use a sleeplock rather than a spinlock when protecting a disk I/O operation.
  8. In xv6's journaling workflow, at what point is it safe to consider a multi-block file-system update durable and crash-consistent?
  9. A process calls fork() and the parent immediately calls exit() before the child has finished. What prevents the child from becoming an orphan zombie that can never be reaped?