Buffer Cache and Write-Ahead Logging

Why This Matters

A single user-visible operation like echo hello > test.txt translates into several separate disk writes inside the kernel: allocate an inode, update it, add a directory entry, write data blocks. Each write touches a different disk location. If power fails between any two of them, the filesystem ends up in a partially-updated state — an orphaned inode, a directory entry with no inode, or corrupted metadata. This module explains how xv6 prevents that using a Write-Ahead Log (WAL).

The Problem: Partial Writes

When create() runs inside sys_open(), it executes roughly this sequence:

ialloc()    — allocate a free inode
iupdate()   — write the inode to disk        ← power failure here?
dirlink()   — add a directory entry

If the machine crashes after iupdate() but before dirlink(), disk holds a live inode that no directory entry points to. The filesystem is now inconsistent. The fix: make the whole sequence atomic — either all writes commit, or none do.

Write-Ahead Logging (WAL)

WAL groups all the disk writes belonging to one logical operation into a transaction. The core guarantee is:

Write a description of what you are about to do before you do it. On recovery, replay any complete description; discard any incomplete one.

In xv6, every filesystem system call wraps its disk work in begin_op() / end_op():

begin_op();
ilock(f->ip);
writei(f->ip, addr + i, f->off, n1);
iunlock(f->ip);
end_op();

Key Data Structures

buf (buf.h)

An in-memory copy of one 512-byte disk block. Fields include the block number, a reference count, flags (including B_DIRTY), and the data array.

bcache (bio.c)

A doubly-linked list of buf structs that acts as an LRU disk-block cache. Before issuing a disk read the kernel always checks bcache first; a hit avoids the I/O entirely.

logheader (log.c)

Tracks which blocks are part of the current uncommitted transaction:

struct logheader {
  int n;               // number of logged blocks
  int block[LOGSIZE];  // disk block numbers of those blocks
};

logheader exists both in memory (fast access) and as a dedicated header block on disk (survives crashes).

How a Write Works: `log_write()`

Instead of writing a dirty buffer directly to its home disk block, xv6 calls log_write(), which:

Sets the B_DIRTY flag on the buffer — this pins it in the cache (it cannot be evicted).
Searches the log header; if the same block is already logged, reuses that slot (absorption).
Increments n in the log header (in memory only at this point).
Leaves the data sitting in the buffer cache — no disk I/O yet.

This defers all expensive disk writes until commit time.

Transaction Lifecycle

`begin_op()`

acquire(&log.lock);
while (1) {
    if (another thread is committing) → sleep
    if (not enough log slots)        → sleep
    log.outstanding++;
    release(&log.lock);
    break;
}

log.outstanding counts how many system calls are currently inside a transaction.

`end_op()`

log.outstanding--;
if (log.outstanding == 0)
    commit();

The last caller to finish triggers the commit.

`commit()` — step by step

Step	What happens	Why
1. `write_log()`	Copy each dirty buffer from the cache to the log area on disk	Persist the data safely before touching home blocks
2. `write_head()`	Write the log header (with `n` and block numbers) to disk	The commit point — once this is on disk the transaction is permanent
3. Install writes	Copy each block from the log area to its home disk block	Apply the changes to their actual locations
4. Clear log	Set `n = 0`, write header to disk	Mark the log as empty; recovery will ignore it

The commit point is step 2. If power fails before write_head(), recovery sees n = 0 (or a stale header with wrong data) and discards the log — as if the operation never happened. If power fails after write_head(), recovery replays all the logged blocks — the operation fully completes.

Group Commit

Because commit() only fires when log.outstanding reaches zero, multiple concurrent system calls can accumulate dirty buffers under a single commit. This is called group commit:

Transaction X:  begin_op → dirty blocks 2, 3 → end_op  (outstanding: 1→0 → commit!)
Transaction Y:  begin_op → dirty blocks 2, 3 → end_op  (outstanding: 2→1 → waits)

When both X and Y overlap, Y's end_op decrements outstanding to 1, not 0, so no commit fires yet. X's end_op gets it to 0 and commits all pending dirty blocks in one pass. This amortizes the disk-write overhead of a commit across many system calls.

Recovery

On boot, the xv6 log layer reads the on-disk log header. If n > 0, a commit was in progress or completed before the power failed. Recovery replays the log: it re-copies all n logged blocks from the log area to their home locations, then clears the log. This is safe because the writes are idempotent — replaying a block that was already installed just overwrites it with the same data.

Key Takeaways

A single filesystem syscall spans multiple disk writes; WAL makes them atomic.
log_write() pins a buffer dirty in the cache; no data reaches its home block until commit().
The commit point is write_head() — once the log header is on disk, the transaction is permanent.
logheader.n is the only thing that matters for recovery: n = 0 → nothing to replay; n > 0 → replay all logged blocks.
Group commit batches multiple concurrent transactions, reducing disk I/O overhead.
The buffer cache (bcache) serves both as a read cache and as the staging area for uncommitted writes.

Buffer Cache and Write-Ahead Logging

Why This Matters

The Problem: Partial Writes

Write-Ahead Logging (WAL)

Key Data Structures

buf (buf.h)

bcache (bio.c)

logheader (log.c)

How a Write Works: `log_write()`

Transaction Lifecycle

`begin_op()`

`end_op()`

`commit()` — step by step

Group Commit

Recovery

Key Takeaways

Practice

Model answer

Model answer

Model answer

Model answer

Results

Buffer Cache and Write-Ahead Logging

Why This Matters

The Problem: Partial Writes

Write-Ahead Logging (WAL)

Key Data Structures

buf (buf.h)

bcache (bio.c)

logheader (log.c)

How a Write Works: log_write()

Transaction Lifecycle

begin_op()

end_op()

commit() — step by step

Group Commit

Recovery

Key Takeaways

Practice

Model answer

Model answer

Model answer

Model answer

Results

How a Write Works: `log_write()`

`begin_op()`

`end_op()`

`commit()` — step by step