Directories, VFS, and Write-Ahead Logging

Why This Matters

You know how an inode stores a file's metadata and the addresses of its data blocks. But how does the system find which inode belongs to /home/lkp/Desktop/.file? That is the job of the directory layer. On top of that, Linux must support dozens of different filesystem formats (ext4, FAT32, NTFS, tmpfs…) behind a single open() call; that is the job of the Virtual File System (VFS). Finally, a filesystem write rarely touches just one block: creating a file updates an inode, a bitmap, and a directory entry. If power fails halfway through, the on-disk structures are left inconsistent. Write-Ahead Logging (WAL) prevents that.


1. Recap: Superblock and Inode

Before the directory layer, remember the two key metadata structures:

Structure                Location                Purpose
superblock               Block 1 of fs.img       Stores counts and start positions of log, inode, bitmap, and data regions
dinode (on-disk inode)   struct dinode in fs.h   Holds type, nlink, size, and 12 direct + 1 indirect block addresses
inode (in-memory inode)  struct inode in file.h  Cached version of dinode, plus kernel bookkeeping fields

The type field of a dinode can be T_FILE, T_DIR, or T_DEV. Directories use T_DIR.
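
The "12 direct + 1 indirect" layout also fixes the largest file one dinode can describe. The arithmetic can be sketched as follows, assuming 4-byte block addresses and treating the block size as a parameter; max_file_blocks and max_file_bytes are hypothetical helpers for illustration, not xv6 functions.

```c
#include <assert.h>
#include <stdint.h>

#define NDIRECT 12  /* direct block addresses per dinode, as stated above */

/* Hypothetical helper: number of data blocks one dinode can address,
 * assuming every block address is a 4-byte unsigned integer. */
uint32_t max_file_blocks(uint32_t block_size)
{
    assert(block_size % sizeof(uint32_t) == 0);
    /* The single indirect block is filled entirely with block addresses. */
    uint32_t nindirect = block_size / (uint32_t)sizeof(uint32_t);
    return NDIRECT + nindirect;
}

/* Largest file size in bytes under the same assumptions. */
uint32_t max_file_bytes(uint32_t block_size)
{
    return max_file_blocks(block_size) * block_size;
}
```

With a 512-byte block this gives 12 + 128 = 140 blocks, or 71,680 bytes; with a 1024-byte block it gives 12 + 256 = 268 blocks.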


2. Directories as Files of struct dirent

In Unix (including xv6), a directory is just a special file whose content is an array of fixed-size directory entries.

// fs.h
struct dirent {
    ushort inum;        // inode number (0 = free slot)
    char   name[DIRSIZ]; // up to 14 characters in xv6
};

Each dirent maps one filename to one inode number. The path /home/lkp/Desktop/.file is resolved by walking these mappings one component at a time:

  1. Look up home in the root directory's dirent list → inode N₁
  2. Look up lkp in inode N₁'s dirent list → inode N₂
  3. Continue until .file → final inode

The user-space ls utility reads the directory file with a plain read() and iterates through dirent structs:

while (read(fd, &de, sizeof(de)) == sizeof(de)) {
    if (de.inum == 0)
        continue;   // deleted/empty slot, skip it
    // (ls builds the entry's full path in buf and calls stat(buf, &st) here)
    printf("%s %d %d %d\n", fmtname(buf), st.type, st.ino, st.size);
}

3. Directory API

xv6 provides two levels of directory helpers:

Low-level: dirlookup

struct inode* dirlookup(struct inode *dp, char *name, uint *poff);

Scans the directory inode dp for an entry whose name matches, returns the inode, and optionally writes the byte offset of the entry into *poff. This is used to find a single component in an already-open directory.
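
The scan itself can be sketched over an in-memory array of entries. This is a simplified stand-in, not the xv6 code: the real dirlookup reads entries from disk and returns a struct inode*, whereas the hypothetical toy_dirlookup below returns just the inode number and takes the entry array as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIRSIZ 14

struct dirent {
    unsigned short inum;   /* 0 marks a free slot */
    char name[DIRSIZ];     /* not NUL-terminated when exactly DIRSIZ long */
};

/* Hypothetical stand-in for dirlookup: scan n entries, return the inode
 * number stored under name (0 if absent), and store the byte offset of the
 * matching entry in *poff when poff is non-NULL. */
unsigned short toy_dirlookup(const struct dirent *ents, size_t n,
                             const char *name, size_t *poff)
{
    assert(ents != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ents[i].inum == 0)
            continue;                          /* skip free slots */
        /* Compare at most DIRSIZ bytes, since names may lack a NUL. */
        if (strncmp(ents[i].name, name, DIRSIZ) == 0) {
            if (poff != NULL)
                *poff = i * sizeof(struct dirent);
            return ents[i].inum;
        }
    }
    return 0;                                  /* not found */
}
```

The offset matters to callers that want to overwrite the entry afterwards, for example to clear it when a file is unlinked.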

High-level: namei / nameiparent

struct inode* namei(char *path);
struct inode* nameiparent(char *path, char *name);

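Both call the internal namex(), which tokenizes path and calls dirlookup once per component. namei returns the inode of the final component. nameiparent stops one level early: it returns the inode of the parent directory and copies the final component into name.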

nameiparent is essential for system calls like open(O_CREAT) and unlink, which need both the parent directory (to update) and the final component name.
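
The difference between the two entry points comes down to where the walk stops. A minimal sketch of that control flow, with the directory lookups replaced by a counter: next_elem and toy_namex are hypothetical names, and the name buffer is assumed to hold DIRSIZ + 1 bytes so it can always be NUL-terminated (a simplification chosen for this sketch).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIRSIZ 14

/* Copy the next path component into name (truncated to DIRSIZ characters and
 * NUL-terminated) and return the rest of the path, or NULL when no component
 * is left. */
const char *next_elem(const char *path, char *name)
{
    while (*path == '/')
        path++;
    if (*path == '\0')
        return NULL;
    const char *start = path;
    while (*path != '/' && *path != '\0')
        path++;
    size_t len = (size_t)(path - start);
    if (len > DIRSIZ)
        len = DIRSIZ;
    memcpy(name, start, len);
    name[len] = '\0';
    while (*path == '/')
        path++;
    return path;
}

/* Walk path and return how many directory lookups a real walk would perform.
 * With want_parent set, the final component is not looked up; it is left in
 * name for the caller, which is the nameiparent behaviour. */
int toy_namex(const char *path, int want_parent, char *name)
{
    assert(path != NULL && name != NULL);
    int lookups = 0;
    name[0] = '\0';
    while ((path = next_elem(path, name)) != NULL) {
        if (want_parent && *path == '\0')
            return lookups;   /* stop one level early */
        lookups++;            /* a real walk would call dirlookup here */
    }
    return lookups;
}
```

For "/a/b/c", the namei-style walk performs three lookups, while the nameiparent-style walk performs two and hands back "c" as the name still to be processed.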


4. The Unix Virtual File System (VFS)

The Problem

A modern Linux system might have an ext4 root filesystem, a FAT32 USB drive, and an NFS network share, all mounted at the same time.

Without VFS, every application would need to know which filesystem it is talking to and call different code paths.

The Solution: An Abstraction Layer

VFS inserts a common interface between user space and every concrete filesystem:

User space:  open()  read()  write()  lseek()  ...
                        │
                  ┌─────▼──────┐
                  │    VFS     │  ← "top" interface (POSIX syscalls)
                  └─────┬──────┘
        ┌───────────────┼────────────────┐
        ▼               ▼                ▼
      ext4            FAT32             NFS      ← "bottom" interface

The "top" interface is the POSIX system call API every application uses. The "bottom" interface is a set of function pointers (method table) that each filesystem driver implements.

Key benefits

Benefit      Example
Abstraction  cp src dst works whether src is on ext4 or FAT32
Coexistence  Both filesystems are mounted simultaneously
Cooperation  Copying a file across filesystems is transparent

Developing a new Linux filesystem means implementing the bottom VFS interface (a set of file_operations, inode_operations, and super_operations structs), not touching user-space code.
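
The method-table idea can be illustrated with plain C function pointers. Everything in this sketch is hypothetical (struct toy_fs_ops, toy_vfs_read, and the two toy "filesystems"); the real Linux structs carry many more methods and different signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "bottom" interface: one method table per filesystem type. */
struct toy_fs_ops {
    const char *fs_name;
    /* Copy up to n bytes of file content into buf; return bytes copied. */
    size_t (*read)(char *buf, size_t n);
};

static size_t copy_out(const char *src, char *buf, size_t n)
{
    size_t len = strlen(src);
    if (len > n)
        len = n;
    memcpy(buf, src, len);
    return len;
}

/* Two toy drivers, each supplying its own read implementation. */
static size_t toy_ext4_read(char *buf, size_t n)  { return copy_out("from ext4", buf, n); }
static size_t toy_fat32_read(char *buf, size_t n) { return copy_out("from fat32", buf, n); }

const struct toy_fs_ops toy_ext4_ops  = { "ext4",  toy_ext4_read };
const struct toy_fs_ops toy_fat32_ops = { "fat32", toy_fat32_read };

/* An open file remembers which method table its filesystem supplied. */
struct toy_file {
    const struct toy_fs_ops *ops;
};

/* "Top" interface: callers never name a filesystem; the call is dispatched
 * through the table attached to the file. */
size_t toy_vfs_read(const struct toy_file *f, char *buf, size_t n)
{
    assert(f != NULL && f->ops != NULL && f->ops->read != NULL);
    return f->ops->read(buf, n);
}
```

The caller's code is identical for both files; only the table pointer stored in the file differs, which is what lets a program like cp stay unaware of the filesystem underneath.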


5. The Filesystem Crash Problem

Creating a file like echo hello > test.txt involves multiple disk writes in sequence:

sys_open()
  create()
    ialloc()       // allocate an inode (mark a free on-disk inode as in use)
    iupdate()      // write inode metadata      ← power failure here?
    dirlink()      // add dirent to directory
    ... write data blocks ...

If the machine loses power after iupdate() but before dirlink(), the disk is inconsistent: an inode is marked allocated but no directory entry points to it, so the inode is leaked. Traditional tools like fsck repair this by scanning the whole disk, but that can take minutes or longer on large drives.
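
The orphaned-inode scenario can be reproduced on a toy disk. The sketch below is a deliberately tiny model, not xv6 code: struct crash_disk, toy_create, and toy_count_orphans are hypothetical, the disk has a single root directory, and the "crash" is simulated by returning early.

```c
#include <assert.h>
#include <string.h>

#define NINODES   8
#define NDIRENT   8
#define ROOT_INUM 1

/* Hypothetical toy disk: allocation marks for inodes plus one root directory. */
struct crash_disk {
    int inode_used[NINODES];              /* nonzero = allocated */
    unsigned short dirent_inum[NDIRENT];  /* 0 = free slot */
};

void toy_mkfs(struct crash_disk *d)
{
    memset(d, 0, sizeof *d);
    d->inode_used[ROOT_INUM] = 1;         /* the root directory's own inode */
}

/* Create a file in two steps. steps_done says how many of them reach the
 * disk before the simulated power failure (2 = no failure). */
int toy_create(struct crash_disk *d, int steps_done)
{
    int inum = 0;
    for (int i = ROOT_INUM + 1; i < NINODES; i++)
        if (!d->inode_used[i]) { inum = i; break; }
    assert(inum != 0);                    /* the toy disk is assumed not full */
    if (steps_done < 1)
        return inum;
    d->inode_used[inum] = 1;              /* step 1: ialloc + iupdate */
    if (steps_done < 2)
        return inum;                      /* power fails before dirlink */
    for (int i = 0; i < NDIRENT; i++)     /* step 2: dirlink */
        if (d->dirent_inum[i] == 0) {
            d->dirent_inum[i] = (unsigned short)inum;
            break;
        }
    return inum;
}

/* fsck-style full scan: count allocated inodes that no directory entry names. */
int toy_count_orphans(const struct crash_disk *d)
{
    int orphans = 0;
    for (int i = ROOT_INUM + 1; i < NINODES; i++) {
        if (!d->inode_used[i])
            continue;
        int referenced = 0;
        for (int j = 0; j < NDIRENT; j++)
            if (d->dirent_inum[j] == i)
                referenced = 1;
        if (!referenced)
            orphans++;
    }
    return orphans;
}
```

Note that toy_count_orphans has to visit every inode and every directory entry; that full scan is the cost the next section's log is designed to avoid.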


6. Write-Ahead Logging (WAL)

The Core Idea

WAL groups multiple disk writes into a transaction that executes atomically: either all writes happen, or none do.

In xv6, filewrite() in file.c wraps each chunk of a write in a transaction:

begin_op();
ilock(f->ip);
if ((r = writei(f->ip, addr + i, f->off, n1)) > 0)
    f->off += r;
iunlock(f->ip);
end_op();

Why the log blocks come first

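Modified blocks are first written to a dedicated log region, then a commit record is written, and only after that are the blocks copied to their home locations. If the machine crashes before the commit record is written, the log is simply ignored on recovery, so no partial write is visible. If it crashes after the commit record, the recovery code replays the log to completion. This guarantees atomicity.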
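
The ordering argument above can be checked on a toy model. In this sketch each "block" is a single int, the commit record is a count stored in the log header, and a crash is simulated by returning early. struct wal_disk, toy_commit, and toy_recover are hypothetical names chosen for illustration; the real xv6 log code works on buffer-cache blocks.

```c
#include <assert.h>

#define NBLOCKS 8   /* home blocks on the toy disk */
#define LOGSIZE 4   /* maximum blocks per transaction */

/* Hypothetical toy disk: home blocks plus a log region with a header. */
struct wal_disk {
    int home[NBLOCKS];
    int log_data[LOGSIZE];     /* logged copies of the new block contents */
    int log_blockno[LOGSIZE];  /* home block number for each logged copy */
    int log_n;                 /* commit record: > 0 means committed */
};

/* Stage at which the simulated power failure happens. */
enum crash_point { NO_CRASH, CRASH_BEFORE_COMMIT, CRASH_AFTER_COMMIT };

/* Copy every committed log block to its home location. */
static void install(struct wal_disk *d)
{
    for (int i = 0; i < d->log_n; i++)
        d->home[d->log_blockno[i]] = d->log_data[i];
}

/* Apply n block updates as one transaction, possibly "crashing" partway. */
void toy_commit(struct wal_disk *d, const int *blockno, const int *data,
                int n, enum crash_point crash)
{
    assert(n > 0 && n <= LOGSIZE);
    for (int i = 0; i < n; i++) {          /* 1. write new contents to the log */
        assert(blockno[i] >= 0 && blockno[i] < NBLOCKS);
        d->log_blockno[i] = blockno[i];
        d->log_data[i] = data[i];
    }
    if (crash == CRASH_BEFORE_COMMIT)
        return;
    d->log_n = n;                          /* 2. commit record reaches the disk */
    if (crash == CRASH_AFTER_COMMIT)
        return;
    install(d);                            /* 3. copy log blocks home */
    d->log_n = 0;                          /* 4. erase the commit record */
}

/* Run at boot: replay a committed log, ignore an uncommitted one. */
void toy_recover(struct wal_disk *d)
{
    install(d);                            /* does nothing when log_n == 0 */
    d->log_n = 0;
}
```

A crash before step 2 leaves log_n at 0, so recovery touches nothing and the home blocks keep their old contents. A crash after step 2 leaves log_n set, so recovery installs every logged block. In neither case does a reader see only some of the updates.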

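The Naïve Approach vs. xv6

A naïve WAL writes every single disk operation to the log immediately, which is very slow. xv6 instead keeps modified blocks in the buffer cache until end_op(), writes them to the log together, and records a block only once even if it is modified several times in the same transaction. This reduces the total number of log writes.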
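
One of the savings, recording a repeatedly modified block only once, can be sketched in a few lines. struct toy_log and toy_log_write are hypothetical stand-ins for the in-memory log header and its update routine; the sketch tracks block numbers only and leaves out locking and the buffer cache.

```c
#include <assert.h>

#define LOGSIZE 4

/* Hypothetical in-memory log header: the home block numbers that the
 * current transaction has modified so far. */
struct toy_log {
    int n;
    int blockno[LOGSIZE];
};

/* Record that blockno was modified. A block already in the list is
 * "absorbed": it keeps its existing slot instead of taking a new one. */
void toy_log_write(struct toy_log *lg, int blockno)
{
    for (int i = 0; i < lg->n; i++)
        if (lg->blockno[i] == blockno)
            return;                 /* absorbed, nothing new to log */
    assert(lg->n < LOGSIZE);        /* the transaction must fit in the log */
    lg->blockno[lg->n++] = blockno;
}
```

Writing blocks 33, 45, and 33 again within one transaction therefore occupies two log slots, not three, so the commit writes two log blocks.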


Key Takeaways

  1. A directory is a file containing an array of struct dirent entries, each mapping a filename to an inode number.
  2. dirlookup finds a single named entry in a directory inode; namei/nameiparent walk a full path by repeatedly calling dirlookup.
  3. VFS is an abstraction layer with a "top" POSIX interface toward user space and a "bottom" method-table interface toward concrete filesystem drivers; it lets multiple filesystem types coexist and interoperate transparently.
  4. WAL solves the crash-consistency problem by writing all changes to log blocks first and only making them permanent after a commit record is safely on disk, which guarantees atomicity without an expensive full-disk scan.
  5. xv6's begin_op()/end_op() demarcate a transaction; the disk writes logged inside that pair take effect all together or not at all.

Practice

  1. In xv6, what does a directory file actually contain on disk?
  2. What is the purpose of namei() in xv6?
  3. Which xv6 function would you call when implementing unlink("/a/b/c") to get both the directory that contains c and the name c itself?
  4. A user has an ext4 root filesystem and mounts a FAT32 USB drive. She runs cp /data/report.pdf /mnt/usb/report.pdf. Which component makes this transparent to the cp program?
  5. In the Write-Ahead Log protocol, when is it safe to say a transaction has been committed?
  6. Explain what happens to the filesystem state if the machine loses power during create() in xv6 without Write-Ahead Logging, specifically after iupdate() but before dirlink() completes.
  7. A student argues: "We can skip the log blocks and just write dirty blocks directly to disk at end_op(); that's simpler and equally safe." What is wrong with this argument?
  8. In the struct dirent used by xv6, what does an inum value of 0 signify?
  9. Which pair of VFS interfaces is correctly described?
  10. Walk through how xv6 resolves the absolute path /usr/bin/sh step by step, naming the key data structures and functions involved.