Directories, VFS, and Write-Ahead Logging
Why This Matters
You know how an inode stores a file's metadata and data. But how does the system find which inode belongs to /home/lkp/Desktop/.file? That is the job of the directory layer. On top of that, Linux must support dozens of different filesystem formats (ext4, FAT32, NTFS, tmpfs…) behind a single open() call; that is the job of the Virtual File System (VFS). Finally, a filesystem write rarely touches just one block: creating and writing a file updates an inode, a block bitmap, and a directory entry. If power fails halfway through, the on-disk structures are left inconsistent. Write-Ahead Logging (WAL) prevents that.
1. Recap: Superblock and Inode
Before the directory layer, remember the two key metadata structures:
| Structure | Location | Purpose |
|---|---|---|
| superblock | Block 1 of fs.img | Stores counts and start positions of log, inode, bitmap, and data regions |
| dinode (on-disk inode) | struct dinode in fs.h | Holds type, nlink, size, and 12 direct + 1 indirect block addresses |
| inode (in-memory inode) | struct inode in file.h | Cached version of dinode, plus kernel bookkeeping fields |
The type field of a dinode can be T_FILE, T_DIR, or T_DEV. Directories use T_DIR.
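A sketch of the on-disk inode, modeled on xv6's fs.h, makes the table's numbers concrete. It assumes BSIZE = 1024 (as in xv6-riscv; the older x86 xv6 uses 512) and uses fixed-width types in place of xv6's short and uint. The type constants follow xv6's stat.h; max_file_bytes() is a hypothetical helper added for illustration.

```c
#include <stdint.h>

// Assumption: 1024-byte blocks, as in xv6-riscv.
#define BSIZE     1024
#define NDIRECT   12
#define NINDIRECT (BSIZE / sizeof(uint32_t))  // block numbers per indirect block
#define MAXFILE   (NDIRECT + NINDIRECT)       // max data blocks per file

#define T_DIR  1   // directory
#define T_FILE 2   // regular file
#define T_DEV  3   // device

// On-disk inode layout, following xv6's struct dinode.
struct dinode {
  int16_t  type;                // T_DIR, T_FILE, T_DEV, or 0 if free
  int16_t  major;               // device numbers (T_DEV only)
  int16_t  minor;
  int16_t  nlink;               // directory entries that refer to this inode
  uint32_t size;                // file size in bytes
  uint32_t addrs[NDIRECT + 1];  // 12 direct block numbers + 1 indirect
};

// On-disk inodes per block.
#define IPB (BSIZE / sizeof(struct dinode))

// Hypothetical helper: largest file this layout can describe, in bytes.
static unsigned long max_file_bytes(void) {
  return (unsigned long)MAXFILE * BSIZE;
}
```

Under these assumptions each dinode is 64 bytes, so 16 fit in one block, and 12 direct plus 256 indirect addresses cap a file at 268 blocks (274,432 bytes).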
2. Directories as Files of struct dirent
In Unix (including xv6), a directory is just a special file whose content is an array of fixed-size directory entries.
// fs.h
struct dirent {
ushort inum; // inode number (0 = free slot)
char name[DIRSIZ]; // up to 14 characters in xv6
};
Each dirent maps one filename to one inode number. The path /home/lkp/Desktop/.file is resolved by walking these mappings one component at a time:
- Look up home in the root directory's dirent list → inode N₁
- Look up lkp in inode N₁'s dirent list → inode N₂
- Continue until .file → final inode
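The component-by-component walk can be simulated in user space. This is a toy sketch, not xv6 code: directories live in a hypothetical in-memory table dirs[] indexed by inode number, lookup() and walk() are illustrative names, and the inode numbers 1 to 5 are made up for the example path. Unlike xv6, the toy does not check that each intermediate inode is a directory.

```c
#include <string.h>

#define DIRSIZ  14
#define ROOTINO 1   // xv6 also uses inode 1 for the root directory

struct dirent {
  unsigned short inum;   // 0 = free slot
  char name[DIRSIZ];     // NUL-padded; not terminated if exactly DIRSIZ chars
};

// Hypothetical toy "disk": up to 4 entries per directory, indexed by inode.
// 1 = /, 2 = /home, 3 = /home/lkp, 4 = Desktop, 5 = .file (a regular file)
static struct dirent dirs[6][4] = {
  [1] = { {2, "home"} },
  [2] = { {3, "lkp"} },
  [3] = { {0, "old"}, {4, "Desktop"} },   // "old" is a deleted entry
  [4] = { {5, ".file"} },
};

// Search directory `dinum` for `name`; return its inode number or 0.
static unsigned short lookup(unsigned short dinum, const char *name) {
  for (int i = 0; i < 4; i++) {
    const struct dirent *de = &dirs[dinum][i];
    if (de->inum != 0 && strncmp(de->name, name, DIRSIZ) == 0)
      return de->inum;
  }
  return 0;
}

// Resolve an absolute path one component at a time; 0 means not found.
static unsigned short walk(const char *path) {
  char elem[DIRSIZ + 1];
  unsigned short inum = ROOTINO;
  for (;;) {
    while (*path == '/') path++;              // skip separators
    if (*path == '\0') return inum;           // no components left
    size_t full = strcspn(path, "/");         // length of this component
    size_t len = full > DIRSIZ ? DIRSIZ : full;
    memcpy(elem, path, len);                  // keep at most DIRSIZ chars
    elem[len] = '\0';
    path += full;
    inum = lookup(inum, elem);                // one dirent scan per component
    if (inum == 0) return 0;
  }
}
```

Here walk("/home/lkp/Desktop/.file") performs four lookups (1 → 2 → 3 → 4 → 5), and the deleted "old" entry is invisible because its inum is 0.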
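The user-space ls utility reads the directory file with a plain read() and iterates through dirent structs (buf holds the directory's path plus a trailing '/', and p points just past that slash):
while (read(fd, &de, sizeof(de)) == sizeof(de)) {
  if (de.inum == 0)             // free or deleted slot: skip it
    continue;
  memmove(p, de.name, DIRSIZ);  // append the entry name to the path in buf
  p[DIRSIZ] = 0;
  if (stat(buf, &st) == 0)      // fetch type, inode number, and size
    printf("%s %d %d %d\n", fmtname(buf), st.type, st.ino, st.size);
}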
3. Directory API
xv6 provides two levels of directory helpers:
Low-level: dirlookup
struct inode* dirlookup(struct inode *dp, char *name, uint *poff);
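Scans the directory inode dp for an entry whose name matches name. On a match it returns the entry's inode (referenced but not locked) and, if poff is non-null, writes the entry's byte offset within the directory into *poff; if there is no match it returns 0. The caller must already hold the lock on dp. It resolves a single path component inside one directory.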
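The scan inside dirlookup can be sketched over a byte buffer that stands in for the directory's contents; xv6 instead reads each 16-byte entry from disk with readi(). dir_scan() is a hypothetical name, and the sketch returns an inode number rather than an in-memory inode.

```c
#include <string.h>

#define DIRSIZ 14

struct dirent {
  unsigned short inum;   // 0 = free slot
  char name[DIRSIZ];
};

// Scan `size` bytes of directory data for `name`. Returns the matching
// inode number, or 0 if absent; on a match, stores the entry's byte
// offset in *poff when poff is non-NULL.
static unsigned dir_scan(const unsigned char *data, unsigned size,
                         const char *name, unsigned *poff) {
  struct dirent de;
  for (unsigned off = 0; off + sizeof(de) <= size; off += sizeof(de)) {
    memcpy(&de, data + off, sizeof(de));  // xv6 reads this entry with readi()
    if (de.inum == 0)
      continue;                           // free slot
    if (strncmp(name, de.name, DIRSIZ) == 0) {
      if (poff)
        *poff = off;
      return de.inum;
    }
  }
  return 0;
}
```

The offset matters for callers such as unlink, which use it to overwrite the entry with zeros and turn it back into a free slot.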
High-level: namei / nameiparent
struct inode* namei(char *path);
struct inode* nameiparent(char *path, char *name);
Both call the internal namex(), which tokenizes path and calls dirlookup repeatedly.
- namei("/home/lkp/Desktop/.file") → returns the inode for .file
- nameiparent("/home/lkp/Desktop/.file", name) → returns the inode for Desktop and copies ".file" into name; used when creating or deleting a file
nameiparent is essential for system calls like open(O_CREAT) and unlink, which need both the parent directory (to update) and the final component name.
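namex() relies on a small tokenizer, skipelem(), and stops one level early when asked for the parent. The sketch follows the logic of xv6's skipelem() with one change: name always receives a terminating NUL. count_lookups() is a hypothetical helper that mirrors namex()'s loop but only counts the dirlookup() calls it would make.

```c
#include <string.h>

#define DIRSIZ 14

// Copy the next path element into name (at most DIRSIZ chars, always
// NUL-terminated here) and return the rest of the path with leading
// slashes removed, or NULL if there is no element left.
static const char *skipelem(const char *path, char *name) {
  while (*path == '/') path++;
  if (*path == '\0') return NULL;
  const char *s = path;
  while (*path != '/' && *path != '\0') path++;
  size_t len = (size_t)(path - s);
  if (len > DIRSIZ) len = DIRSIZ;           // truncate long names
  memcpy(name, s, len);
  name[len] = '\0';
  while (*path == '/') path++;
  return path;
}

// Mirrors namex(): parent == 0 acts like namei, parent == 1 like
// nameiparent. Returns the number of dirlookup() calls that would run,
// or -1 when nameiparent has no final element (e.g. "/").
static int count_lookups(const char *path, int parent, char *name) {
  int lookups = 0;
  while ((path = skipelem(path, name)) != NULL) {
    if (parent && *path == '\0')
      return lookups;                       // stop one level early
    lookups++;                              // dirlookup(ip, name, 0) goes here
  }
  return parent ? -1 : lookups;
}
```

For /home/lkp/Desktop/.file, namei performs four lookups, while nameiparent performs three and leaves ".file" in name for the caller.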
4. The Unix Virtual File System (VFS)
The Problem
A modern Linux system might have:
- An SSD root partition formatted as ext4
- A USB drive formatted as FAT32
- A network share via NFS
- An in-memory scratch area as tmpfs
Without VFS, every application would need to know which filesystem it is talking to and call different code paths.
The Solution: An Abstraction Layer
VFS inserts a common interface between user space and every concrete filesystem:
User space:   open()   read()   write()   lseek()   ...
                              |
                       +------+------+
                       |     VFS     |   <- "top" interface (POSIX syscalls)
                       +------+------+
              +---------------+---------------+
              v               v               v
            ext4            FAT32            NFS   <- "bottom" interface
The "top" interface is the POSIX system call API every application uses. The "bottom" interface is a set of function pointers (method table) that each filesystem driver implements.
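The "bottom" method table can be sketched in plain C with function pointers. Everything here is hypothetical and heavily simplified: struct fs_ops, the two toy drivers, the mount table, and vfs_read() are illustrative names, not Linux APIs. Linux's real tables, such as struct file_operations, have many more members, and its mount lookup is far more involved.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical "bottom" interface: one function pointer per operation.
struct fs_ops {
  const char *name;
  long (*read)(const char *path, char *buf, size_t n);
};

// Two toy drivers that fabricate file contents.
static long toyroot_read(const char *path, char *buf, size_t n) {
  return snprintf(buf, n, "toyroot:%s", path);
}
static long toyfat_read(const char *path, char *buf, size_t n) {
  return snprintf(buf, n, "toyfat:%s", path);
}

static const struct fs_ops toyroot = { "toyroot", toyroot_read };
static const struct fs_ops toyfat  = { "toyfat",  toyfat_read };

// Toy mount table, checked in order: path prefix -> driver.
struct mount { const char *prefix; const struct fs_ops *ops; };
static const struct mount mounts[] = {
  { "/mnt/usb", &toyfat },
  { "",         &toyroot },   // empty prefix matches everything (root fs)
};

// The "top" side: one entry point that dispatches through the table,
// passing the path relative to the mount point.
static long vfs_read(const char *path, char *buf, size_t n) {
  for (size_t i = 0; i < sizeof(mounts) / sizeof(mounts[0]); i++) {
    size_t len = strlen(mounts[i].prefix);
    if (strncmp(path, mounts[i].prefix, len) == 0)
      return mounts[i].ops->read(path + len, buf, n);
  }
  return -1;
}
```

The caller of vfs_read() never names a filesystem; the mount table picks the driver, which is the essence of the Abstraction and Coexistence benefits in the table that follows.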
Key benefits
| Benefit | Example |
|---|---|
| Abstraction | cp src dst works whether src is on ext4 or FAT32 |
| Coexistence | Both filesystems are mounted simultaneously |
| Cooperation | Copying a file across filesystems is transparent |
Developing a new Linux filesystem means implementing the bottom VFS interface (a set of file_operations, inode_operations, and super_operations structs), not touching user-space code.
5. The Filesystem Crash Problem
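Creating a file, as in echo hello > test.txt, involves several disk writes in sequence:
sys_open()
  create()
    ialloc()    // allocate an on-disk inode (mark its type as in use)
    iupdate()   // write inode metadata        <- power failure here?
    dirlink()   // add a dirent to the parent directory
sys_write()
  ... balloc() updates the block bitmap, then data blocks are written ...
In xv6, ialloc() claims a free on-disk inode by writing its type field; the bitmap tracks only data blocks, which balloc() allocates when data is written.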
Without logging, if the machine loses power after iupdate() but before dirlink(), the disk is left inconsistent: an inode is marked allocated but no directory entry points to it, so it can never be reached or freed. Traditional tools like fsck repair this by scanning the entire disk after reboot, which can take minutes or longer on large drives.
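The crash window can be modeled as a toy: apply only the first steps writes of create() to a hypothetical in-memory struct toyfs and check whether the result is consistent. The names are illustrative, not xv6 code, and each field stands for one of the on-disk updates listed above.

```c
#include <stdbool.h>

// Hypothetical in-memory model of the metadata create() touches.
struct toyfs {
  bool inode_used;      // set by ialloc()
  int  nlink;           // set by iupdate()
  bool dirent_present;  // set by dirlink()
};

// Apply the first `steps` writes of create(), as if power failed afterwards.
static void create_until(struct toyfs *fs, int steps) {
  if (steps >= 1) fs->inode_used = true;       // ialloc()
  if (steps >= 2) fs->nlink = 1;               // iupdate()
  if (steps >= 3) fs->dirent_present = true;   // dirlink()
}

// Consistent only if none of the writes happened, or all of them did.
static bool consistent(const struct toyfs *fs) {
  bool none = !fs->inode_used && fs->nlink == 0 && !fs->dirent_present;
  bool all  = fs->inode_used && fs->nlink == 1 && fs->dirent_present;
  return none || all;
}
```

Only 0 or all 3 completed steps yield a consistent state, which is exactly the all-or-nothing property a log provides.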
6. Write-Ahead Logging (WAL)
The Core Idea
WAL groups multiple disk writes into a transaction that executes atomically: either all writes happen, or none do.
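In xv6, filewrite() in file.c wraps each write in a transaction:
begin_op();                  // start an FS operation (may wait for log space)
ilock(f->ip);                // lock the inode while writing
if ((r = writei(f->ip, addr + i, f->off, n1)) > 0)
  f->off += r;               // advance the file offset by bytes written
iunlock(f->ip);
end_op();                    // end the operation; commit if it is the last one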
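- begin_op(): starts a filesystem operation, waiting if a commit is in progress or the log lacks space. Inside it, code calls log_write() instead of writing blocks directly; log_write() records the block number and keeps the modified buffer pinned in the buffer cache rather than writing it to its final disk location.
- end_op(): ends the operation. When the last outstanding operation ends, it commits: it first writes all modified blocks to the log area on disk, then writes the log header (the commit record), then copies the blocks from the log to their real locations, and finally clears the header.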
Why the log blocks come first
If the machine crashes before the commit record is written, the log is simply ignored on recovery, so no partial write is visible. If it crashes after the commit record, the recovery code replays the log to completion. This guarantees atomicity.
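The commit sequence and both crash cases can be captured in a toy model, loosely following xv6's log.c (where the steps are write_log(), write_head(), and install_trans(), followed by clearing the header). It is a sketch under simplifying assumptions: each "block" is a single int, the header write is assumed atomic, and commit(), recover(), install_and_clear(), and crash_at are hypothetical names.

```c
#define NBLOCKS 16   // home locations ("disk")
#define LOGSIZE 4    // max blocks per transaction

// Hypothetical on-disk state: each "block" holds a single int.
struct disk {
  int home[NBLOCKS];    // blocks at their final locations
  int logged[LOGSIZE];  // log area: copies of modified blocks
  int logblk[LOGSIZE];  // log header: home block number of each copy
  int n;                // log header count; n > 0 means committed
};

// Pending transaction held in memory (xv6 keeps it in the buffer cache).
struct txn { int blk[LOGSIZE], val[LOGSIZE], n; };

// Copy committed log entries home, then erase the transaction.
// Repeating this after a crash mid-install rewrites the same values.
static void install_and_clear(struct disk *d) {
  for (int i = 0; i < d->n; i++)
    d->home[d->logblk[i]] = d->logged[i];
  d->n = 0;
}

// crash_at simulates power loss: 1 = before the header write,
// 2 = after the header write but before installing, 0 = no crash.
static void commit(struct disk *d, const struct txn *t, int crash_at) {
  for (int i = 0; i < t->n; i++)        // 1. write block copies to the log
    d->logged[i] = t->val[i];
  if (crash_at == 1) return;
  for (int i = 0; i < t->n; i++)        // 2. write the header (block numbers
    d->logblk[i] = t->blk[i];           //    plus count); this single write
  d->n = t->n;                          //    is the commit point
  if (crash_at == 2) return;
  install_and_clear(d);                 // 3. install, then clear the header
}

// Run at boot: a non-zero header count means a committed transaction.
static void recover(struct disk *d) { install_and_clear(d); }
```

A crash at point 1 leaves the home blocks untouched, while a crash at point 2 is completed by recover(); either way the transaction is all or nothing.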
The Naïve Approach vs. xv6
A naΓ―ve WAL writes every single disk operation to the log immediately, which is very slow. xv6 delays most disk I/O and batches writes within a transaction, reducing the total number of log writes.
Key Takeaways
- A directory is a file containing an array of struct dirent entries, each mapping a filename to an inode number.
- dirlookup finds a single named entry in a directory inode; namei/nameiparent walk a full path by repeatedly calling dirlookup.
- VFS is an abstraction layer with a "top" POSIX interface toward user space and a "bottom" method-table interface toward concrete filesystem drivers; it lets multiple filesystem types coexist and interoperate transparently.
- WAL solves the crash-consistency problem by writing all changes to log blocks first and only making them permanent after a commit record is safely on disk, guaranteeing atomicity without an expensive full-disk scan.
- xv6's begin_op()/end_op() demarcate a transaction; the disk writes made inside that pair reach their final locations all together or not at all, even across a crash.