Introduction to Filesystems

Why This Matters

Every program you write eventually needs to store data that survives a reboot, share it with other processes, or address it by a human-readable name. Without a filesystem, each program would have to manage raw disk blocks on its own — a maintenance nightmare. Understanding filesystems connects the abstract OS concepts you've studied (virtual memory, processes, synchronization) to the concrete question: how does data actually persist?

Why Do We Need Filesystems?

Three core motivations:

Need	What it means
Persistence	Data survives process exit, crashes, and reboots
Naming & organization	Humans (and programs) refer to data by name, not disk address
Sharing	Multiple processes and users can access the same data

Without these guarantees, building almost any real application would be impractical.

What Is a File?

A file is simply a named sequence of bytes. A few important subtleties:

The "name" is some unique identifier (explored below — in Unix this is an inode number).
Files support read, write, and (sometimes) seek operations.
A file is an abstraction — it does not imply any particular backing storage. A file could live on a hard disk, an SSD, in RAM (tmpfs), or even be a virtual device. In this course we focus on disk-backed files.

Block-Oriented vs. Stream-Oriented Files

Property	Block-oriented	Stream-oriented
Unit of access	Fixed-size block	Byte (character)
Random access	Yes — blocks can be addressed in any order	Typically no
Typical example	Disk file	Network socket, mouse input

Disk-based files are block-oriented; the filesystem manages blocks of fixed size (512 bytes in xv6). Stream-oriented files are common for I/O devices and network connections where you process data as it arrives.
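The difference in random access can be seen from user space. The following is a minimal sketch, assuming a POSIX system; read_block and demo_random_access are hypothetical helper names, and BLOCK_SIZE simply reuses the 512-byte figure quoted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512  /* assumed, matching the xv6 block size quoted above */

/* Read block `n` of an open file into `buf` (BLOCK_SIZE bytes).
 * pread() takes an explicit offset, so blocks can be fetched in any order.
 * Returns bytes read, 0 past end of file, or -1 on error (for example on a
 * pipe, which has no notion of position). */
ssize_t read_block(int fd, uint32_t n, unsigned char *buf)
{
    assert(buf != NULL);
    return pread(fd, buf, BLOCK_SIZE, (off_t)n * BLOCK_SIZE);
}

/* Demo: write three blocks filled with 'A', 'B', 'C' to a temporary file,
 * then read block 2 before block 0. Returns 0 if both reads see the
 * expected contents, -1 otherwise. */
int demo_random_access(void)
{
    char path[] = "/tmp/blockdemoXXXXXX";
    unsigned char buf[BLOCK_SIZE];
    int ok = 0;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the file stays usable through fd until close() */

    for (int i = 0; i < 3; i++) {
        memset(buf, 'A' + i, BLOCK_SIZE);
        if (write(fd, buf, BLOCK_SIZE) != BLOCK_SIZE) {
            close(fd);
            return -1;
        }
    }
    if (read_block(fd, 2, buf) == BLOCK_SIZE && buf[0] == 'C' &&
        read_block(fd, 0, buf) == BLOCK_SIZE && buf[0] == 'A')
        ok = 1;
    close(fd);
    return ok ? 0 : -1;
}
```

Calling read_block on the read end of a pipe fails, because a stream has no position to seek to; on a regular file the same call succeeds for any block number inside the file.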

Unix File Names: Three Layers

Unix (and xv6) separates identity from human name from open handle:

1. Inodes (Index Nodes)

Each file is assigned a unique inode number. The inode stores the file's metadata and pointers to its data blocks. The inode number is the true, stable identity of a file.

2. Paths

Paths like /usr/bin/gcc are human-convenient names in a hierarchical namespace. The filesystem maps each path component to an inode number — a path entry is called a link (a hard link in Unix terminology). Multiple paths can point to the same inode.
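To make "multiple paths, one inode" concrete, here is a small sketch using the POSIX calls link() and stat(). The helper names same_inode and demo_hard_link are hypothetical, and the temporary file location under /tmp is an assumption about the host system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both paths resolve to the same inode on the same device,
 * 0 if they do not, and -1 if either path cannot be stat()ed. */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;
    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Demo: create a temporary file, add a second hard link to it, and return
 * the link count seen through the new name, or -1 on failure. */
int demo_hard_link(void)
{
    char path[] = "/tmp/linkdemoXXXXXX";
    char alias[64];
    struct stat st;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    snprintf(alias, sizeof alias, "%s.alias", path);
    if (link(path, alias) == 0) {
        /* Both names now refer to one inode; its link count covers both. */
        if (same_inode(path, alias) == 1 && stat(alias, &st) == 0)
            result = (int)st.st_nlink;
        unlink(alias);
    }
    unlink(path);
    return result;
}
```

On a filesystem that supports hard links, demo_hard_link should report a link count of 2: one inode, two directory entries pointing at it.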

3. File Descriptors

When a process calls open(), the OS:

Resolves the path to an inode.
Creates an open file description (tracking current position and access mode) in a kernel table.
Returns an integer file descriptor to the process.

Subsequent read(), write(), lseek() calls use this integer index. File descriptors are process-local; the open file descriptions they reference can be shared (e.g., after fork()).
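The sharing of an open file description can be observed without fork(). The sketch below swaps in dup(), which also yields a second descriptor for the same open file description; offset_seen_by_duplicate is a hypothetical helper name, and the /tmp path is an assumption about the host system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Demo: two descriptors created by dup() refer to one open file description,
 * so a read through one advances the offset seen through the other.
 * Returns the offset observed via the second descriptor after reading
 * `nread` bytes via the first, or -1 on failure. */
off_t offset_seen_by_duplicate(size_t nread)
{
    char path[] = "/tmp/fddemoXXXXXX";
    char data[] = "0123456789";
    char buf[sizeof data];
    off_t seen = -1;

    assert(nread < sizeof data);  /* at most the 10 bytes we write */
    int fd1 = mkstemp(path);
    if (fd1 < 0)
        return -1;
    unlink(path);  /* the file stays usable through fd1 until close() */

    int fd2 = dup(fd1);  /* second descriptor, same open file description */
    if (fd2 >= 0 &&
        write(fd1, data, sizeof data - 1) == (ssize_t)(sizeof data - 1) &&
        lseek(fd1, 0, SEEK_SET) == 0 &&
        read(fd1, buf, nread) == (ssize_t)nread)
        seen = lseek(fd2, 0, SEEK_CUR);  /* query position through fd2 */

    if (fd2 >= 0)
        close(fd2);
    close(fd1);
    return seen;
}
```

Reading 3 bytes through the first descriptor leaves the second at offset 3. Had the file been opened a second time with open() instead, each descriptor would get its own open file description and its own independent offset.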

Layered Filesystem Design

xv6 implements the filesystem as a stack of layers, each providing a cleaner abstraction to the layer above:

User syscall layer  (sysfile.c — sys_read, sys_write, sys_link, …)
        ↓
  Inode layer       (path resolution, inode read/write)
        ↓
  Logging layer     (crash recovery via write-ahead log)
        ↓
  Buffer cache      (in-memory cache of disk blocks)
        ↓
     Disk driver    (raw block reads/writes)

A quiz question that frequently appears: the virtual memory manager is NOT a layer of the xv6 filesystem. The filesystem does rely on memory indirectly (the buffer cache lives in kernel memory), but virtual memory management is a separate subsystem, not a filesystem layer.

Essential Questions Every Filesystem Must Answer

How do we keep track of disk/file metadata? → Inodes and the superblock
How do we keep track of free space? → Free space bitmap
How do we track which disk blocks belong to a given file? → Block pointers inside the inode
How do we map paths to files? → Directory entries (dirent)

On-Disk Metadata Structures

Superblock

A centralized block near the start of the disk holding global metadata. In xv6 this includes the total number of blocks, the number of data blocks, the number of inodes, the number of log blocks, and the starting block of the log, inode, and bitmap regions. When the OS mounts a filesystem it reads the superblock first to understand the layout.

Free Space Bitmap

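A compact array of bits, one bit per disk block, indicating whether each block is free or allocated. Testing or flipping the bit for a known block is O(1). Finding a free block means scanning the bitmap, which is linear in the number of blocks in the worst case, but the scan is cheap because the whole bitmap is small and each check is a single bit test.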
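A minimal in-memory sketch of bitmap allocation follows. It is not xv6's allocator: NBLOCKS and all function names are hypothetical, and the bitmap lives in a plain array rather than in disk blocks read through the buffer cache.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS 64  /* hypothetical disk size in blocks */

static uint8_t bitmap[NBLOCKS / 8];  /* one bit per block, 1 = allocated */

/* Mark every block free. */
void bitmap_reset(void)
{
    memset(bitmap, 0, sizeof bitmap);
}

/* O(1): test the bit for block b. */
int block_in_use(uint32_t b)
{
    assert(b < NBLOCKS);
    return (bitmap[b / 8] >> (b % 8)) & 1;
}

/* O(1): set or clear the bit for block b. */
void mark_block(uint32_t b, int used)
{
    assert(b < NBLOCKS);
    if (used)
        bitmap[b / 8] |= (uint8_t)(1u << (b % 8));
    else
        bitmap[b / 8] &= (uint8_t)~(1u << (b % 8));
}

/* First-fit allocation: scan for the lowest free block, mark it used, and
 * return its number. Returns -1 when the disk is full. */
int alloc_block(void)
{
    for (uint32_t b = 0; b < NBLOCKS; b++) {
        if (!block_in_use(b)) {
            mark_block(b, 1);
            return (int)b;
        }
    }
    return -1;
}

/* Release a block that is currently allocated. */
void free_block(uint32_t b)
{
    assert(block_in_use(b));
    mark_block(b, 0);
}
```

With first-fit, a freed block is the next one handed out if it has the lowest number among the free blocks, which is why allocating, freeing block 0, and allocating again returns 0.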

Inodes

One inode per file, stored in dedicated inode blocks. Each inode contains:

File type (regular file, directory, device, …)
Link count (how many directory entries point here)
File size in bytes
Block pointers to the data blocks holding the file's content: direct pointers, plus indirect pointers (pointers to blocks that themselves hold more pointers) so that larger files can be addressed
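The fields above can be sketched as a C struct, together with the lookup that turns a byte offset into a data block number. This is an illustration under stated assumptions, not xv6's code: 12 direct pointers and one single-indirect pointer are assumed, struct toy_inode and block_for_offset are hypothetical names, and the indirect block's contents are passed in as an array instead of being read from disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BSIZE     512                         /* block size in bytes */
#define NDIRECT   12                          /* direct pointers (assumed) */
#define NINDIRECT (BSIZE / sizeof(uint32_t))  /* pointers per indirect block */
#define MAXFILE   (NDIRECT + NINDIRECT)       /* largest file, in blocks */

struct toy_inode {
    int16_t  type;                /* regular file, directory, device, ... */
    int16_t  nlink;               /* number of directory entries pointing here */
    uint32_t size;                /* file size in bytes */
    uint32_t addrs[NDIRECT + 1];  /* direct pointers, then one indirect pointer */
};

/* Return the disk block holding byte `off` of the file.
 * `indirect` stands in for the contents of the block named by
 * ip->addrs[NDIRECT]; a real filesystem would read that block from disk. */
uint32_t block_for_offset(const struct toy_inode *ip,
                          const uint32_t *indirect, uint32_t off)
{
    uint32_t bn = off / BSIZE;  /* logical block number within the file */
    assert(ip != NULL && bn < MAXFILE);
    if (bn < NDIRECT)
        return ip->addrs[bn];         /* found directly in the inode */
    assert(indirect != NULL);
    return indirect[bn - NDIRECT];    /* one extra level of lookup */
}
```

Under these assumptions an indirect block holds 128 pointers, so the largest file spans 12 + 128 = 140 blocks, or 71,680 bytes. Byte 6144 (12 × 512) is the first one that needs the indirect block.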

Directories (dirent)

A directory is itself a file whose content is an array of directory entries. Each entry pairs a name string with an inode number. This is how /usr/bin/gcc is resolved: look up usr in /, then bin in /usr, then gcc in /usr/bin.
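The component-by-component walk described above can be sketched over a toy in-memory directory table. Everything here is hypothetical: the inode numbers, the 14-byte name limit (DIRSIZ), the convention that inode number 0 marks an empty entry, and the names dir_lookup and resolve_path. A real filesystem would read each directory's entries from its data blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRSIZ  14  /* maximum name length per entry (assumed) */
#define ROOTINO 1   /* inode number of "/" (assumed) */
#define NENT    4   /* entries per toy directory */

struct toy_dirent {
    uint16_t inum;          /* 0 means the slot is unused */
    char     name[DIRSIZ];
};

struct toy_dir {
    uint16_t inum;                   /* inode number of this directory */
    struct toy_dirent entries[NENT];
};

/* Toy namespace: /usr/bin/gcc and /etc (an empty stand-in). */
static const struct toy_dir dirs[] = {
    { 1, { { 2, "usr" }, { 5, "etc" } } },
    { 2, { { 3, "bin" } } },
    { 3, { { 4, "gcc" } } },
};

/* Look up one name (not NUL-terminated, `len` bytes) in one directory.
 * Returns the entry's inode number, or 0 if absent. */
static uint16_t dir_lookup(uint16_t dir_inum, const char *name, size_t len)
{
    if (len == 0 || len > DIRSIZ)
        return 0;
    for (size_t d = 0; d < sizeof dirs / sizeof dirs[0]; d++) {
        if (dirs[d].inum != dir_inum)
            continue;
        for (size_t i = 0; i < NENT; i++) {
            const struct toy_dirent *e = &dirs[d].entries[i];
            if (e->inum == 0)
                continue;
            if (strncmp(e->name, name, len) == 0 &&
                (len == DIRSIZ || e->name[len] == '\0'))
                return e->inum;
        }
    }
    return 0;
}

/* Resolve an absolute path one component at a time, starting at the root.
 * Returns the final inode number, or 0 if any component is missing. */
uint16_t resolve_path(const char *path)
{
    assert(path != NULL);
    if (path[0] != '/')
        return 0;  /* this sketch handles absolute paths only */
    uint16_t inum = ROOTINO;
    const char *p = path;
    while (inum != 0) {
        while (*p == '/')
            p++;                       /* skip separators */
        if (*p == '\0')
            break;                     /* no more components */
        size_t len = strcspn(p, "/");  /* length of this component */
        inum = dir_lookup(inum, p, len);
        p += len;
    }
    return inum;
}
```

In this toy table, resolve_path("/usr/bin/gcc") visits inodes 1, 2, and 3 in turn and ends at inode 4; a path with a missing component resolves to 0.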

Disk as an Array of Blocks

At the hardware level, a disk is just an array of fixed-size blocks (512 bytes in xv6). The filesystem imposes structure on this flat array:

[ boot | superblock | log blocks | inode blocks | bitmap | data blocks … ]

mkfs.c in xv6 creates this layout: it writes the superblock, allocates inode blocks, writes the free space bitmap, and copies any initial files (utilities) into the data region.
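The arithmetic behind such a layout can be sketched as follows. This is an illustration, not a copy of mkfs.c: the 64-byte on-disk inode size, the struct and function names, and the choice to always round up by adding one block are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define BSIZE      512                   /* block size in bytes */
#define INODE_SIZE 64                    /* assumed on-disk inode size */
#define IPB        (BSIZE / INODE_SIZE)  /* inodes per block */
#define BPB        (BSIZE * 8)           /* bitmap bits per block */

struct layout {
    uint32_t size;          /* total blocks on disk */
    uint32_t nlog;          /* log blocks */
    uint32_t ninodeblocks;  /* blocks holding inodes */
    uint32_t nbitmap;       /* blocks holding the free space bitmap */
    uint32_t logstart;      /* first log block */
    uint32_t inodestart;    /* first inode block */
    uint32_t bmapstart;     /* first bitmap block */
    uint32_t datastart;     /* first data block */
    uint32_t ndata;         /* number of data blocks */
};

/* Compute where each region starts, given the disk size in blocks, the
 * number of inodes, and the number of log blocks. */
struct layout compute_layout(uint32_t size, uint32_t ninodes, uint32_t nlog)
{
    struct layout l;
    l.size = size;
    l.nlog = nlog;
    l.ninodeblocks = ninodes / IPB + 1;  /* simple round-up: one spare block */
    l.nbitmap = size / BPB + 1;          /* one bit per block on the disk */
    l.logstart = 2;                      /* block 0 = boot, block 1 = superblock */
    l.inodestart = l.logstart + nlog;
    l.bmapstart = l.inodestart + l.ninodeblocks;
    l.datastart = l.bmapstart + l.nbitmap;
    assert(l.datastart < size);          /* metadata must leave room for data */
    l.ndata = size - l.datastart;
    return l;
}
```

For example, a 1000-block disk with 200 inodes and 30 log blocks works out to 26 inode blocks and 1 bitmap block, so the data region starts at block 59 and holds 941 blocks.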

Key Takeaways

A filesystem provides persistence, naming, and sharing for byte-sequence files.
Unix files have three levels of naming: inode numbers (identity), paths (human names), and file descriptors (open handles).
Block-oriented files support random access; stream-oriented files are sequential.
The xv6 filesystem is layered: syscall → inode → logging → buffer cache → disk driver. The virtual memory manager is not one of these layers.
On-disk metadata is organized into four structures: superblock, free space bitmap, inodes, and directory entries.
A disk is a flat array of fixed-size blocks; mkfs stamps the filesystem structure onto it before first use.
