Page Cache and Page Fault

Why This Matters

Disk I/O is orders of magnitude slower than memory access — a disk seek takes milliseconds while a RAM read takes nanoseconds. Without caching, every read() or write() would stall waiting for physical storage. The Linux page cache is the kernel's primary mechanism for bridging this gap, and understanding it is essential for diagnosing performance problems, reasoning about memory pressure, and understanding how page faults connect virtual memory to real storage.

The Page Cache

The page cache stores physical pages in RAM whose content comes from disk (the backing store). It works for:

Regular files
Memory-mapped files
Block device files

The cache grows dynamically to consume memory that is otherwise idle, and shrinks when processes or the kernel need more RAM. This is why free often shows most RAM as "cached" on a busy Linux system — that is intentional, not waste.
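The page cache's share of RAM is visible in /proc/meminfo. Below is a minimal Python sketch for pulling out the relevant fields. It assumes the usual Linux `Key:  value kB` line format and looks only at `MemTotal`, `Cached`, `Dirty`, and `Writeback`. Note that `Cached` also counts shared-memory pages, so it only approximates the file cache.

```python
from pathlib import Path

def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; values are in kB where a unit is shown."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields

def page_cache_summary(text: str) -> dict[str, int]:
    """Pick out the page-cache related fields (in kB); missing fields default to 0."""
    info = parse_meminfo(text)
    return {k: info.get(k, 0) for k in ("MemTotal", "Cached", "Dirty", "Writeback")}

if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only
        print(page_cache_summary(meminfo.read_text()))
```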

Buffered I/O Flow

Every read() and write() that does not use O_DIRECT goes through the page cache:

Situation	What Happens
Cache hit	Data is already in a page cache page; copy directly to/from user memory
Cache miss	VFS asks the concrete filesystem (e.g., ext4) to read the block from disk, populate the page cache, then serve the request
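The hit/miss logic can be modeled in a few lines. The sketch below is a toy, not kernel code. `ToyPageCache` is a hypothetical class, the "disk" is a `bytes` object, pages are assumed to be 4 KiB, and there is no readahead or eviction.

```python
PAGE_SIZE = 4096  # assumed page size for this model

class ToyPageCache:
    """Simplified buffered reads: pages are keyed by index; misses are filled from a backing store."""

    def __init__(self, backing: bytes):
        self.backing = backing              # stands in for the file on disk
        self.pages: dict[int, bytes] = {}   # page index -> cached page contents
        self.hits = 0
        self.misses = 0

    def _get_page(self, index: int) -> bytes:
        if index in self.pages:
            self.hits += 1
        else:
            self.misses += 1                # "disk read": populate the cache first
            start = index * PAGE_SIZE
            self.pages[index] = self.backing[start:start + PAGE_SIZE]
        return self.pages[index]

    def read(self, offset: int, length: int) -> bytes:
        """Copy [offset, offset + length) out of cached pages, like read() into a user buffer."""
        out = bytearray()
        end = min(offset + length, len(self.backing))
        pos = offset
        while pos < end:
            index, in_page = divmod(pos, PAGE_SIZE)
            page = self._get_page(index)
            chunk = page[in_page:in_page + (end - pos)]
            out += chunk
            pos += len(chunk)
        return bytes(out)
```

Reading the same range twice makes the point: the first call misses on every page it spans, the second is served entirely from the cache.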

Write Caching Policies

Three strategies exist for handling writes:

Policy	Behavior	Trade-off
No-write	Writes go straight to disk; any cached copy of the page is invalidated	Cache never holds dirty data; poor write performance
Write-through	Writes update the cache and immediately write to disk	Cache always coherent; high write latency
Write-back	Writes update the cache; disk write is deferred	Best performance; risk of data loss on crash
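To make the trade-offs in the table concrete, the toy simulation below counts disk writes and tracks which pages stay cached under each policy. `simulate` is a hypothetical helper, and write-back is modeled as a single flush at the end of the workload.

```python
def simulate(policy: str, workload: list[int]) -> tuple[int, set[int]]:
    """Return (disk writes, pages left cached) after writing `workload` and then flushing."""
    cached: set[int] = set()
    dirty: set[int] = set()
    disk_writes = 0
    for page in workload:
        if policy == "no-write":
            disk_writes += 1            # straight to disk...
            cached.discard(page)        # ...and any stale cached copy is invalidated
        elif policy == "write-through":
            cached.add(page)            # update the cache
            disk_writes += 1            # and the disk, synchronously
        elif policy == "write-back":
            cached.add(page)
            dirty.add(page)             # defer the disk write; just mark the page dirty
        else:
            raise ValueError(f"unknown policy: {policy}")
    disk_writes += len(dirty)           # final flush: one write per dirty page
    return disk_writes, cached
```

With 100 writes to page 7 followed by one write each to pages 8 and 9, write-back issues 3 disk writes where the other two policies issue 102, and no-write also leaves nothing cached.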

Linux uses write-back. When a page is written, it is marked dirty: the page's dirty flag is set and the page is tagged dirty in the address_space's xarray (formerly the radix tree). Flusher threads later write dirty pages back to disk. This exploits temporal locality in writes: if the same page is written many times in quick succession, only the final version needs to reach the disk.
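From user space, write-back shows up as the gap between write() returning and the data being durable. In the sketch below (Python's `os` module; the helper name `write_durably` is ours), `os.write()` returns once the bytes are copied into page-cache pages and marked dirty, and `os.fsync()` blocks until the kernel has written that file's dirty pages to the storage device.

```python
import os
import tempfile

def write_durably(path: str, data: bytes) -> None:
    """Write data through the page cache, then force writeback of this file with fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write fewer bytes than requested
            n = os.write(fd, view)       # returns once the bytes sit in dirty page-cache pages
            view = view[n:]
        os.fsync(fd)                     # wait until those dirty pages reach the device
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "example.bin")
        write_durably(p, b"hello page cache\n" * 1000)
        print(os.path.getsize(p))
```

Without the fsync() call, the function would still return successfully, but a crash shortly afterwards could lose the data.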

Cache Eviction

The page cache is smaller than the disk, so pages must eventually be evicted to make room.

Naive LRU

The simplest policy is Least Recently Used (LRU): track the last access time for every page and evict the oldest. LRU works well for repeated-access patterns, but fails for streaming workloads — files that are read once and never again flood the LRU list and push out genuinely hot pages.
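The streaming failure is easy to reproduce with a small LRU model built on `collections.OrderedDict`. `LRUCache` is a hypothetical illustration that tracks recency by ordering rather than by timestamps.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity LRU page cache: evicts the least recently used page."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages: OrderedDict[int, None] = OrderedDict()   # least recently used first

    def access(self, page: int) -> bool:
        """Touch a page; return True on a hit."""
        if page in self.pages:
            self.pages.move_to_end(page)          # now the most recently used
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)        # evict the least recently used page
        self.pages[page] = None
        return False
```

With a 4-page cache, touching pages 1 and 2 and then scanning ten new pages once evicts both hot pages, so the next access to page 1 misses.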

The Two-List Strategy

Linux solves this with two lists:

List	Description	Eviction Eligible?
Inactive list	Pages accessed recently but not confirmed hot	Yes
Active list	Pages confirmed hot (accessed more than once)	No

How pages move:

A page accessed for the first time goes onto the inactive list with its referenced state clear, so that a later access can be detected.
If the page is accessed again while still on the inactive list, it is promoted to the active list.
If the active list grows much larger than the inactive list, its oldest pages are demoted back to the inactive list, where they again become eligible for eviction.
Pages are evicted from the oldest end (the tail) of the inactive list.

This means one-time-access pages cycle through the inactive list and are evicted without ever polluting the active list.
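The movement rules can be sketched as a toy model with two ordered lists. This is a simplification, and `TwoListCache` is hypothetical. A second access while on the inactive list promotes immediately, and the kernel's list-ratio heuristic is replaced by a fixed rule that caps the active list at half the cache.

```python
from collections import OrderedDict

class TwoListCache:
    """Toy active/inactive page cache (a simplification, not the kernel's exact algorithm)."""

    def __init__(self, capacity: int):
        self.capacity = capacity                                   # must be >= 1
        self.inactive: OrderedDict[int, None] = OrderedDict()      # oldest first
        self.active: OrderedDict[int, None] = OrderedDict()        # oldest first

    def _demote_oldest_active(self) -> None:
        page, _ = self.active.popitem(last=False)
        self.inactive[page] = None           # back on the inactive list: eviction eligible again

    def _balance(self) -> None:
        # Stand-in heuristic: keep the active list at most half of the cache.
        while len(self.active) > self.capacity // 2:
            self._demote_oldest_active()

    def access(self, page: int) -> bool:
        """Touch a page; return True on a hit."""
        if page in self.active:
            self.active.move_to_end(page)
            return True
        if page in self.inactive:              # second access while inactive: promote
            del self.inactive[page]
            self.active[page] = None
            self._balance()
            return True
        if len(self.inactive) + len(self.active) >= self.capacity:
            if not self.inactive:
                self._demote_oldest_active()
            self.inactive.popitem(last=False)  # evict only from the inactive list
        self.inactive[page] = None             # first access lands on the inactive list
        return False
```

Accessing pages 1 and 2 twice and then scanning ten new pages once through a 4-page cache, the scanned pages churn through the inactive list while pages 1 and 2 stay on the active list and still hit.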

Linux Page Cache Internals

The kernel represents the page cache of each file with an address_space structure:

file → inode → address_space → xarray of pages
                      ↑
              one or more vm_area_struct (VMAs)

Key relationships:

One address_space per file (per inode)
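One file can be mapped by multiple VMAs (in different processes, or at different offsets in the same process)
The page tables of every process that maps the file point at the same physical pages held in the address_space, which also tracks the mapping VMAs (in its i_mmap tree)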
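Because all mappings share the address_space's pages, a store through one shared mapping is immediately visible through any other mapping and through read(). The Python sketch below assumes a Unix system, where `mmap.mmap` defaults to `MAP_SHARED` with read/write access. For brevity both mappings live in one process; mappings in different processes share pages the same way.

```python
import mmap
import tempfile

def shared_mappings_demo() -> tuple[bytes, bytes]:
    """Map one file twice; return what a second mapping and read() see after a store via the first."""
    size = mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        f.truncate(size)                    # one page of zeroes
        a = mmap.mmap(f.fileno(), size)     # Unix default: MAP_SHARED, PROT_READ | PROT_WRITE
        b = mmap.mmap(f.fileno(), size)
        try:
            a[0:5] = b"hello"               # store into the shared page-cache page
            via_mapping = b[0:5]            # load through the second mapping
            f.seek(0)
            via_read = f.read(5)            # read() copies from the same cached page
        finally:
            a.close()
            b.close()
    return via_mapping, via_read
```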

Since Linux 4.20, the page cache uses an xarray (previously a radix tree) to index the pages of an address_space by page index, i.e. by file offset in units of pages.
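The page index is the file offset divided by the page size. The arithmetic is sketched below, assuming a power-of-two page size taken from `mmap.PAGESIZE`. It ignores large folios, where one cached folio covers several consecutive indices.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE                 # e.g. 4096 on x86-64
PAGE_SHIFT = PAGE_SIZE.bit_length() - 1   # log2(PAGE_SIZE); page sizes are powers of two

def page_index(offset: int) -> tuple[int, int]:
    """Split a byte offset in a file into (page index, offset within that page)."""
    return offset >> PAGE_SHIFT, offset & (PAGE_SIZE - 1)

def pages_spanned(offset: int, length: int) -> range:
    """Page indices touched by an access of `length` (> 0) bytes starting at `offset`."""
    first = offset >> PAGE_SHIFT
    last = (offset + length - 1) >> PAGE_SHIFT
    return range(first, last + 1)
```

A 2-byte read straddling a page boundary touches two index entries, which is why a small unaligned read can cost two cache lookups.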

Page Fault Handling

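When a process accesses a virtual address whose PTE is missing, or whose PTE does not permit the access (for example, a write to a read-only page), the CPU raises a page fault. The architecture's fault handler finds the VMA containing the address, checks that the VMA permits the access, and calls handle_mm_fault(). After walking the page tables, handle_pte_fault() (mm/memory.c) dispatches to a specific handler: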

Handler	Trigger
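`do_anonymous_page`	No PTE, and the VMA has no backing file (heap, stack)
`do_fault` → `vma->vm_ops->fault` (usually `filemap_fault`)	No PTE, and the VMA is backed by a file
`do_wp_page`	Write to a present but read-only PTE in a writable VMA → Copy-on-Write
`do_swap_page`	The PTE holds a swap entry: the page was swapped out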
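The dispatch in the table can be summarized as a small decision function. `Fault` and `pick_handler` are hypothetical names for this sketch. It assumes the VMA lookup and permission checks have already succeeded, and it ignores huge pages, NUMA hinting faults, userfaultfd, and the other cases the real handle_pte_fault() covers.

```python
from dataclasses import dataclass

@dataclass
class Fault:
    pte_present: bool      # is there a valid PTE for the address?
    pte_writable: bool     # if present, does it allow writes?
    swap_entry: bool       # if not present, does the PTE hold a swap entry?
    vma_has_file: bool     # is the VMA backed by a file (vm_file set)?
    is_write: bool         # was the faulting access a write?

def pick_handler(f: Fault) -> str:
    """Toy version of the decision table above (hypothetical model, not kernel code)."""
    if not f.pte_present:
        if f.swap_entry:
            return "do_swap_page"
        # File-backed faults reach filemap_fault via do_fault() and vm_ops->fault.
        return "filemap_fault" if f.vma_has_file else "do_anonymous_page"
    if f.is_write and not f.pte_writable:
        return "do_wp_page"
    return "none"          # PTE already allows the access; only access/dirty bits need updating
```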

File-Mapped Page Fault (`filemap_fault`)

Occurs when the PTE is missing, the VMA permits the access, and the VMA has an associated file (vm_file):

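Look up the faulting page's offset in the file's address_space.
Cache hit: map the existing page cache page into the PTE.
Cache miss: load the page from disk, usually by starting readahead, which ends up calling the mapping's read method (mapping->a_ops->readpage(file, page) in older kernels, ->read_folio() since Linux 5.19), then map it.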

This is the mechanism that makes mmap() work: pages appear on demand as you touch them.
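Demand paging can be observed from user space by mapping a file and touching one byte per page while watching the process's minor-fault counter. The sketch assumes a Unix system (`resource.getrusage`, and `mmap` with `prot=`). Because the file was just written, its pages are already in the page cache, so the faults should be minor (cache hits) rather than major (disk reads).

```python
import mmap
import resource
import tempfile

def touch_mapped_file(num_pages: int = 64) -> tuple[int, int]:
    """mmap a file and read one byte per page; return (sum of bytes read, minor faults counted)."""
    page = mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        f.write(bytes([1]) * (page * num_pages))   # file data now sits in the page cache
        f.flush()
        with mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ) as m:
            before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
            total = sum(m[i * page] for i in range(num_pages))  # first touch of a page may fault
            after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    return total, after - before
```

The fault count is approximate: the interpreter itself can take faults, and Linux's fault-around can map several neighboring cached pages on a single fault, so the count is often lower than the number of pages touched.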

Copy-on-Write (`do_wp_page`)

Triggered by a write to a page whose PTE is read-only while the VMA allows writes. This mismatch means Copy-on-Write is in effect (e.g., after fork(), parent and child share each private page read-only):

Allocate a new physical page.
Copy the content of the original page into the new page.
Update the PTE to point to the new page and mark it writable.
Flush the TLB entry for the address.

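The original page is unaffected; the faulting process (child or parent) now has an independent copy. If no other process still maps the original page, the kernel skips the copy and simply makes the existing PTE writable.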
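The four steps, plus the no-copy shortcut for a page with a single user, can be modeled with two small classes. Everything here (`Page`, `PTE`, `fork_share`, `do_wp_page_model`, `store`) is a hypothetical model, not kernel code, and the TLB flush is only noted in a comment.

```python
from dataclasses import dataclass

@dataclass
class Page:
    data: bytearray
    refcount: int = 1          # how many PTEs map this physical page

@dataclass
class PTE:
    page: Page
    writable: bool = False     # CoW pages are mapped read-only

def fork_share(pte: PTE) -> PTE:
    """Model fork(): both PTEs point at the same page, and both become read-only."""
    pte.writable = False
    pte.page.refcount += 1
    return PTE(pte.page, writable=False)

def do_wp_page_model(pte: PTE) -> None:
    """Toy write-protect fault handler."""
    old = pte.page
    if old.refcount == 1:
        pte.writable = True                  # sole user: reuse the page without copying
        return
    new = Page(bytearray(old.data))          # steps 1-2: allocate a new page, copy the contents
    old.refcount -= 1
    pte.page, pte.writable = new, True       # step 3: point the PTE at the copy, now writable
    # step 4: the real kernel also flushes the stale TLB entry here

def store(pte: PTE, offset: int, value: int) -> None:
    """A write through a read-only PTE faults first, then completes."""
    if not pte.writable:
        do_wp_page_model(pte)
    pte.page.data[offset] = value
```

After `fork_share`, a store through the child copies the page, leaving the parent's data untouched; a later store through the parent finds itself the only user and reuses its page in place.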

Flusher Daemon

Write-back means the data in RAM can diverge from what is on disk. A set of kernel flusher threads is responsible for writing dirty pages back to disk.

Writeback is triggered by three conditions:

Trigger	Details
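Dirty threshold / memory pressure	Dirty memory exceeds the background threshold, or free memory runs low → flusher threads are woken (e.g., via `wakeup_flusher_threads()`) and write back dirty pages until the dirty level falls below the threshold
Age threshold	Dirty data older than a configurable interval (`/proc/sys/vm/dirty_expire_centisecs`) is written back when the flusher threads wake periodically (every `dirty_writeback_centisecs`)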
Explicit sync	A process calls `sync()` or `fsync()`
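The age trigger can be pictured as a periodic scan that picks dirty data whose age exceeds the expiry interval. The function below is a hypothetical model. Its default of 3000 centiseconds (30 seconds) matches the usual default of `dirty_expire_centisecs`, but the kernel tracks dirty age per inode rather than per page and applies further conditions.

```python
def pages_to_write_back(dirtied_at: dict[int, float], now: float,
                        expire_centisecs: int = 3000) -> list[int]:
    """Toy age trigger: return dirty pages at least `expire_centisecs` old, oldest first.

    dirtied_at maps page number -> time (in seconds) at which it was first dirtied.
    """
    expire_s = expire_centisecs / 100
    old = [p for p, t in dirtied_at.items() if now - t >= expire_s]
    return sorted(old, key=lambda p: dirtied_at[p])
```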

The threshold that triggers background writeback is tunable via:

/proc/sys/vm/dirty_background_ratio

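This is the percentage of available memory (free pages plus reclaimable pages) that may be dirty before the background flusher threads start writing it out. Its byte-based counterpart, dirty_background_bytes, can be set instead. A related tunable, dirty_ratio, sets the higher level at which processes generating writes are themselves throttled.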
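The tunables can be read directly from /proc/sys/vm, and the background threshold is simple arithmetic once the amount of available memory is known. The helper names in this sketch are ours. It assumes that `dirty_background_bytes`, when non-zero, takes precedence over the ratio (the kernel reports whichever one is unused as 0), and it omits the extra adjustments the kernel makes when computing available memory.

```python
from pathlib import Path

VM_SYSCTL = Path("/proc/sys/vm")

def read_vm_sysctl(name: str, root: Path = VM_SYSCTL) -> int:
    """Read an integer tunable such as dirty_background_ratio."""
    return int((root / name).read_text().strip())

def background_threshold_bytes(available_bytes: int, ratio: int, limit_bytes: int = 0) -> int:
    """Dirty bytes at which background writeback starts.

    A non-zero limit_bytes (dirty_background_bytes) takes precedence over the ratio.
    """
    if limit_bytes:
        return limit_bytes
    return available_bytes * ratio // 100

if __name__ == "__main__":
    if VM_SYSCTL.exists():  # Linux only
        for name in ("dirty_background_ratio", "dirty_background_bytes", "dirty_ratio",
                     "dirty_expire_centisecs", "dirty_writeback_centisecs"):
            print(name, read_vm_sysctl(name))
```

For example, with 8 GiB available and a ratio of 10, background writeback would start at roughly 819 MiB of dirty data.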

Key Takeaways

The page cache is Linux's unified disk buffer, using write-back policy for best throughput.
Two-list (active/inactive) eviction solves LRU's weakness against one-time access patterns by only promoting pages that are accessed more than once.
Every file's page cache is represented by an address_space, which can be shared across multiple VMAs.
Page faults are the mechanism by which virtual addresses gain physical backing — different handlers cover anonymous, file-backed, CoW, and swap scenarios.
Flusher threads periodically drain dirty pages to disk based on memory pressure, age, or explicit sync calls.
