Virtual Memory III: Page Table Walk and VM Usage

Why This Matters

The last two modules introduced virtual memory theory and the x86-64 four-level paging structure. This module grounds all of that in real code: the xv6 kernel's vm.c. You'll trace exactly how a kernel page table is built from scratch, how a user process gets its first mapped page, and how the OS traverses four levels of page tables to install or look up a mapping. Understanding this code makes page faults, fork, exec, and memory-mapped I/O intuitive rather than mysteriousβ€”and it's the foundation for the homework assignments on page deduplication and copy-on-write.


1. Building the Kernel Page Table: kvmalloc

When xv6 boots, main() calls kvmalloc() to create a single page table that maps the kernel's virtual address space. Here is a simplified version:

void kvmalloc(void) {
    kpml4 = (pml4e_t*) kalloc();
    kpdpt = (pde_t*)  kalloc();

    kpml4[PMX(KERNBASE)] = v2p(kpdpt) | PTE_P | PTE_W;

    // Direct-map first GB of physical memory to KERNBASE
    kpdpt[0] = 0          | PTE_PS | PTE_P | PTE_W;
    kpdpt[3] = 0xC0000000 | PTE_PS | PTE_P | PTE_W | PTE_PWT | PTE_PCD;

    lcr3(v2p(kpml4));
}

Key points

Line What it does
kalloc() Allocates one 4 KB physical page for the PML4 table, then one for the PDPT
kpml4[PMX(KERNBASE)] = … Plants a PML4 entry pointing to the PDPT; PMX extracts bits [47:39] of the virtual address
PTE_PS Page-Size flag β€” makes the PDPT entry point directly to a 1 GB page instead of a Page Directory; skips two levels of walking
v2p(kpdpt) / V2P(mem) Converts a kernel virtual address to a physical address, because page table entries must contain physical addresses
lcr3(v2p(kpml4)) Writes the physical address of PML4 into CR3, activating the new page table hierarchy and flushing the TLB

Why PTE_PS matters: Without it, the hardware would try to interpret kpdpt[0] as a pointer to a Page Directory, not a 1 GB frame. The kernel would take a page fault on its very first memory access during early boot.


2. Setting Up the First User Process: userinit β†’ inituvm

After the kernel is running, main() calls userinit() to launch the first user process, which eventually execs the shell. Part of that setup is mapping the initial user code into the new process's page table:

// proc.c / vm.c (simplified)
userinit(void) {
    setupkvm();   // copy kernel mappings into new process page table
    inituvm(p->pgdir, _binary_initcode_start,
            (addr_t)_binary_initcode_size);
}

inituvm(...) {
    mappages(pgdir, (void*)PGSIZE, PGSIZE, V2P(mem), PTE_W | PTE_U);
}

mappages iterates over a range of virtual addresses, one page at a time, calling walkpgdir at each step to find (or create) the leaf page-table entry, then writing the physical address + permissions into it.

mappages(pde_t *pgdir, void *va, addr_t size, addr_t pa, int perm) {
    a    = PGROUNDDOWN(va);
    last = PGROUNDDOWN(va + size - 1);
    for (;;) {
        pte = walkpgdir(pgdir, a, 1);   // 1 = allocate missing tables
        *pte = pa | perm | PTE_P;
        if (a == last) break;
        a  += PGSIZE;
        pa += PGSIZE;
    }
}

3. The Page Table Walk: walkpgdir

walkpgdir is the heart of xv6's VM code. It accepts a virtual address and traverses the four-level table hierarchy, allocating intermediate tables on the fly when alloc == 1.

pte_t* walkpgdir(pde_t *pml4, const void *va, int alloc) {
    pml4e = &pml4[PMX(va)];           // index into PML4

    if (*pml4e & PTE_P)
        pdp = (pdpe_t*) P2V(PTE_ADDR(*pml4e));  // table exists
    else {
        if (!alloc || (pdp = kalloc()) == 0)
            return 0;                             // fail or no alloc
        memset(pdp, 0, PGSIZE);
        *pml4e = V2P(pdp) | PTE_P | PTE_W | PTE_U;
    }
    // … same pattern repeated for PDP β†’ PD β†’ PT …
    return &pt[PTX(va)];   // pointer to the leaf PTE
}

Walk summary

Virtual address bits: [63:48 unused][47:39 PML4][38:30 PDP][29:21 PD][20:12 PT][11:0 offset]

CR3 β†’ PML4[PMX(va)] β†’ PDP[PDPX(va)] β†’ PD[PDX(va)] β†’ PT[PTX(va)] β†’ physical frame

The function returns a pointer to the PTE (not the PTE value), so mappages can write the physical address directly into it with *pte = pa | perm | PTE_P.

The alloc flag controls whether missing intermediate tables are created. Passing alloc=0 (used in vmprint) makes the walk read-onlyβ€”it returns NULL rather than allocate.


4. Process Address Space

Each process owns its own page table, giving it a private virtual address space from address 0 up to some size sz. Above KERNBASE, every process maps the same kernel pages (installed by setupkvm). This means:

High VA  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚  Kernel (KERNBASE+) β”‚  ← shared across all processes
         β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
         β”‚  User stack         β”‚
         │  Heap (sbrk grows→) │
         β”‚  Code / data        β”‚
Low VA   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  address 0

5. Growing a Process: sbrk

sbrk(n) increases (or decreases) a process's data segment by n bytes. Internally:

addr_t sys_sbrk(void) {
    argaddr(0, &n);
    if (growproc(n) < 0) return -1;
    return old_sz;
}

int growproc(int64 n) {
    sz = proc->sz;
    if (n > 0)
        sz = allocuvm(proc->pgdir, sz, sz + n);   // add pages
    else if (n < 0)
        sz = deallocuvm(proc->pgdir, sz, sz + n); // remove pages
    proc->sz = sz;
    switchuvm(proc);   // reload CR3 / flush TLB
    return 0;
}

allocuvm calls mappages (which calls walkpgdir) for each new page, allocating physical memory and inserting PTEs. deallocuvm removes PTEs and frees the physical pages.


6. VM Use Case: Page Deduplication

Because page tables add a level of indirection between virtual and physical addresses, the OS can point multiple virtual pages at the same physical page. This is page deduplication:

Process A VA 0x1000 ─┐
                      β”œβ”€β†’ Physical page 0x5000  (read-only copy of libc)
Process B VA 0x3000 β”€β”˜

Benefits:

The prerequisite is that shared pages must be read-only (or that the OS detects writes and reacts). That reaction is copy-on-write.

Copy-on-Write (CoW)

When a process with a deduplicated (shared) page tries to write:

  1. The write triggers a page fault (the PTE has PTE_W cleared).
  2. The OS allocates a fresh physical page, copies the content, and updates the faulting process's PTE to point to the new writable copy.
  3. The other process's mapping is unchanged.

This is the principle behind efficient fork(): instead of copying all pages immediately, the kernel marks them shared and read-only, deferring copies until absolutely necessary.


Key Takeaways

Practice

  1. What is the primary effect of executing lcr3(v2p(kpml4)) in kvmalloc()?
  2. In xv6's kvmalloc(), the line kpdpt[0] = 0 | PTE_PS | PTE_P | PTE_W; uses the PTE_PS flag. What would happen if PTE_PS were removed?
  3. What is the purpose of walkpgdir() in xv6?
  4. Which statement best describes the kernel page table created by kvmalloc()?
  5. Explain the role of the alloc parameter in walkpgdir(pgdir, va, alloc). Give one example of when you would pass alloc=0 and one when you would pass alloc=1.
  6. Trace what happens inside the xv6 kernel when a user program calls sbrk(4096) to grow its heap by one page. Name each function in the call chain and what it does.
  7. Page deduplication maps multiple virtual pages to the same physical page. In which scenario does it provide the most benefit?
  8. Describe how copy-on-write (CoW) builds on page deduplication to make fork() efficient. What triggers the actual copy, and what does the OS do when that trigger fires?
  9. In mappages(), why does the loop advance both a (virtual address) and pa (physical address) by PGSIZE on each iteration?
  10. A student claims: 'Since every process has a copy of the kernel page-table entries, the kernel's physical pages are duplicated N times for N processes.' Is this correct? Explain.