Virtual Memory 2: x86-64 Paging and xv6 VM

Why this matters

The previous module introduced 32-bit paging as a concept. Real systems, including xv6 on modern hardware, run in 64-bit mode, where the address space is so large that a new paging structure is required. This module shows exactly what changed, why, and how xv6's kernel code wires it all up. Understanding this material lets you read, debug, and extend real OS code rather than just describe it abstractly.


1. Quick Recap: Why Not One Big Array?

Virtual addresses are divided into 4 KB "pages" (the lower 12 bits are the page offset). A naïve approach stores all page table entries (PTEs) in a flat array:

GET_PTE(va) = &ptes[va >> 12]

For 32-bit addresses with 4 KB pages, the index is the upper 20 bits → 2²⁰ ≈ 1 million entries × 4 bytes = 4 MB per process. Every process pays that 4 MB up front, even if it uses only a few pages. The solution is a multi-level page table, in which unused branches of the tree simply don't exist.

In 32-bit x86, two levels (Page Directory → Page Table) solved the problem. In 64-bit x86, the address space is vastly larger, so the architecture adds more levels.
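The arithmetic above can be checked with a small sketch. The helper name flat_table_bytes is hypothetical (it is not part of xv6); it computes how large a single flat table would be for a given address width, page size, and PTE size.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one flat page table covering the whole virtual
 * address space. Hypothetical helper for illustration only. */
uint64_t flat_table_bytes(unsigned va_bits, unsigned page_bits,
                          unsigned pte_bytes)
{
    assert(va_bits > page_bits && va_bits - page_bits < 64);
    uint64_t entries = 1ull << (va_bits - page_bits); /* one PTE per page */
    return entries * pte_bytes;
}
```

With 32-bit addresses and 4-byte PTEs this gives 4 MB, matching the figure above. With 48-bit addresses and 8-byte PTEs it gives 2³⁹ bytes (512 GB) per process, which is why a flat table is not an option in 64-bit mode.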


2. x86-64 Canonical Addresses (48-bit)

A 64-bit register can hold 2⁶⁴ distinct addresses, but most current CPUs implement only 48 bits of virtual address (CPUs with 5-level paging extend this to 57 bits). The hardware enforces a canonical form: bits 63–48 must all be copies of bit 47 (sign extension). Dereferencing an address that violates this rule raises a General Protection Fault.

This splits the address space into two halves:

Region        Range                                            Size
User space    0x0000_0000_0000_0000 – 0x0000_7FFF_FFFF_FFFF    128 TB
Kernel space  0xFFFF_8000_0000_0000 – 0xFFFF_FFFF_FFFF_FFFF    128 TB

xv6 sets KERNBASE = 0xFFFF_8000_0000_0000, the lowest address of the upper half, so the kernel lives in the top canonical half.
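The canonical-form rule can be expressed in a few lines of C. This is a minimal sketch assuming 48-bit virtual addresses; is_canonical and canonicalize are hypothetical names, not xv6 functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if bits 63-48 are all copies of bit 47, i.e. bits 63-47 are
 * either all 0 or all 1. */
bool is_canonical(uint64_t va)
{
    uint64_t top = va >> 47;          /* 17 bits: bit 47 and everything above */
    return top == 0 || top == 0x1FFFF;
}

/* Build the canonical address from its low 48 bits by sign-extending bit 47. */
uint64_t canonicalize(uint64_t low48)
{
    assert((low48 >> 48) == 0);       /* caller passes only the low 48 bits */
    return (low48 & (1ull << 47)) ? (low48 | 0xFFFF000000000000ull) : low48;
}
```

For example, 0x0000_7FFF_FFFF_FFFF and 0xFFFF_8000_0000_0000 are both canonical, while 0x0000_8000_0000_0000 (one past the top of user space) is not.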


3. x86-64 4-Level Paging

To cover 48-bit addresses using 4 KB pages (12-bit offset), the remaining 36 bits are split across four 9-bit indexes:

Bits 47–39 → PML4 index  (Page Map Level 4)
Bits 38–30 → PDPT index  (Page Directory Pointer Table)
Bits 29–21 → PD index    (Page Directory)
Bits 20–12 → PT index    (Page Table)
Bits 11–0  → Page offset

CR3 → PML4 → PDPT → PD → PT → Physical Page

Each table has 2⁹ = 512 entries, each 8 bytes wide (one 64-bit word).

Why 9 bits per level? 512 entries × 8 bytes = 4096 bytes = one page. Every table, at every level, fits exactly in one 4 KB page frame.
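The bit split above translates directly into shifts and masks. The sketch below uses hypothetical helper names (va_index, va_offset); xv6 provides its own macros for this, such as the PMX() used later in this module, whose exact definitions are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* 9-bit table index of va at the given level:
 * 4 = PML4, 3 = PDPT, 2 = PD, 1 = PT. */
unsigned va_index(uint64_t va, int level)
{
    assert(level >= 1 && level <= 4);
    unsigned shift = 12 + 9 * (unsigned)(level - 1);  /* 12, 21, 30, or 39 */
    return (unsigned)((va >> shift) & 0x1FF);
}

/* Low 12 bits: byte offset within the 4 KB page. */
unsigned va_offset(uint64_t va)
{
    return (unsigned)(va & 0xFFF);
}
```

For va = 0xFFFF_8000_0000_0000, va_index(va, 4) is 256: bit 47 is the top bit of the 9-bit PML4 index, so the kernel half of the address space begins at PML4 entry 256.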


4. PTE Flag Bits

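Entries at every level of the hierarchy share the same basic flag layout in the lower 12 bits, plus bit 63. A few bits are level-specific: PS applies only to PDPT and PD entries, and D is meaningful only in entries that map a page.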

Bit  Name   Meaning when set
0    P      Entry is present (valid); hardware faults if 0
1    W      Page is writable; read-only if 0
2    U      User-accessible; kernel-only if 0
3    PWT    Write-through caching (vs. write-back default)
4    PCD    Cache disabled; used for MMIO/DMA regions
5    A      Accessed: set by hardware on any read/write
6    D      Dirty: set by hardware on write; used to decide if a page must be written back before eviction
7    PS     Page Size: in PDPT/PD entries, maps a 1 GB or 2 MB "huge page" directly instead of pointing to another table level
63   XD/NX  Execute-disable: page cannot be executed

Bits 51–12 hold the physical page frame number (the physical address with its lower 12 bits zeroed).

Reading a PTE value: worked example

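Given PTE = 0x0000_0001_23AB_9067:

Low 12 bits = 0x067 = 0000 0110 0111 in binary, so bits 0, 1, 2, 5, and 6 are set: P, W, U, A, D. PWT, PCD, and PS are clear.
Bit 63 = 0, so NX is clear and the page is executable.
Bits 51–12 give the physical page address: 0x1_23AB_9000 (the PTE with its flag bits masked off).

In short, this is a present, writable, user-accessible page at physical address 0x1_23AB_9000 that has been accessed and written to.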

The Accessed bit is set by the hardware, not the OS. The OS reads it to implement page-replacement policies (e.g., clock/LRU approximation). The Dirty bit tells the OS whether a page that is being evicted must be written back to disk (only dirty pages need the write; clean pages can simply be discarded).
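Decoding a PTE is a matter of masking. The sketch below defines the flag bits from the table in this section. PTE_P, PTE_W, PTE_U, PTE_PWT, PTE_PCD, PTE_A, and PTE_PS follow the names used in this module; PTE_D, PTE_NX, PTE_FRAME_MASK, and the two helper functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_P   (1ull << 0)   /* present */
#define PTE_W   (1ull << 1)   /* writable */
#define PTE_U   (1ull << 2)   /* user-accessible */
#define PTE_PWT (1ull << 3)   /* write-through */
#define PTE_PCD (1ull << 4)   /* cache disabled */
#define PTE_A   (1ull << 5)   /* accessed (set by hardware) */
#define PTE_D   (1ull << 6)   /* dirty (set by hardware) */
#define PTE_PS  (1ull << 7)   /* huge page (PDPT/PD entries) */
#define PTE_NX  (1ull << 63)  /* execute-disable */
#define PTE_FRAME_MASK 0x000FFFFFFFFFF000ull  /* bits 51-12 */

static_assert((PTE_FRAME_MASK & 0xFFF) == 0, "frame mask excludes flag bits");

/* Physical page address stored in the entry. */
uint64_t pte_frame(uint64_t pte) { return pte & PTE_FRAME_MASK; }

/* True if the given flag bit is set in the entry. */
bool pte_has(uint64_t pte, uint64_t flag) { return (pte & flag) != 0; }
```

Applied to 0x0000_0001_23AB_9067, pte_frame returns 0x1_23AB_9000, and pte_has reports P, W, U, A, and D as set.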


5. xv6 Kernel Page Table Setup

xv6's kvmalloc() builds the kernel's page table early in boot:

void kvmalloc(void) {
    kpml4 = (pml4e_t*) kalloc();        // allocate the PML4 table
    memset(kpml4, 0, PGSIZE);           // kalloc() does not zero the page
    kpdpt = (pde_t*)   kalloc();        // allocate a PDPT
    memset(kpdpt, 0, PGSIZE);           // unused entries must be not-present

    // Point PML4[KERNBASE index] → kpdpt (converting virtual → physical)
    kpml4[PMX(KERNBASE)] = v2p(kpdpt) | PTE_P | PTE_W;

    // Direct-map the first 1 GB of physical memory at KERNBASE (huge page)
    kpdpt[0] = 0 | PTE_PS | PTE_P | PTE_W;

    // Map the 1 GB at physical 0xC000_0000 (PCI/device space) uncached
    kpdpt[3] = 0xC0000000 | PTE_PS | PTE_P | PTE_W | PTE_PWT | PTE_PCD;

    lcr3(v2p(kpml4));   // activate the new page table
}

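Key observations:

  1. Page table entries hold physical addresses, so the kernel's virtual pointers are converted with v2p() before being stored in the PML4 entry or loaded into CR3.
  2. PTE_PS in a PDPT entry makes that entry a leaf: it maps a 1 GB huge page directly, so the kernel mapping needs no PD or PT.
  3. kpdpt[3] covers virtual addresses from KERNBASE + 3 GB up to KERNBASE + 4 GB and maps them to the 1 GB starting at physical 0xC000_0000 with caching disabled (PTE_PWT | PTE_PCD), which is what device registers need.
  4. lcr3() writes the physical address of the PML4 into CR3, which switches the CPU to the new page table.

If PTE_PS were removed from kpdpt[0], the hardware would interpret the PDPT entry as a pointer to a Page Directory at physical address 0. No PD was ever set up there, so the walk would read arbitrary bytes as PD entries. The likely result is a page fault during early boot, before the kernel has any fault handler installed, which typically escalates to a triple fault and a machine reset.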
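To see what a PDPT entry with PTE_PS does, here is a minimal sketch of the address calculation the hardware performs for a 1 GB huge page. The function translate_1gb and the two mask names are hypothetical; the flag values match the PTE flag table earlier in this module.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P  (1ull << 0)
#define PTE_PS (1ull << 7)
#define GB_OFFSET_MASK ((1ull << 30) - 1)      /* low 30 bits of the VA */
#define GB_FRAME_MASK  0x000FFFFFC0000000ull   /* bits 51-30 of the entry */

/* Physical address for va, given a present PDPT entry that maps a
 * 1 GB huge page. The low 30 bits of va pass through unchanged. */
uint64_t translate_1gb(uint64_t pdpt_entry, uint64_t va)
{
    assert((pdpt_entry & PTE_P) && (pdpt_entry & PTE_PS));
    return (pdpt_entry & GB_FRAME_MASK) | (va & GB_OFFSET_MASK);
}
```

With kpdpt[0] as set in kvmalloc, KERNBASE + 0x1000 translates to physical 0x1000. With kpdpt[3], KERNBASE + 0xFEE0_0000 translates to physical 0xFEE0_0000, which is the default base address of the local APIC's registers.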


6. xv6 User Space Setup

After the kernel is running, userinit() creates the very first user process, which runs initcode.S (a tiny program that calls exec("/init")). The VM setup:

inituvm(p->pgdir, _binary_initcode_start, _binary_initcode_size);
  → mappages(pgdir, (void*)PGSIZE, PGSIZE, V2P(mem), PTE_W | PTE_U);

PTE_U is critical: without it the user process cannot access the page and would fault on its first instruction.
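The permission rule can be modeled in a few lines. This is a simplified sketch of the check applied to a user-mode access: the entry used at every level of the walk must be present and have U set, and a write additionally needs W set at every level. It deliberately ignores NX and other protection features. The function name user_access_ok is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_P (1ull << 0)
#define PTE_W (1ull << 1)
#define PTE_U (1ull << 2)

/* entries[] holds the entry consulted at each level of the walk,
 * root first. Returns true if a user-mode access would be allowed. */
bool user_access_ok(const uint64_t *entries, size_t nlevels, bool is_write)
{
    assert(entries != NULL && nlevels > 0);
    for (size_t i = 0; i < nlevels; i++) {
        if (!(entries[i] & PTE_P)) return false;             /* not mapped */
        if (!(entries[i] & PTE_U)) return false;             /* kernel-only */
        if (is_write && !(entries[i] & PTE_W)) return false; /* read-only */
    }
    return true;
}
```

Because the check applies at every level, the intermediate tables on the path to a user page must also carry PTE_U, not just the leaf entry.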


7. Page Table Walk: walkpgdir

walkpgdir is the core helper that traverses (or creates) the page table tree to find the PTE for a given virtual address:

pte_t* walkpgdir(pde_t *pml4, const void *va, int alloc) {
    pml4e = &pml4[PMX(va)];                     // PML4 slot for this va
    if (*pml4e & PTE_P)
        pdp = (pdpe_t*)P2V(PTE_ADDR(*pml4e));   // table exists
    else {
        // Table missing: fail unless the caller asked us to allocate it.
        if (!alloc || (pdp = (pdpe_t*)kalloc()) == 0) return 0;
        memset(pdp, 0, PGSIZE);                 // every entry starts not-present
        *pml4e = V2P(pdp) | PTE_P | PTE_W | PTE_U;  // entries hold physical addresses
    }
    // … same pattern repeated for PD and PT …
}
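The full walk can be tried out in user space. The sketch below is not xv6 code: it simulates the same find-or-create pattern as a loop over the three upper levels, using ordinary heap pointers in place of physical addresses, so no V2P/P2V conversion is needed. It assumes a 64-bit host whose heap pointers fit in bits 51–12 of an entry. All sim_ and SIM_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SIM_PGSIZE    4096u
#define SIM_P         0x001ull
#define SIM_W         0x002ull
#define SIM_U         0x004ull
#define SIM_ADDR_MASK 0x000FFFFFFFFFF000ull

/* Allocate one zeroed, page-aligned table of 512 entries. */
uint64_t *sim_alloc_table(void)
{
    uint64_t *t = aligned_alloc(SIM_PGSIZE, SIM_PGSIZE);
    if (t) memset(t, 0, SIM_PGSIZE);
    return t;
}

/* Return a pointer to the leaf (PT) entry for va, creating missing
 * intermediate tables when alloc is nonzero; NULL on failure. */
uint64_t *sim_walk(uint64_t *root, uint64_t va, int alloc)
{
    assert(root != NULL);
    uint64_t *table = root;
    for (int shift = 39; shift > 12; shift -= 9) {   /* PML4, PDPT, PD */
        uint64_t *entry = &table[(va >> shift) & 0x1FF];
        if (*entry & SIM_P) {
            table = (uint64_t *)(uintptr_t)(*entry & SIM_ADDR_MASK);
        } else {
            uint64_t *next;
            if (!alloc || (next = sim_alloc_table()) == NULL) return NULL;
            *entry = (uint64_t)(uintptr_t)next | SIM_P | SIM_W | SIM_U;
            table = next;
        }
    }
    return &table[(va >> 12) & 0x1FF];               /* slot in the PT */
}
```

Walking an empty tree with alloc = 0 returns NULL. The same walk with alloc = 1 allocates a PDPT, a PD, and a PT, and returns a pointer to the leaf slot, which is where a caller such as mappages would then write the final PTE.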

mappages: installing PTEs

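int mappages(pde_t *pgdir, void *va, addr_t size, addr_t pa, int perm) {
    char *a, *last;
    pte_t *pte;

    a = (char*)PGROUNDDOWN((addr_t)va);                 // first page of the range
    last = (char*)PGROUNDDOWN((addr_t)va + size - 1);   // last page of the range
    for (;;) {
        if ((pte = walkpgdir(pgdir, a, 1)) == 0)        // find/create leaf PTE
            return -1;                                  // out of memory
        *pte = pa | perm | PTE_P;       // install physical address + flags
        if (a == last) break;
        a += PGSIZE;
        pa += PGSIZE;
    }
    return 0;
}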

This iterates over the virtual address range page by page, calling walkpgdir with alloc=1 to create intermediate tables as needed, then writing the final PTE.
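The range computation is easy to get wrong by one page, so here is a small sketch of just that part. It uses the same rounding rule as the loop above; pages_spanned is a hypothetical helper, and PGSIZE and PGROUNDDOWN are redefined locally so the block stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096ull
#define PGROUNDDOWN(a) ((a) & ~(PGSIZE - 1))

/* Number of pages touched by the byte range [va, va + size). */
uint64_t pages_spanned(uint64_t va, uint64_t size)
{
    assert(size > 0);   /* size == 0 would put the last byte before va */
    uint64_t first = PGROUNDDOWN(va);
    uint64_t last  = PGROUNDDOWN(va + size - 1);
    return (last - first) / PGSIZE + 1;
}
```

A page-aligned request of exactly PGSIZE bytes touches one page, but a 2-byte range starting at 0x1FFF straddles a page boundary and touches two. This is why the loop rounds both ends down rather than dividing size by PGSIZE.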


Key Takeaways

  1. x86-64 uses 48-bit canonical addresses, splitting the space into a user half and a kernel half separated by a huge "hole" of non-canonical addresses.
  2. 4-level paging (PML4 → PDPT → PD → PT) handles the 36 index bits above the 12-bit page offset, with 512 entries per level.
  3. PTE flag bits encode permissions (P, W, U), caching policy (PWT, PCD), hardware-tracked state (A, D), and huge-page signaling (PS).
  4. xv6's kvmalloc uses 1 GB huge pages (PTE_PS in PDPT) to direct-map physical memory into the kernel with minimal table overhead.
  5. walkpgdir + mappages are the two primitives that create and populate any page table; understanding them unlocks all of xv6's VM code.

Practice

  1. In x86-64 with 48-bit canonical addressing, what is the total size of the user-accessible virtual address space?
  2. How many entries does each level of the x86-64 4-level page table contain, and why does that number fit exactly in one 4 KB page?
  3. A PTE has the value 0x0000_0001_23AB_9067. What physical page address is encoded in this entry?
  4. True or false: the Accessed (PTE_A) bit in a page table entry is set by the operating system when the page is first read or written.
  5. In xv6's kvmalloc, what is the role of the PTE_PS flag in the line kpdpt[0] = 0 | PTE_PS | PTE_P | PTE_W?
  6. What does the line lcr3(v2p(kpml4)) accomplish in xv6's kvmalloc?
  7. If the PTE_PS flag were removed from kpdpt[0] in xv6's kvmalloc, what would most likely happen?
  8. Why does xv6's inituvm use the PTE_U flag when mapping the first user process's page, and what would happen if it were omitted?
  9. Trace what walkpgdir(pgdir, va, 1) does when the PML4 entry for va is not yet present. List each step in order.
  10. xv6 maps device I/O memory using kpdpt[3] = 0xC0000000 | PTE_PS | PTE_P | PTE_W | PTE_PWT | PTE_PCD. Explain why PTE_PWT and PTE_PCD are both set here, and what could go wrong if they were omitted when mapping device registers.