Virtual Memory 2: x86-64 Paging and xv6 VM
Why this matters
The previous module introduced 32-bit paging as a concept. Real systems, including xv6 on modern hardware, run in 64-bit mode, where the address space is so large that a new paging structure is required. This module shows exactly what changed, why, and how xv6's kernel code wires it all up. Understanding this material lets you read, debug, and extend real OS code rather than just describe it abstractly.
1. Quick Recap: Why Not One Big Array?
Virtual addresses are divided into 4 KB "pages" (the lower 12 bits are the page offset). A naïve approach stores all page table entries (PTEs) in a flat array:
GET_PTE(va) = &ptes[va >> 12]
For 32-bit addresses with 4 KB pages, the index is the upper 20 bits: 2²⁰ ≈ 1 million entries × 4 bytes = 4 MB per process. That memory is wasted for every process, even if it uses only a few pages. The solution is a multi-level page table so that unused branches of the tree simply don't exist.
In 32-bit x86, two levels (Page Directory → Page Table) solved the problem. In 64-bit x86, the address space is vastly larger, so the x86-64 architecture adds more levels.
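The arithmetic above can be checked with a small helper. This is a minimal sketch; the function name flat_table_bytes and its parameters are illustrative and do not come from xv6.

```c
#include <stdint.h>

/* Bytes needed by a flat, single-level page table.
 *   va_bits:    width of a virtual address in bits
 *   page_shift: log2 of the page size (12 for 4 KB pages)
 *   pte_bytes:  size of one page table entry
 * Assumes va_bits - page_shift is below 64 so the shift is well defined. */
static inline uint64_t flat_table_bytes(unsigned va_bits, unsigned page_shift,
                                        uint64_t pte_bytes)
{
    uint64_t entries = (uint64_t)1 << (va_bits - page_shift);
    return entries * pte_bytes;
}
```

With these assumptions, flat_table_bytes(32, 12, 4) gives 4 MB, matching the figure above, while flat_table_bytes(48, 12, 8) gives 512 GB per process, which is why a flat table is not an option for 48-bit addresses.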
2. x86-64 Canonical Addresses (48-bit)
A 64-bit register can hold 2⁶⁴ distinct addresses, but most current CPUs implement only 48 bits of virtual address space (CPUs with 5-level paging extend this to 57 bits). The hardware enforces a canonical form: bits 63–48 must all be copies of bit 47 (sign extension). Accessing an address that violates this rule faults, typically with a General Protection Fault.
This splits the address space into two halves:
| Region | Range | Size |
|---|---|---|
| User space | 0x0000_0000_0000_0000 – 0x0000_7FFF_FFFF_FFFF | 128 TB |
| Kernel space | 0xFFFF_8000_0000_0000 – 0xFFFF_FFFF_FFFF_FFFF | 128 TB |
xv6 sets KERNBASE = 0xFFFF8000_00000000, so the kernel lives in the top canonical half.
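The canonical-form rule can be expressed as a one-line check. The sketch below assumes 48-bit addressing; the name is_canonical48 is hypothetical and not part of xv6.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bits 63..48 of va are all copies of bit 47.
 * Shifting right by 47 leaves bits 63..47 (17 bits), which must be
 * either all zeros (user half) or all ones (kernel half). */
static inline bool is_canonical48(uint64_t va)
{
    uint64_t upper = va >> 47;
    return upper == 0 || upper == 0x1FFFF;
}
```

The last user address 0x0000_7FFF_FFFF_FFFF and KERNBASE both pass, while 0x0000_8000_0000_0000, the first address inside the hole, does not.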
3. x86-64 4-Level Paging
To cover 48-bit addresses using 4 KB pages (12-bit offset), the remaining 36 bits are split across four 9-bit indexes:
Bits 47–39 → PML4 index (Page Map Level 4)
Bits 38–30 → PDPT index (Page Directory Pointer Table)
Bits 29–21 → PD index (Page Directory)
Bits 20–12 → PT index (Page Table)
Bits 11–0 → Page offset
CR3 → PML4 → PDPT → PD → PT → Physical Page
Each table has 2⁹ = 512 entries, each 8 bytes wide (one 64-bit word). A single table occupies exactly one 4 KB page.
Why 9 bits per level? 512 entries × 8 bytes = 4096 bytes = one page. Each table fits neatly in a page frame.
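The bit layout translates directly into shifts and masks. The following sketch splits a virtual address into its five fields; the helper names are illustrative (xv6 uses macros such as PMX for the same purpose, and the other names here are assumptions).

```c
#include <stdint.h>

/* Each table index is 9 bits wide; the page offset is 12 bits. */
#define IDX_MASK 0x1FFull

static inline unsigned pml4_index(uint64_t va)  { return (unsigned)((va >> 39) & IDX_MASK); }
static inline unsigned pdpt_index(uint64_t va)  { return (unsigned)((va >> 30) & IDX_MASK); }
static inline unsigned pd_index(uint64_t va)    { return (unsigned)((va >> 21) & IDX_MASK); }
static inline unsigned pt_index(uint64_t va)    { return (unsigned)((va >> 12) & IDX_MASK); }
static inline unsigned page_offset(uint64_t va) { return (unsigned)(va & 0xFFFull); }
```

For KERNBASE = 0xFFFF_8000_0000_0000 the PML4 index is 256, the first slot of the upper half, and every other field is 0. For the address 0x401234 the fields are PML4 = 0, PDPT = 0, PD = 2, PT = 1, offset = 0x234.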
4. PTE Flag Bits
Every entry in every level of the hierarchy shares the same flag layout in the lower 12 bits (and bit 63):
| Bit | Name | Meaning when set |
|---|---|---|
| 0 | P | Entry is present (valid); hardware faults if 0 |
| 1 | W | Page is writable; read-only if 0 |
| 2 | U | User-accessible; kernel-only if 0 |
| 3 | PWT | Write-through caching (vs. write-back default) |
| 4 | PCD | Cache disabled; used for MMIO/DMA regions |
| 5 | A | Accessed; set by hardware on any read/write |
| 6 | D | Dirty; set by hardware on write; used to decide if a page must be written back before eviction |
| 7 | PS | Page Size; in PDPT/PD entries, maps a 1 GB or 2 MB "huge page" directly instead of pointing to another table level |
| 63 | XD/NX | Execute-disable; page cannot be executed |
Bits 51–12 hold the physical page frame number (the physical address with its lower 12 bits zeroed).
Reading a PTE value: worked example
Given PTE = 0x0000_0001_23AB_9067:
- Physical address: mask off the lower 12 bits → 0x0000_0001_23AB_9000
- Flags (lower 12 bits = 0x067 = 0b0110_0111):
  - Bit 0 (P) = 1 → present
  - Bit 1 (W) = 1 → writable
  - Bit 2 (U) = 1 → user-accessible
  - Bit 5 (A) = 1 → accessed
  - Bit 6 (D) = 1 → dirty
The Accessed bit is set by the hardware, not the OS. The OS reads it to implement page-replacement policies (e.g., clock/LRU approximation). The Dirty bit tells the OS whether a page that is being evicted must be written back to disk (only dirty pages need the write; clean pages can simply be discarded).
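The worked example can be reproduced in code. In this sketch PTE_P, PTE_W, PTE_U, PTE_PWT, PTE_PCD and PTE_PS follow the names used in this module; PTE_A, PTE_D, PTE_NX, PTE_ADDR_MASK and the two helper functions are assumptions introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P   (1ull << 0)   /* present */
#define PTE_W   (1ull << 1)   /* writable */
#define PTE_U   (1ull << 2)   /* user-accessible */
#define PTE_PWT (1ull << 3)   /* write-through */
#define PTE_PCD (1ull << 4)   /* cache disabled */
#define PTE_A   (1ull << 5)   /* accessed (assumed name) */
#define PTE_D   (1ull << 6)   /* dirty (assumed name) */
#define PTE_PS  (1ull << 7)   /* page size */
#define PTE_NX  (1ull << 63)  /* execute-disable (assumed name) */

#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ull  /* bits 51..12 */

/* Physical frame address stored in the entry. */
static inline uint64_t pte_addr(uint64_t pte) { return pte & PTE_ADDR_MASK; }

/* True if the given flag bit is set in the entry. */
static inline bool pte_has(uint64_t pte, uint64_t flag) { return (pte & flag) != 0; }
```

Applied to 0x0000_0001_23AB_9067, pte_addr returns 0x0000_0001_23AB_9000, and pte_has is true for P, W, U, A and D and false for PWT, PCD, PS and NX.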
5. xv6 Kernel Page Table Setup
xv6's kvmalloc() builds the kernel's page table early in boot:
void kvmalloc(void) {
    kpml4 = (pml4e_t*) kalloc();   // allocate the PML4 table
    kpdpt = (pde_t*) kalloc();     // allocate a PDPT
    memset(kpml4, 0, PGSIZE);      // kalloc() does not zero pages, so clear
    memset(kpdpt, 0, PGSIZE);      // both tables before filling entries
    // Point PML4[KERNBASE index] → kpdpt (converting virtual → physical)
    kpml4[PMX(KERNBASE)] = v2p(kpdpt) | PTE_P | PTE_W;
    // Direct-map first 1 GB of physical memory to KERNBASE (huge page)
    kpdpt[0] = 0 | PTE_PS | PTE_P | PTE_W;
    // Map 0xC000_0000 physical (PCI/device space) uncached
    kpdpt[3] = 0xC0000000 | PTE_PS | PTE_P | PTE_W | PTE_PWT | PTE_PCD;
    lcr3(v2p(kpml4));              // activate the new page table
}
Key observations:
- kpml4 is the root (PML4). The single entry at index PMX(KERNBASE) covers the 512 GB of virtual address space starting at KERNBASE, which is all this mapping needs.
- kpdpt is the second level (PDPT). Each entry with PTE_PS set creates a 1 GB huge page: the hardware stops the walk here and uses the physical address directly (no PD or PT needed).
- PTE_PWT | PTE_PCD on kpdpt[3] disables caching for MMIO/device memory, so writes go straight to hardware registers.
- lcr3(v2p(kpml4)) loads CR3 with the physical address of the new PML4, activating the mapping and flushing the TLB.
If PTE_PS were removed from kpdpt[0], the hardware would interpret the PDPT entry as pointing to a Page Directory. Since no PD was allocated, the walk would continue into whatever happens to sit at the physical address in the entry (0 here) and treat it as a table, most likely causing a page fault during early boot, before the kernel has any fault handler.
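To make the effect of PTE_PS concrete, the sketch below computes the physical address the hardware produces when the walk stops at a PDPT entry that maps a 1 GB page. The function name translate_1gb and both mask names are hypothetical; the flag values are the standard x86-64 bit positions.

```c
#include <stdint.h>

#define GB1_OFFSET_MASK 0x000000003FFFFFFFull  /* low 30 bits: offset in page */
#define GB1_FRAME_MASK  0x000FFFFFC0000000ull  /* bits 51..30: 1 GB frame */

/* Physical address for va when the walk ends at a PDPT entry with PS set.
 * The frame comes from the entry, the low 30 bits come from the address. */
static inline uint64_t translate_1gb(uint64_t pdpte, uint64_t va)
{
    return (pdpte & GB1_FRAME_MASK) | (va & GB1_OFFSET_MASK);
}
```

With the entries built by kvmalloc, KERNBASE + 0x100000 goes through kpdpt[0] and lands on physical 0x100000, and KERNBASE + 3 GB + 0x1000 goes through kpdpt[3] and lands on physical 0xC000_1000.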
6. xv6 User Space Setup
After the kernel is running, userinit() creates the very first user process, which runs initcode.S (a tiny program that calls exec("/init")). The VM setup:
inituvm(p->pgdir, _binary_initcode_start, _binary_initcode_size);
    → mappages(pgdir, (void*)PGSIZE, PGSIZE, V2P(mem), PTE_W | PTE_U);
PTE_U is critical: without it the user process cannot access the page and would fault on its first instruction.
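The hardware applies the U and W checks at every level of the walk, not only at the leaf. The sketch below models that rule in simplified form: it looks only at P, W and U and ignores execute-disable and other protection features. The function name user_access_ok is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P 0x1ull
#define PTE_W 0x2ull
#define PTE_U 0x4ull

/* entries[0..3] are the PML4, PDPT, PD and PT entries used to translate
 * one address. Returns true if a user-mode read (write == false) or
 * write (write == true) passes the P/W/U checks at every level. */
static inline bool user_access_ok(const uint64_t entries[4], bool write)
{
    uint64_t need = PTE_P | PTE_U | (write ? PTE_W : 0);
    for (int i = 0; i < 4; i++) {
        if ((entries[i] & need) != need)
            return false;
    }
    return true;
}
```

A leaf entry without PTE_U fails the check even when all upper levels allow user access, which is the fault described above.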
7. Page Table Walk: walkpgdir
walkpgdir is the core helper that traverses (or creates) the page table tree to find the PTE for a given virtual address:
pte_t* walkpgdir(pde_t *pml4, const void *va, int alloc) {
    pml4e_t *pml4e;
    pdpe_t *pdp;

    pml4e = &pml4[PMX(va)];
    if (*pml4e & PTE_P)
        pdp = (pdpe_t*)P2V(PTE_ADDR(*pml4e));  // table exists
    else {
        if (!alloc || (pdp = (pdpe_t*)kalloc()) == 0) return 0;
        memset(pdp, 0, PGSIZE);
        *pml4e = V2P(pdp) | PTE_P | PTE_W | PTE_U;
    }
    // ... same pattern repeated for PD and PT ...
}
- PMX(va) extracts the PML4 index from the virtual address.
- PTE_ADDR(*pml4e) masks off the flag bits to get the physical frame address.
- P2V converts a physical address to a kernel virtual address so the CPU can dereference the pointer.
- If alloc == 1 and a level is missing, a new page is allocated and zeroed, then linked in.
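The walk can be tried outside the kernel with a user-space simulation. This is a sketch under stated assumptions, not xv6's code: host pointers stand in for physical addresses (so P2V and V2P become plain casts), aligned_alloc stands in for kalloc, a 64-bit host is assumed, and the names alloc_table, sim_walk and LEVEL_INDEX are hypothetical. The three upper levels are handled by one loop instead of repeated code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PGSIZE 4096
#define PTE_P 0x1ull
#define PTE_W 0x2ull
#define PTE_U 0x4ull
#define PTE_ADDR(e) ((e) & 0x000FFFFFFFFFF000ull)
#define LEVEL_INDEX(va, shift) (((va) >> (shift)) & 0x1FF)

/* Stand-in for kalloc(): one zeroed, page-aligned 4 KB table, or NULL. */
static uint64_t *alloc_table(void)
{
    void *p = aligned_alloc(PGSIZE, PGSIZE);
    if (p)
        memset(p, 0, PGSIZE);
    return p;
}

/* Return a pointer to the leaf PTE for va. Returns NULL if a level is
 * missing and alloc is 0, or if allocating a new table fails. */
static uint64_t *sim_walk(uint64_t *pml4, uint64_t va, int alloc)
{
    uint64_t *table = pml4;
    /* shift = 39, 30, 21 selects the PML4, PDPT and PD index in turn. */
    for (int shift = 39; shift > 12; shift -= 9) {
        uint64_t *entry = &table[LEVEL_INDEX(va, shift)];
        if (*entry & PTE_P) {
            table = (uint64_t *)(uintptr_t)PTE_ADDR(*entry);  /* table exists */
        } else {
            uint64_t *next;
            if (!alloc || (next = alloc_table()) == NULL)
                return NULL;
            *entry = (uint64_t)(uintptr_t)next | PTE_P | PTE_W | PTE_U;
            table = next;
        }
    }
    return &table[LEVEL_INDEX(va, 12)];  /* slot in the Page Table */
}
```

Calling sim_walk with alloc = 0 on an empty root returns NULL; calling it with alloc = 1 builds the three missing tables and returns a zeroed leaf slot, and a later lookup of any address in the same page returns the same slot.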
mappages: installing PTEs
int mappages(pde_t *pgdir, void *va, addr_t size, addr_t pa, int perm) {
    char *a, *last;
    pte_t *pte;

    a = (char*)PGROUNDDOWN((addr_t)va);
    last = (char*)PGROUNDDOWN((addr_t)va + size - 1);
    for (;;) {
        if ((pte = walkpgdir(pgdir, a, 1)) == 0)  // find/create leaf PTE
            return -1;                             // out of memory
        *pte = pa | perm | PTE_P;                  // install physical address + flags
        if (a == last) break;
        a += PGSIZE;
        pa += PGSIZE;
    }
    return 0;
}
This iterates over the virtual address range page by page, calling walkpgdir with alloc=1 to create intermediate tables as needed, then writing the final PTE.
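The rounding at the top of mappages decides how many PTEs the loop writes. A small sketch of that arithmetic follows; pages_spanned is a hypothetical helper, and it assumes size is greater than 0, since the loop's a == last exit test relies on the same assumption.

```c
#include <stdint.h>

#define PGSIZE 4096ull
#define PGROUNDDOWN(a) ((a) & ~(PGSIZE - 1))

/* Number of pages covered by the byte range [va, va + size).
 * Mirrors the first/last computation in mappages. Assumes size > 0. */
static inline uint64_t pages_spanned(uint64_t va, uint64_t size)
{
    uint64_t first = PGROUNDDOWN(va);
    uint64_t last  = PGROUNDDOWN(va + size - 1);
    return (last - first) / PGSIZE + 1;
}
```

A page-aligned range of exactly PGSIZE bytes touches one page, but a 2-byte range starting at 0x1FFF straddles a boundary and touches two.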
Key Takeaways
- x86-64 uses 48-bit canonical addresses, splitting the space into a user half and a kernel half separated by a huge "hole" of non-canonical addresses.
- 4-level paging (PML4 → PDPT → PD → PT) handles the 36 index bits above the 12-bit page offset, with 512 entries per level.
- PTE flag bits encode permissions (P, W, U), caching policy (PWT, PCD), hardware-tracked state (A, D), and huge-page signaling (PS).
- xv6's kvmalloc uses 1 GB huge pages (PTE_PS in PDPT) to direct-map physical memory into the kernel with minimal table overhead.
- walkpgdir + mappages are the two primitives that create and populate any page table; understanding them unlocks all of xv6's VM code.