Physical Page Management and Interrupts
Why This Matters
Every byte of memory your process uses has to come from somewhere. The kernel is the custodian of physical RAM: it decides which physical pages are free, hands them out on demand, and reclaims them when no longer needed. At the same time, a modern computer is full of slow devicesβdisk drives, network cards, keyboardsβthat take orders of magnitude longer than the CPU to complete an operation. Interrupts are the mechanism that lets the kernel stay productive while those devices work, instead of wasting cycles in a busy-wait loop. Understanding both topics is essential for reasoning about OS performance and correctness.
Part 1: Physical Page Allocation in xv6
The Free List
xv6 tracks free physical pages with a singly-linked free list. Each free page doubles as a list node: the first few bytes of the page store a pointer to the next free page. This works because a free page has no other content to preserve.
Physical RAM (above kernel, up to PHYSTOP)
ββββββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββ
β [next]βββΊβ [next]βββΊβ [next]βββΊβ (NULL) β
β page A β page B β page C β page D β
ββββββββββββ΄βββββββββββ΄βββββββββββ΄βββββββββββ
β²
kmem.freelist
Initialization β kinit1 calls freerange, which iterates from the end of the kernel image up to PHYSTOP in steps of PGSIZE (4 KB), calling kfree on each page to insert it into the list:
kinit1(end, P2V(PHYSTOP));
freerange(void *vstart, void *vend) {
for (; p + PGSIZE <= (char*)vend; p += PGSIZE)
kfree(p);
}
Allocation (kalloc) β pop from the head of the list, O(1):
char* kalloc(void) {
r = kmem.freelist;
if (r)
kmem.freelist = r->next;
return (char*)r; // NULL if no pages left
}
Deallocation (kfree) β push back onto the head of the list, also O(1):
void kfree(char *v) {
r = (struct run*)v;
r->next = kmem.freelist;
kmem.freelist = r;
}
Limitation: Non-Contiguous Allocation
The free list allocates one page at a time. Returned pages are not guaranteed to be physically adjacent. This is perfectly fine for most kernel and user allocations, but some subsystems require a physically contiguous region of multiple pages:
| Use case | Why contiguous pages are needed |
|---|---|
| Huge pages (2 MB, 1 GB) | A single huge-page TLB entry maps a large, contiguous physical range |
| DMA devices | Many older DMA controllers can only address a contiguous physical buffer; the CPU cannot remap their view of memory |
The Buddy System (Linux)
Linux addresses the contiguous-allocation problem with the buddy system. Physical memory is organized into blocks of size 2^k pages. When a block of order k is needed:
- If a free block of order k exists, return it.
- Otherwise, split a free block of order k+1 into two "buddies" of order k; return one, keep the other free.
When a block is freed, the kernel checks whether its buddy is also free. If so, the two are merged back into a block of order k+1 (and this merging recurses upward). This keeps fragmentation low and makes it possible to satisfy large contiguous requests efficiently.
Part 2: Interrupts
The Problem: Devices Are Slow
A CPU core can execute billions of instructions per second, but a mechanical hard disk takes about 10 milliseconds to seek to data and rotate the right sector under the read head. Ten milliseconds is roughly 10 million CPU cyclesβan eternity. The kernel cannot afford to spin-wait while a device works.
There are two strategies for knowing when a device has finished:
| Strategy | Description | Drawback |
|---|---|---|
| Polling | Kernel repeatedly reads a status register | Wastes CPU cycles; latency proportional to poll interval |
| Interrupt | Device signals the CPU when done | CPU is free to do other work; overhead only at completion |
Modern systems use interrupts for slow devices (disk, network) and sometimes polling for very fast devices (high-speed NICs) where interrupt overhead dominates.
How Interrupts Reach the CPU
Interrupts are electrical signals. A device raises its interrupt line, which is routed through the Programmable Interrupt Controller (PIC). The PIC multiplexes signals from many devices onto a single CPU pin (INTR). When the CPU samples that pin between instructions, it suspends the current execution context and invokes the registered interrupt handler.
Modern x86 systems use the Advanced PIC (APIC) architecture:
Devices
β IRQ lines
βΌ
I/O APIC βββββββββββββββββββββββββββΊ Local APIC (CPU 0)
(system chipset) βββ IPI βββββββββββΊ Local APIC (CPU 1)
β
βββ Has a built-in timer
βββ Receives/sends IPIs
- I/O APIC sits in the system chipset (south bridge). It collects IRQ lines from devices and routes them to the appropriate Local APIC.
- Local APIC is inside each processor core. It receives interrupts from the I/O APIC, has its own programmable timer that generates timer interrupts, and can send/receive Inter-Processor Interrupts (IPIs) to other cores.
xv6 initializes the Local APIC in lapicinit(), called from main():
lapicw(SVR, ENABLE | (T_IRQ0 + IRQ_SPURIOUS));
Interrupt Sources
Not all interrupts come from hardware devices. The x86 architecture categorizes them as:
| Source | Examples |
|---|---|
| Internal faults (exceptions) | Divide-by-zero (#DE), page fault (#PF), general protection fault (#GP) |
| User software | int n instruction (used for system calls) |
| Hardware | Timer tick, keyboard, disk completion |
x86 Interrupt Vectors
Every interrupt type is assigned a small integer called a vector (0β255). When an interrupt occurs, the processor indexes into the Interrupt Descriptor Table (IDT) using the vector number to find the handler address.
Intel reserves vectors 0β31 for processor-defined exceptions:
| Vector | Exception |
|---|---|
| 0 | Divide error |
| 13 | General protection fault |
| 14 | Page fault |
Vectors 32β255 are available for OS/device use. xv6 maps them as:
| Vector | Handler |
|---|---|
| 32 | Timer (clock β scheduler) |
| 33 | Keyboard |
| 46 | IDE disk |
xv6 Interrupt Initialization
The full sequence from main():
main()
βββ lapicinit() β enable Local APIC, set spurious-interrupt vector
βββ tvinit() β fill the IDT
β βββ for n = 0..255:
β mkgate(idt, n, vectors[n], 0)
β (vectors[] is generated assembly that pushes the vector number
β and jumps to the common alltraps handler)
βββ mpmain()
βββ idtinit() β load the IDT register
βββ lidt((void*)idt, PGSIZE)
After idtinit(), the processor knows where to find the handler for every vector. When the timer fires (vector 32), the kernel's clock handler runs and eventually calls the scheduler, allowing another process to runβthis is the heartbeat of preemptive multitasking.
Key Takeaways
- xv6 uses a singly-linked free list of 4 KB pages;
kallocandkfreeare O(1) stack operations. - The free list can only hand out individual, potentially non-contiguous pages. For huge pages or DMA, physically contiguous blocks are required; Linux solves this with the buddy system.
- Devices are orders of magnitude slower than the CPU. Interrupts let the CPU do other work and respond only when a device signals completion.
- On x86, the I/O APIC routes device IRQs to each core's Local APIC; the Local APIC also provides the per-core timer.
- Each interrupt type has a vector number (0β255). The processor uses this to index the IDT and find the right handler.
- xv6 sets up the IDT in
tvinit()and loads it withlidt()insideidtinit().