Physical Page Management and Interrupts

Why This Matters

Every byte of memory your process uses has to come from somewhere. The kernel is the custodian of physical RAM: it decides which physical pages are free, hands them out on demand, and reclaims them when no longer needed. At the same time, a modern computer is full of slow devicesβ€”disk drives, network cards, keyboardsβ€”that take orders of magnitude longer than the CPU to complete an operation. Interrupts are the mechanism that lets the kernel stay productive while those devices work, instead of wasting cycles in a busy-wait loop. Understanding both topics is essential for reasoning about OS performance and correctness.


Part 1: Physical Page Allocation in xv6

The Free List

xv6 tracks free physical pages with a singly-linked free list. Each free page doubles as a list node: the first few bytes of the page store a pointer to the next free page. This works because a free page has no other content to preserve.

Physical RAM (above kernel, up to PHYSTOP)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ [next]──►│ [next]──►│ [next]──►│  (NULL)  β”‚
β”‚  page A  β”‚  page B  β”‚  page C  β”‚  page D  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    β–²
kmem.freelist

Initialization β€” kinit1 calls freerange, which iterates from the end of the kernel image up to PHYSTOP in steps of PGSIZE (4 KB), calling kfree on each page to insert it into the list:

kinit1(end, P2V(PHYSTOP));

freerange(void *vstart, void *vend) {
    for (; p + PGSIZE <= (char*)vend; p += PGSIZE)
        kfree(p);
}

Allocation (kalloc) β€” pop from the head of the list, O(1):

char* kalloc(void) {
    r = kmem.freelist;
    if (r)
        kmem.freelist = r->next;
    return (char*)r;   // NULL if no pages left
}

Deallocation (kfree) β€” push back onto the head of the list, also O(1):

void kfree(char *v) {
    r = (struct run*)v;
    r->next = kmem.freelist;
    kmem.freelist = r;
}

Limitation: Non-Contiguous Allocation

The free list allocates one page at a time. Returned pages are not guaranteed to be physically adjacent. This is perfectly fine for most kernel and user allocations, but some subsystems require a physically contiguous region of multiple pages:

Use case Why contiguous pages are needed
Huge pages (2 MB, 1 GB) A single huge-page TLB entry maps a large, contiguous physical range
DMA devices Many older DMA controllers can only address a contiguous physical buffer; the CPU cannot remap their view of memory

The Buddy System (Linux)

Linux addresses the contiguous-allocation problem with the buddy system. Physical memory is organized into blocks of size 2^k pages. When a block of order k is needed:

  1. If a free block of order k exists, return it.
  2. Otherwise, split a free block of order k+1 into two "buddies" of order k; return one, keep the other free.

When a block is freed, the kernel checks whether its buddy is also free. If so, the two are merged back into a block of order k+1 (and this merging recurses upward). This keeps fragmentation low and makes it possible to satisfy large contiguous requests efficiently.


Part 2: Interrupts

The Problem: Devices Are Slow

A CPU core can execute billions of instructions per second, but a mechanical hard disk takes about 10 milliseconds to seek to data and rotate the right sector under the read head. Ten milliseconds is roughly 10 million CPU cyclesβ€”an eternity. The kernel cannot afford to spin-wait while a device works.

There are two strategies for knowing when a device has finished:

Strategy Description Drawback
Polling Kernel repeatedly reads a status register Wastes CPU cycles; latency proportional to poll interval
Interrupt Device signals the CPU when done CPU is free to do other work; overhead only at completion

Modern systems use interrupts for slow devices (disk, network) and sometimes polling for very fast devices (high-speed NICs) where interrupt overhead dominates.

How Interrupts Reach the CPU

Interrupts are electrical signals. A device raises its interrupt line, which is routed through the Programmable Interrupt Controller (PIC). The PIC multiplexes signals from many devices onto a single CPU pin (INTR). When the CPU samples that pin between instructions, it suspends the current execution context and invokes the registered interrupt handler.

Modern x86 systems use the Advanced PIC (APIC) architecture:

Devices
  β”‚ IRQ lines
  β–Ό
I/O APIC  ──────────────────────────► Local APIC (CPU 0)
(system chipset)  ◄── IPI ──────────► Local APIC (CPU 1)
                                        β”‚
                                        β”œβ”€β”€ Has a built-in timer
                                        └── Receives/sends IPIs

xv6 initializes the Local APIC in lapicinit(), called from main():

lapicw(SVR, ENABLE | (T_IRQ0 + IRQ_SPURIOUS));

Interrupt Sources

Not all interrupts come from hardware devices. The x86 architecture categorizes them as:

Source Examples
Internal faults (exceptions) Divide-by-zero (#DE), page fault (#PF), general protection fault (#GP)
User software int n instruction (used for system calls)
Hardware Timer tick, keyboard, disk completion

x86 Interrupt Vectors

Every interrupt type is assigned a small integer called a vector (0–255). When an interrupt occurs, the processor indexes into the Interrupt Descriptor Table (IDT) using the vector number to find the handler address.

Intel reserves vectors 0–31 for processor-defined exceptions:

Vector Exception
0 Divide error
13 General protection fault
14 Page fault

Vectors 32–255 are available for OS/device use. xv6 maps them as:

Vector Handler
32 Timer (clock β†’ scheduler)
33 Keyboard
46 IDE disk

xv6 Interrupt Initialization

The full sequence from main():

main()
 β”œβ”€β”€ lapicinit()      β€” enable Local APIC, set spurious-interrupt vector
 β”œβ”€β”€ tvinit()         β€” fill the IDT
 β”‚     └── for n = 0..255:
 β”‚           mkgate(idt, n, vectors[n], 0)
 β”‚           (vectors[] is generated assembly that pushes the vector number
 β”‚            and jumps to the common alltraps handler)
 └── mpmain()
       └── idtinit()  β€” load the IDT register
             └── lidt((void*)idt, PGSIZE)

After idtinit(), the processor knows where to find the handler for every vector. When the timer fires (vector 32), the kernel's clock handler runs and eventually calls the scheduler, allowing another process to runβ€”this is the heartbeat of preemptive multitasking.


Key Takeaways

Practice

  1. What does xv6's kalloc() return when the physical page free list is empty?
  2. In xv6, how does kfree(char *v) return a page to the allocator?
  3. Which of the following scenarios requires physically contiguous pages? (Choose the best answer.)
  4. The Linux kernel uses the buddy system instead of a simple free list. What specific problem does the buddy system solve that xv6's free list cannot?
  5. Explain the difference between polling and interrupts as strategies for detecting when a device has completed an I/O operation. What is the key trade-off?
  6. In the x86 APIC architecture, what is the role of the I/O APIC (as opposed to the Local APIC)?
  7. Which x86 interrupt vector range is reserved by the processor for processor-defined exceptions (such as divide-by-zero and page faults)?
  8. Trace the xv6 interrupt initialization sequence starting from main(). Name each function called, what it does, and which data structure or CPU register it sets up.
  9. In xv6, timer interrupts are delivered as vector 32. What does the timer interrupt handler ultimately trigger?