Physical Page Management and Interrupts

Why This Matters

Every byte of memory your process uses has to come from somewhere. The kernel is the custodian of physical RAM: it decides which physical pages are free, hands them out on demand, and reclaims them when no longer needed. At the same time, a modern computer is full of slow devices—disk drives, network cards, keyboards—that take orders of magnitude longer than the CPU to complete an operation. Interrupts are the mechanism that lets the kernel stay productive while those devices work, instead of wasting cycles in a busy-wait loop. Understanding both topics is essential for reasoning about OS performance and correctness.

Part 1: Physical Page Allocation in xv6

The Free List

xv6 tracks free physical pages with a singly-linked free list. Each free page doubles as a list node: the first few bytes of the page store a pointer to the next free page. This works because a free page has no other content to preserve.

Physical RAM (above kernel, up to PHYSTOP)
┌──────────┬──────────┬──────────┬──────────┐
│ [next]──►│ [next]──►│ [next]──►│  (NULL)  │
│  page A  │  page B  │  page C  │  page D  │
└──────────┴──────────┴──────────┴──────────┘
    ▲
kmem.freelist

Initialization — kinit1 calls freerange, which iterates from the end of the kernel image up to PHYSTOP in steps of PGSIZE (4 KB), calling kfree on each page to insert it into the list:

kinit1(end, P2V(PHYSTOP));

freerange(void *vstart, void *vend) {
    for (; p + PGSIZE <= (char*)vend; p += PGSIZE)
        kfree(p);
}

Allocation (kalloc) — pop from the head of the list, O(1):

char* kalloc(void) {
    r = kmem.freelist;
    if (r)
        kmem.freelist = r->next;
    return (char*)r;   // NULL if no pages left
}

Deallocation (kfree) — push back onto the head of the list, also O(1):

void kfree(char *v) {
    r = (struct run*)v;
    r->next = kmem.freelist;
    kmem.freelist = r;
}

Limitation: Non-Contiguous Allocation

The free list allocates one page at a time. Returned pages are not guaranteed to be physically adjacent. This is perfectly fine for most kernel and user allocations, but some subsystems require a physically contiguous region of multiple pages:

Use case	Why contiguous pages are needed
Huge pages (2 MB, 1 GB)	A single huge-page TLB entry maps a large, contiguous physical range
DMA devices	Many older DMA controllers can only address a contiguous physical buffer; the CPU cannot remap their view of memory

The Buddy System (Linux)

Linux addresses the contiguous-allocation problem with the buddy system. Physical memory is organized into blocks of size 2^k pages. When a block of order k is needed:

If a free block of order k exists, return it.
Otherwise, split a free block of order k+1 into two "buddies" of order k; return one, keep the other free.

When a block is freed, the kernel checks whether its buddy is also free. If so, the two are merged back into a block of order k+1 (and this merging recurses upward). This keeps fragmentation low and makes it possible to satisfy large contiguous requests efficiently.

Part 2: Interrupts

The Problem: Devices Are Slow

A CPU core can execute billions of instructions per second, but a mechanical hard disk takes about 10 milliseconds to seek to data and rotate the right sector under the read head. Ten milliseconds is roughly 10 million CPU cycles—an eternity. The kernel cannot afford to spin-wait while a device works.

There are two strategies for knowing when a device has finished:

Strategy	Description	Drawback
Polling	Kernel repeatedly reads a status register	Wastes CPU cycles; latency proportional to poll interval
Interrupt	Device signals the CPU when done	CPU is free to do other work; overhead only at completion

Modern systems use interrupts for slow devices (disk, network) and sometimes polling for very fast devices (high-speed NICs) where interrupt overhead dominates.

How Interrupts Reach the CPU

Interrupts are electrical signals. A device raises its interrupt line, which is routed through the Programmable Interrupt Controller (PIC). The PIC multiplexes signals from many devices onto a single CPU pin (INTR). When the CPU samples that pin between instructions, it suspends the current execution context and invokes the registered interrupt handler.

Modern x86 systems use the Advanced PIC (APIC) architecture:

Devices
  │ IRQ lines
  ▼
I/O APIC  ──────────────────────────► Local APIC (CPU 0)
(system chipset)  ◄── IPI ──────────► Local APIC (CPU 1)
                                        │
                                        ├── Has a built-in timer
                                        └── Receives/sends IPIs

I/O APIC sits in the system chipset (south bridge). It collects IRQ lines from devices and routes them to the appropriate Local APIC.
Local APIC is inside each processor core. It receives interrupts from the I/O APIC, has its own programmable timer that generates timer interrupts, and can send/receive Inter-Processor Interrupts (IPIs) to other cores.

xv6 initializes the Local APIC in lapicinit(), called from main():

lapicw(SVR, ENABLE | (T_IRQ0 + IRQ_SPURIOUS));

Interrupt Sources

Not all interrupts come from hardware devices. The x86 architecture categorizes them as:

Source	Examples
Internal faults (exceptions)	Divide-by-zero (#DE), page fault (#PF), general protection fault (#GP)
User software	`int n` instruction (used for system calls)
Hardware	Timer tick, keyboard, disk completion

x86 Interrupt Vectors

Every interrupt type is assigned a small integer called a vector (0–255). When an interrupt occurs, the processor indexes into the Interrupt Descriptor Table (IDT) using the vector number to find the handler address.

Intel reserves vectors 0–31 for processor-defined exceptions:

Vector	Exception
0	Divide error
13	General protection fault
14	Page fault

Vectors 32–255 are available for OS/device use. xv6 maps them as:

Vector	Handler
32	Timer (clock → scheduler)
33	Keyboard
46	IDE disk

xv6 Interrupt Initialization

The full sequence from main():

main()
 ├── lapicinit()      — enable Local APIC, set spurious-interrupt vector
 ├── tvinit()         — fill the IDT
 │     └── for n = 0..255:
 │           mkgate(idt, n, vectors[n], 0)
 │           (vectors[] is generated assembly that pushes the vector number
 │            and jumps to the common alltraps handler)
 └── mpmain()
       └── idtinit()  — load the IDT register
             └── lidt((void*)idt, PGSIZE)

After idtinit(), the processor knows where to find the handler for every vector. When the timer fires (vector 32), the kernel's clock handler runs and eventually calls the scheduler, allowing another process to run—this is the heartbeat of preemptive multitasking.

Key Takeaways

xv6 uses a singly-linked free list of 4 KB pages; kalloc and kfree are O(1) stack operations.
The free list can only hand out individual, potentially non-contiguous pages. For huge pages or DMA, physically contiguous blocks are required; Linux solves this with the buddy system.
Devices are orders of magnitude slower than the CPU. Interrupts let the CPU do other work and respond only when a device signals completion.
On x86, the I/O APIC routes device IRQs to each core's Local APIC; the Local APIC also provides the per-core timer.
Each interrupt type has a vector number (0–255). The processor uses this to index the IDT and find the right handler.
xv6 sets up the IDT in tvinit() and loads it with lidt() inside idtinit().