System Calls and CPU Privilege Rings

Why This Matters

Every piece of software you run depends on the operating system for access to hardware: files, the network, memory, the clock. But if any program could freely call kernel code or poke hardware registers directly, a buggy app could crash the whole system—and a malicious one could own it. The mechanism that prevents this is hardware-enforced privilege separation, and the controlled gateway through it is the system call interface. Understanding both is foundational to all of OS design.

CPU Privilege Rings

Modern x86 processors implement four privilege levels, called rings, numbered 0 (most privileged) through 3 (least privileged). In practice, operating systems use only two:

Ring	Name	Who runs here
0	Kernel mode	OS kernel
3	User mode	Applications, shells

The hardware enforces this distinction in silicon, not in software. No matter how clever an attacker's code, the CPU will not allow ring-3 code to perform ring-0 operations.

The %cs Register and CPL

The x86 tracks the privilege of the running code as the Current Privilege Level (CPL), stored in the two lowest bits of the %cs (Code Segment) register. When those bits are 00, the CPU is in ring 0; when they are 11 (binary for 3), it is in ring 3.

  15        3   2  1 0
  ┌──────────┬───┬────┐
  │  index   │TI │ RPL│  ← %cs selector
  └──────────┴───┴────┘
                 └── CPL = bits 1:0

Example: %cs = 0x2b

In binary: 0000 0000 0010 1011
Bits 1:0: 11 → CPL = 3 (user mode)
Bits 15:3: 5 → GDT index 5 (the user code segment descriptor)
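
The same decoding can be written as a few shifts and masks. The following is a minimal sketch in C; decode_selector and struct selector_fields are hypothetical names used only for illustration and are not part of xv6.

```c
#include <stdint.h>

/* Fields of an x86 segment selector such as the value held in %cs. */
struct selector_fields {
    unsigned index; /* bits 15:3, descriptor table index */
    unsigned ti;    /* bit 2, 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1:0, equal to CPL when the selector is in %cs */
};

/* Hypothetical helper (not an xv6 function): split a selector into fields. */
struct selector_fields decode_selector(uint16_t sel) {
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 0x1;
    f.rpl   = sel & 0x3;
    return f;
}
```

Calling decode_selector(0x2b) gives index 5, TI 0, and RPL 3, matching the worked example above.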

You can observe this live in GDB while debugging xv6: single-step (stepi) across a syscall instruction, printing %cs before and after, and watch it change from a value ending in 11 (ring 3) to one ending in 00 (ring 0).

What Does Ring 0 Protect?

Ring 0 does not just protect some data—it protects the mechanisms of isolation themselves:

Writes to %cs: Only the CPU's own privilege-transition instructions can change CPL. User code cannot forge a ring-0 %cs.
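I/O port access: by default, in/out instructions to hardware ports require ring 0.
Control registers and flags: %cr0 (enables paging), %cr3 (page table base), and the interrupt-enable flag (IF) in %rflags can be changed only by ring-0 code. User code may still alter the ordinary arithmetic flags.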

If user code could change any of these, isolation would collapse immediately.

System Calls: The Controlled Gateway

A system call is the only sanctioned way for user-space code to request entry into the kernel. It is a controlled transfer: the CPU switches to ring 0 at a fixed entry point that the kernel itself registered at boot. User code cannot choose an arbitrary kernel address to jump to.

What System Calls Provide

An abstract hardware interface: apps call read() instead of programming a disk controller directly.
A security boundary: every request passes through kernel validation before hardware is touched.
Stability: kernel data structures cannot be corrupted by buggy user programs.

Examples in xv6

Category	System calls
Process management	`fork`, `exit`, `exec`, `getpid`
Memory management	`sbrk`
File system	`open`, `read`, `write`, `close`, `mkdir`
Inter-process comms	`pipe`
Time	`uptime`

The Complete System Call Flow in xv6

Tracing sleep(100) end-to-end reveals all the machinery.

Step 1 — User-Space Wrapper (`usys.S`)

The user-level function sleep() is itself a tiny assembly stub:

.global sleep
sleep:
    mov $SYS_sleep, %rax   # load syscall number into %rax
    syscall                 # trap into kernel
    ret

The syscall number is the only way the kernel knows which service is being requested. It is passed in %rax because that is the ABI convention xv6 inherits from x86-64 Linux.
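
To see the number-based convention from C, the sketch below uses Linux rather than xv6, since the text notes the convention comes from x86-64 Linux. It assumes a Linux system with glibc, whose syscall() wrapper loads the given number into the syscall-number register and traps into the kernel. raw_getpid is a hypothetical name chosen for illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h> /* SYS_getpid */
#include <unistd.h>      /* syscall(), getpid() */

/* Ask the kernel for our PID by syscall number rather than by name.
 * Assumes Linux with glibc; this is not xv6 code. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

The result should equal what the ordinary getpid() wrapper returns, because both end up requesting the same numbered service.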

Step 2 — Hardware Transition (`syscall` instruction)

The syscall instruction atomically:

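Saves the user %rip in %rcx and the user %rflags in %r11.
Loads the kernel entry address from the LSTAR MSR, which the kernel set at boot.
Sets CPL to 0 (ring 0) by loading the kernel code selector (KERNEL_CS, configured by the kernel in the STAR MSR) into %cs.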

There is no way for user code to forge or skip this sequence.
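
The register effects of the instruction can be modeled in plain C. This is a toy model, not kernel code: struct cpu_model and model_syscall are hypothetical names, the kernel_cs field stands in for the selector the hardware takes from the STAR MSR, and the model leaves out details such as flag masking and the stack segment.

```c
#include <stdint.h>

/* Toy model of the CPU state that the syscall instruction touches. */
struct cpu_model {
    uint64_t rip, rflags; /* instruction pointer and flags */
    uint64_t rcx, r11;    /* receive the saved user rip and rflags */
    uint16_t cs;          /* low two bits are the CPL */
    uint64_t lstar;       /* kernel entry address, set by the kernel at boot */
    uint16_t kernel_cs;   /* stand-in for the selector held in the STAR MSR */
};

/* Model of the hardware transition. Note that the caller supplies no
 * target address: the destination comes only from lstar. */
void model_syscall(struct cpu_model *c) {
    c->rcx = c->rip;       /* save the user return address */
    c->r11 = c->rflags;    /* save the user flags */
    c->rip = c->lstar;     /* jump to the fixed kernel entry point */
    c->cs  = c->kernel_cs; /* CPL becomes 0 */
}
```

The point of the model is the missing parameter: nothing user code places in a register can redirect where the transition lands.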

Step 3 — Kernel Entry (`trapasm.S → syscall_entry`)

The kernel entry stub in trapasm.S:

Switches to the kernel stack (user and kernel stacks are separate; sharing them would be a security hole).
Saves all registers into a trapframe struct.
Calls syscall(struct trapframe *tf) in syscall.c.

Step 4 — Dispatch (`syscall.c`)

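// syscall.c (simplified)
void syscall(struct trapframe *tf) {
    int num = tf->rax;                 // syscall number from %rax
    if (num > 0 && num < NELEM(syscalls) && syscalls[num])
        tf->rax = syscalls[num]();     // dispatch to handler, store return value
    else
        tf->rax = -1;                  // unknown number: report an error
}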

The syscalls[] array maps numbers to handler functions. Calling an out-of-range number returns an error—there is no path to arbitrary kernel code.
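
The dispatch idea can be tried outside the kernel. The sketch below is a standalone illustration under assumed names: the demo_ handlers, the DEMO_SYS_ numbers, and demo_dispatch are all hypothetical and only mimic the shape of a syscall table.

```c
#include <stddef.h>

typedef long (*syscall_fn)(void);

/* Two stand-in handlers; real handlers would do kernel work. */
long demo_getpid(void) { return 42; }
long demo_uptime(void) { return 1000; }

enum { DEMO_SYS_getpid = 1, DEMO_SYS_uptime = 2 };

/* Slot 0 is left empty so that a zeroed number is rejected. */
static const syscall_fn demo_syscalls[] = {
    [DEMO_SYS_getpid] = demo_getpid,
    [DEMO_SYS_uptime] = demo_uptime,
};

#define DEMO_NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Returns the handler's result, or -1 for an unknown syscall number. */
long demo_dispatch(long num) {
    if (num <= 0 || (size_t)num >= DEMO_NELEM(demo_syscalls) ||
        demo_syscalls[num] == NULL)
        return -1;
    return demo_syscalls[num]();
}
```

Because the number is checked before it is used as an index, a caller can only ever reach a function that the table's author listed.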

Step 5 — Handler (`sysproc.c → sys_sleep`)

int sys_sleep(void) {
    int n;
    uint ticks0;
    if (argint(0, &n) < 0)   // safely fetch argument from user stack
        return -1;
    acquire(&tickslock);
    ticks0 = ticks;
    while (ticks - ticks0 < n) {
        sleep(&ticks, &tickslock);
    }
    release(&tickslock);
    return 0;
}

Note argint(): the kernel never dereferences user pointers directly—it copies arguments through validated helper functions to prevent pointer-based attacks.
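
A validated fetch of this kind boils down to a range check followed by a copy. The sketch below models a process's user memory as a byte array; struct demo_proc and demo_fetchint are hypothetical names, and the check is an assumption about what such a helper must do rather than a copy of xv6's code.

```c
#include <stdint.h>
#include <string.h>

/* Toy process: its user memory is a plain byte array of sz bytes. */
struct demo_proc {
    uint8_t *mem;
    uint64_t sz;
};

/* Copy an int from user address addr into *ip. Returns -1 if any byte of
 * the int would lie outside the process's memory, 0 on success. */
int demo_fetchint(const struct demo_proc *p, uint64_t addr, int *ip) {
    /* Written so that addr + sizeof(int) can never overflow. */
    if (addr >= p->sz || sizeof(int) > p->sz - addr)
        return -1;
    memcpy(ip, p->mem + addr, sizeof(int));
    return 0;
}
```

The check is phrased as a subtraction on purpose: computing addr + sizeof(int) directly could wrap around for a huge addr and let a bad address pass.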

Step 6 — Return (`sysretq`)

After the handler returns, trapasm.S runs syscall_trapret:

Restores saved registers from the trapframe.
Executes sysretq, which atomically sets CPL back to 3 and jumps to the saved user %rip.

The user program resumes exactly where it left off, never aware of how much kernel machinery ran on its behalf.
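
The return path can be modeled the same way as the entry. In this toy sketch, struct demo_trapframe, struct demo_cpu, and demo_return_to_user are hypothetical names; the trapframe holds only the three fields the sketch needs, and user_cs stands in for the user code selector that sysretq loads.

```c
#include <stdint.h>

/* Toy trapframe: only the fields this sketch needs. */
struct demo_trapframe {
    uint64_t rax; /* syscall number on entry, return value on exit */
    uint64_t rcx; /* user rip saved by the syscall instruction */
    uint64_t r11; /* user rflags saved by the syscall instruction */
};

struct demo_cpu {
    uint64_t rax, rip, rflags;
    uint16_t cs; /* low two bits are the CPL */
};

/* Model of the return path: restore registers, then do what sysretq does. */
void demo_return_to_user(struct demo_cpu *c, const struct demo_trapframe *tf,
                         uint16_t user_cs) {
    c->rax    = tf->rax; /* handler's return value becomes the user's %rax */
    c->rip    = tf->rcx; /* resume at the instruction after syscall */
    c->rflags = tf->r11; /* restore the user flags */
    c->cs     = user_cs; /* CPL becomes 3 */
}
```

This is why the dispatcher stores the handler's result in the trapframe's rax field: that slot is what the user program sees as the return value once its registers are restored.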

Visual Summary

User space                        Kernel space
──────────────────────────────────────────────────────
sleep(100)           %cs bits: 11 (ring 3)
  │
  └─► usys.S                      ┌───────────────────┐
      mov $SYS_sleep, %rax        │  syscall_entry    │ trapasm.S
      syscall ────────────────►   │  (switch stack,   │
                     %cs → ring 0 │   save regs)      │
                                  │       │           │
                                  │  syscall()        │ syscall.c
                                  │  dispatch → sys_sleep()
                                  │                   │ sysproc.c
                                  │  (sleep logic)    │
                                  │       │           │
                                  │  syscall_trapret  │
      ◄────────────────  sysretq  │  (restore regs)   │
      %cs → ring 3                └───────────────────┘
  ret

Isolation Mechanisms: The Big Picture

System calls are one piece of a layered isolation strategy:

Mechanism	Status in this lecture
CPU privilege rings	✅ Covered here
System call interface	✅ Covered here
Address spaces	Coming soon
Time-slicing	Coming soon

Each layer is necessary; none alone is sufficient.

Key Takeaways

CPL lives in %cs bits 1:0—ring 0 for kernel, ring 3 for user. Hardware enforces this; software cannot forge it.
Ring 0 protects the protectors: control registers, I/O ports, and %cs itself.
System calls are the only legal kernel entry—they enforce a fixed entry point, privilege elevation, and argument validation.
The syscall/sysretq pair switches CPL atomically, so user code itself never runs with ring-0 privileges; only the kernel's own entry code does.
xv6's dispatch table (syscalls[]) maps numbers to handlers, making the interface explicit and auditable.
The kernel always copies arguments safely (via argint, argptr) rather than trusting user-supplied pointers.
