System Calls and CPU Privilege Rings
Why This Matters
Every piece of software you run depends on the operating system for access to hardware: files, the network, memory, the clock. But if any program could freely call kernel code or poke hardware registers directly, a buggy app could crash the whole system, and a malicious one could own it. The mechanism that prevents this is hardware-enforced privilege separation, and the controlled gateway through it is the system call interface. Understanding both is foundational to all of OS design.
CPU Privilege Rings
Modern x86 processors implement four privilege levels, called rings, numbered 0 (most privileged) through 3 (least privileged). In practice, operating systems use only two:
| Ring | Name | Who runs here |
|---|---|---|
| 0 | Kernel mode | OS kernel |
| 3 | User mode | Applications, shells |
The hardware enforces this distinction in silicon, not in software. No matter how clever an attacker's code, the CPU will not allow ring-3 code to perform ring-0 operations.
The %cs Register and CPL
The x86 records the Current Privilege Level (CPL) in the two lowest bits of the %cs (Code Segment) register. When those bits are 00 (binary), the CPU is in ring 0; when they are 11 (binary, i.e. decimal 3), it is in ring 3.
 15            3   2   1 0
┌───────────────┬────┬─────┐
│     index     │ TI │ RPL │  ← %cs selector
└───────────────┴────┴─────┘
                       └── CPL = bits 1:0
Example: %cs = 0x2b
- In binary: 0000 0000 0010 1011
- Bits 1:0: 11 → CPL = 3 (user mode)
- Bits 15:3: 5 → GDT index 5 (the user code segment descriptor)
You can observe this live in GDB while debugging xv6: single-step across a syscall instruction and watch %cs change from a value ending in 11 (ring 3) to one ending in 00 (ring 0).
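The bit arithmetic in the 0x2b example can be written out in a few lines of C. This is a minimal sketch: `decode_selector` and `struct selector_fields` are hypothetical names chosen for illustration and are not part of xv6.

```c
#include <stdint.h>

/* Fields of a 16-bit x86 segment selector such as the value held in %cs. */
struct selector_fields {
    unsigned index; /* bits 15:3, index into the descriptor table */
    unsigned ti;    /* bit 2, table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1:0, equal to the CPL when read from %cs */
};

/* Hypothetical helper: split a selector into its three fields. */
struct selector_fields decode_selector(uint16_t sel) {
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 0x1;
    f.rpl = sel & 0x3;
    return f;
}
```

Calling `decode_selector(0x2b)` yields index 5, TI 0, and RPL 3, matching the worked example above.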
What Does Ring 0 Protect?
Ring 0 does not just protect some data; it protects the mechanisms of isolation themselves:
- Writes to %cs: Only the CPU's own privilege-transition instructions can change CPL. User code cannot forge a ring-0 %cs.
- I/O port access: in/out instructions to hardware ports require ring 0.
- Control registers and flags: %cr0 (enables paging), %cr3 (page table base), and the interrupt-enable flag in %eflags all require ring 0 to modify.
If user code could change any of these, isolation would collapse immediately.
System Calls: The Controlled Gateway
A system call is the only sanctioned way for user-space code to enter the kernel. It is a controlled transfer: the CPU switches to ring 0 at a fixed entry point that the kernel itself registered at boot. User code cannot choose an arbitrary kernel address to jump to.
What System Calls Provide
- An abstract hardware interface: apps call read() instead of programming a disk controller directly.
- A security boundary: every request passes through kernel validation before hardware is touched.
- Stability: kernel data structures cannot be corrupted by buggy user programs.
Examples in xv6
| Category | System calls |
|---|---|
| Process management | fork, exit, exec, getpid |
| Memory management | sbrk |
| File system | open, read, write, close, mkdir |
| Inter-process comms | pipe |
| Time | uptime |
The Complete System Call Flow in xv6
Tracing sleep(100) end-to-end reveals all the machinery.
Step 1: User-Space Wrapper (usys.S)
The user-level sleep() function that C code calls is itself a tiny assembly stub:
.global sleep
sleep:
mov $SYS_sleep, %rax # load syscall number into %rax
syscall # trap into kernel
ret
The syscall number is the only way the kernel knows which service is being requested. It is passed in %rax because that is the ABI convention xv6 inherits from x86-64 Linux.
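The same idea can be tried on a Linux machine, where the C library exposes a generic `syscall()` function that takes the number explicitly. The sketch below is Linux-specific and is not xv6 code; `raw_getpid` is a hypothetical name, while `syscall()` and `SYS_getpid` come from glibc's `<unistd.h>` and `<sys/syscall.h>`.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the Linux kernel for the process ID without the getpid() wrapper.
 * On x86-64, syscall() places SYS_getpid in %rax and traps into the kernel;
 * the number is the only thing that selects which service runs. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

On Linux, `raw_getpid()` and the ordinary `getpid()` wrapper should return the same value, since both end up requesting the same numbered service.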
Step 2: Hardware Transition (syscall instruction)
The syscall instruction atomically:
- Saves the user %rip and %rflags (in %rcx and %r11, respectively).
- Loads the kernel entry address (from the LSTAR MSR, registered at boot).
- Sets CPL to 0 (ring 0) by loading KERNEL_CS into %cs.
There is no way for user code to forge or skip this sequence.
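To make the three effects concrete, here is a small software model of the instruction. It is a sketch under simplifying assumptions, not an emulator: `struct cpu_model` and `model_syscall` are hypothetical names, only the registers mentioned above are modeled, and the kernel code selector is stored in a plain field rather than extracted from the STAR MSR as real hardware does.

```c
#include <stdint.h>

/* Simplified CPU state: only what the `syscall` instruction touches. */
struct cpu_model {
    uint64_t rip, rflags, rcx, r11;
    uint16_t cs;
    uint64_t lstar;     /* LSTAR MSR: kernel entry address set at boot */
    uint64_t fmask;     /* FMASK MSR: rflags bits to clear on entry */
    uint16_t kernel_cs; /* assumption: stands in for the STAR MSR field */
};

/* Model `syscall`; next_rip is the address of the instruction after it. */
void model_syscall(struct cpu_model *c, uint64_t next_rip) {
    c->rcx = next_rip;                        /* save user return address */
    c->r11 = c->rflags;                       /* save user flags */
    c->rflags &= ~c->fmask;                   /* clear the masked flag bits */
    c->cs = (uint16_t)(c->kernel_cs & 0xfffc); /* low bits 00, so CPL = 0 */
    c->rip = c->lstar;                        /* jump to the fixed entry */
}
```

Note that nothing in the model takes a target address from the caller: the destination comes only from `lstar`, which mirrors why user code cannot pick where it enters the kernel.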
Step 3: Kernel Entry (trapasm.S → syscall_entry)
The kernel entry stub in trapasm.S:
- Switches to the kernel stack (user and kernel stacks are separate; sharing them would be a security hole).
- Saves all registers into a trapframe struct.
- Calls syscall(struct trapframe *tf) in syscall.c.
Step 4: Dispatch (syscall.c)
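// syscall.c (simplified)
void syscall(struct trapframe *tf) {
    int num = tf->rax;               // syscall number from %rax
    if (num > 0 && num < NELEM(syscalls) && syscalls[num]) {
        tf->rax = syscalls[num]();   // dispatch to handler, store return value
    } else {
        tf->rax = -1;                // unknown number: report an error
    }
}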
The syscalls[] array maps numbers to handler functions. Calling an out-of-range number returns an error; there is no path to arbitrary kernel code.
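The table-driven dispatch can be tried outside the kernel. The following is a user-space sketch of the pattern, not xv6's actual table: every `demo_` name and both syscall numbers are invented for illustration, and the handlers return fixed values.

```c
#include <stddef.h>

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical syscall numbers for this demo only. */
enum { DEMO_SYS_getpid = 1, DEMO_SYS_uptime = 2 };

static long demo_getpid(void) { return 42; }
static long demo_uptime(void) { return 1000; }

/* Index = syscall number, value = handler. Slot 0 stays NULL. */
static long (*demo_syscalls[])(void) = {
    [DEMO_SYS_getpid] = demo_getpid,
    [DEMO_SYS_uptime] = demo_uptime,
};

/* Return the handler's result, or -1 if the number has no handler. */
long demo_dispatch(long num) {
    if (num > 0 && (size_t)num < NELEM(demo_syscalls) &&
        demo_syscalls[num] != NULL)
        return demo_syscalls[num]();
    return -1;
}
```

Because the caller supplies only an index that is checked against the table bounds, a bad number such as 99 or -5 falls through to the error return instead of reaching any code.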
Step 5: Handler (sysproc.c → sys_sleep)
int sys_sleep(void) {
    int n;
    uint ticks0;                      // tick count when the sleep began

    if (argint(0, &n) < 0)            // safely fetch argument from user stack
        return -1;
    acquire(&tickslock);
    ticks0 = ticks;
    while (ticks - ticks0 < n) {
        sleep(&ticks, &tickslock);    // kernel sleep: block until ticks changes
    }
    release(&tickslock);
    return 0;
}
Note argint(): the kernel never dereferences user pointers directly; it copies arguments through validated helper functions to prevent pointer-based attacks.
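The kind of validation such a helper performs can be sketched in user space. The model below is an assumption-laden illustration rather than xv6's code: `struct demo_proc` and `demo_fetchint` are hypothetical, and the "user address space" is just a small byte array whose valid range is [0, sz).

```c
#include <stdint.h>
#include <string.h>

/* Simulated user address space: valid addresses are [0, sz). */
struct demo_proc {
    uint8_t mem[64];
    uint64_t sz;
};

/* Copy a 32-bit int from user address addr into *ip.
 * Returns 0 on success, -1 if any byte lies outside [0, sz). */
int demo_fetchint(const struct demo_proc *p, uint64_t addr, int32_t *ip) {
    if (p->sz > sizeof(p->mem))
        return -1;                 /* malformed model: sz exceeds backing store */
    if (addr >= p->sz || p->sz - addr < sizeof(int32_t))
        return -1;                 /* start or end of the int is out of range */
    memcpy(ip, &p->mem[addr], sizeof(int32_t));
    return 0;
}
```

The check is written as `sz - addr < 4` rather than `addr + 4 > sz` so that a huge user-supplied address cannot wrap around and slip past the comparison.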
Step 6: Return (sysretq)
After the handler returns, trapasm.S runs syscall_trapret:
- Restores saved registers from the trapframe.
- Executes sysretq, which atomically sets CPL back to 3 and jumps to the saved user %rip.
The user program resumes exactly where it left off, never aware of how much kernel machinery ran on its behalf.
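The return path can be modeled in the same spirit. This sketch assumes the saved user %rip and %rflags sit in %rcx and %r11, which is where the x86-64 `syscall` instruction leaves them; `struct ret_model` and `model_sysretq` are hypothetical names, the user code selector is a plain field instead of being derived from the STAR MSR, and the flag filtering real hardware applies is omitted.

```c
#include <stdint.h>

/* Simplified CPU state: only what `sysretq` touches. */
struct ret_model {
    uint64_t rip, rflags, rcx, r11;
    uint16_t cs;
    uint16_t user_cs; /* assumption: stands in for the STAR MSR field */
};

/* Model `sysretq`: restore user state and drop to ring 3 in one step. */
void model_sysretq(struct ret_model *c) {
    c->rip = c->rcx;                      /* resume at saved user address */
    c->rflags = c->r11;                   /* restore saved user flags */
    c->cs = (uint16_t)(c->user_cs | 0x3); /* low bits 11, so CPL = 3 */
}
```

With `user_cs` set to 0x28, the resulting %cs is 0x2b, the same user-mode selector decoded in the earlier example.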
Visual Summary
User space                          Kernel space
──────────────────────────────────────────────────────────────
sleep(100)                          %cs bits: 11 (ring 3)
  │
  └─► usys.S                        ┌──────────────────┐
      mov $SYS_sleep, %rax          │ syscall_entry    │ trapasm.S
      syscall ──────────────────►   │ (switch stack,   │
                   %cs → ring 0     │  save regs)      │
                                    │        │         │
                                    │ syscall()        │ syscall.c
                                    │ dispatch         │ → sys_sleep()
                                    │                  │ sysproc.c
                                    │ (sleep logic)    │
                                    │        │         │
                                    │ syscall_trapret  │
      ◄────────────────── sysretq   │ (restore regs)   │
                   %cs → ring 3     └──────────────────┘
      ret
Isolation Mechanisms: The Big Picture
System calls are one piece of a layered isolation strategy:
| Mechanism | Status in this lecture |
|---|---|
| CPU privilege rings | Covered here |
| System call interface | Covered here |
| Address spaces | Coming soon |
| Time-slicing | Coming soon |
Each layer is necessary; none alone is sufficient.
Key Takeaways
- CPL lives in %cs bits 1:0: ring 0 for kernel, ring 3 for user. Hardware enforces this; software cannot forge it.
- Ring 0 protects the protectors: control registers, I/O ports, and %cs itself.
- System calls are the only legal kernel entry; they enforce a fixed entry point, privilege elevation, and argument validation.
- The syscall/sysretq pair atomically switches CPL; ring-0 execution begins only at the kernel's own entry code, never at an address chosen by user code.
- xv6's dispatch table (syscalls[]) maps numbers to handlers, making the interface explicit and auditable.
- The kernel always copies arguments safely (via argint, argptr) rather than trusting user-supplied pointers.