System Calls and CPU Privilege Rings

Why This Matters

Every piece of software you run depends on the operating system for access to hardware: files, the network, memory, the clock. But if any program could freely call kernel code or poke hardware registers directly, a buggy app could crash the whole system, and a malicious one could own it. The mechanism that prevents this is hardware-enforced privilege separation, and the controlled gateway through it is the system call interface. Understanding both is foundational to all of OS design.


CPU Privilege Rings

Modern x86 processors implement four privilege levels, called rings, numbered 0 (most privileged) through 3 (least privileged). In practice, operating systems use only two:

Ring   Name          Who runs here
0      Kernel mode   OS kernel
3      User mode     Applications, shells

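The hardware enforces this distinction in silicon, not in software. No matter how clever an attacker's code, the CPU will not allow ring-3 code to perform ring-0 operations: a privileged instruction executed at ring 3 raises a general-protection fault (#GP), which hands control to the kernel instead of carrying out the operation.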

The %cs Register and CPL

The x86 records the privilege level of the running code as the Current Privilege Level (CPL), held in the two lowest bits of the %cs (code segment) register. When those bits are 00 (binary), the CPU is in ring 0; when they are 11 (binary for 3), it is in ring 3.

  15        3   2  1 0
  ┌──────────┬───┬────┐
  │  index   │TI │ RPL│  ← %cs selector
  └──────────┴───┴────┘
                  └── CPL = bits 1:0

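Example: %cs = 0x2b. In binary this is 0000 0000 0010 1011. Bits 1:0 are 11, so CPL = 3 (user mode); bit 2 (TI) is 0, so the selector refers to the GDT; bits 15:3 are 0000 0000 0010 1, so the GDT index is 5.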
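Decoding a selector is just shifting and masking. The sketch below does it in C; selector_fields, decode_selector, and print_selector are hypothetical helper names for illustration, not xv6 functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of an x86 segment selector. When the selector is the one
 * currently loaded in %cs, its low two bits are the CPL. */
struct selector_fields {
    unsigned index;  /* bits 15:3: descriptor-table index */
    unsigned ti;     /* bit 2: table indicator (0 = GDT, 1 = LDT) */
    unsigned rpl;    /* bits 1:0: RPL, which equals the CPL for %cs */
};

struct selector_fields decode_selector(uint16_t sel) {
    struct selector_fields f;
    f.index = sel >> 3;         /* discard the low three bits */
    f.ti    = (sel >> 2) & 1u;  /* isolate bit 2 */
    f.rpl   = sel & 3u;         /* keep bits 1:0 */
    return f;
}

void print_selector(uint16_t sel) {
    struct selector_fields f = decode_selector(sel);
    printf("selector 0x%04x: index=%u table=%s ring=%u\n",
           (unsigned)sel, f.index, f.ti ? "LDT" : "GDT", f.rpl);
}
```

With this sketch, print_selector(0x2b) prints `selector 0x002b: index=5 table=GDT ring=3`, i.e. ring 3 and GDT index 5.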

You can observe this live in GDB while debugging xv6: single-step (stepi) through a syscall instruction and print %cs before and after (for example with info registers cs). Its low two bits change from 11 (ring 3) to 00 (ring 0).


What Does Ring 0 Protect?

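Ring 0 does not just protect some data; it protects the mechanisms of isolation themselves:

  1. Control registers, such as %cr3 (the page-table base): writing them requires CPL 0.
  2. I/O ports: the in and out instructions fault in ring 3 unless the kernel has explicitly granted access.
  3. %cs itself: user code cannot load a more privileged code selector, so it cannot raise its own CPL; the only ways up are the entry points the kernel configured.

If user code could change any of these, isolation would collapse immediately.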


System Calls: The Controlled Gateway

A system call is the only sanctioned way for user-space code to request a service from the kernel. It is a controlled transfer: the CPU switches to ring 0 and jumps to a fixed entry point that the kernel itself registered at boot. User code cannot choose an arbitrary kernel address to jump to.

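What System Calls Provide

  1. A fixed entry point: the kernel, not the caller, decides where ring-0 execution begins.
  2. Privilege elevation: the switch to ring 0 happens only as part of that controlled transfer.
  3. Argument validation: the kernel checks every user-supplied argument before acting on it.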

Examples in xv6

Category                       System calls
Process management             fork, exit, exec, getpid
Memory management              sbrk
File system                    open, read, write, close, mkdir
Inter-process communication    pipe
Time                           uptime, sleep

The Complete System Call Flow in xv6

Tracing sleep(100) end-to-end reveals all the machinery.

Step 1: User-Space Wrapper (usys.S)

In xv6 the user-level sleep() that a program calls is itself a tiny assembly stub in usys.S; there is no intermediate C function:

#include "syscall.h"       /* defines SYS_sleep */

.global sleep
sleep:
    mov $SYS_sleep, %rax   # load syscall number into %rax
    syscall                 # trap into the kernel
    ret                     # back to the caller; the result is in %rax

The syscall number is the only way the kernel knows which service is being requested. It is passed in %rax, the same register the x86-64 Linux system-call ABI uses for this purpose; the kernel later reads it back from the saved trap frame.
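The same "the number selects the service" idea can be seen on a Linux host (not xv6) through glibc's generic syscall() function, which takes the system-call number as its first argument; the numbers are the SYS_* constants from <sys/syscall.h>. This is a sketch assuming Linux with glibc, and pid_by_number is a hypothetical helper name.

```c
#define _GNU_SOURCE           /* make glibc declare syscall() */
#include <assert.h>
#include <sys/syscall.h>      /* SYS_getpid and the other numbers */
#include <unistd.h>           /* syscall(), getpid() */

/* Request the getpid service by its number instead of via getpid(). */
long pid_by_number(void) {
    return syscall(SYS_getpid);
}
```

On x86-64, glibc places SYS_getpid in %rax before executing the syscall instruction, just as the xv6 stub does with SYS_sleep.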

Step 2: Hardware Transition (syscall instruction)

The syscall instruction atomically:

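  1. Saves the user return address (the instruction after syscall) in %rcx and the user %rflags in %r11.
  2. Clears the %rflags bits the kernel selected at boot in the FMASK MSR.
  3. Loads %rip with the kernel entry address from the LSTAR MSR, also registered at boot.
  4. Loads the kernel code selector (KERNEL_CS), taken from the STAR MSR, into %cs, which sets CPL to 0 (ring 0).

The instruction does not switch stacks; that is left to the kernel's entry code. There is no way for user code to forge or skip this sequence: the MSRs that fix the entry point and the kernel %cs can be written only with wrmsr, which itself requires ring 0.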
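The hardware steps above can be modeled in a few lines of C. This is an illustration, not real hardware: struct cpu_model and model_syscall are hypothetical names, the model follows the SYSCALL description for 64-bit mode in Intel's Software Developer's Manual, and it ignores the hidden segment-descriptor state a real CPU also loads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the CPU state that SYSCALL reads or writes. */
struct cpu_model {
    uint64_t rip;        /* address of the syscall instruction itself */
    uint64_t rflags, rcx, r11;
    uint16_t cs, ss;
    /* MSRs the kernel programs at boot with wrmsr (a ring-0 instruction). */
    uint64_t msr_lstar;  /* kernel entry point */
    uint64_t msr_star;   /* bits 47:32: kernel code selector */
    uint64_t msr_fmask;  /* %rflags bits to clear on entry */
};

#define SYSCALL_LEN 2    /* SYSCALL is encoded as the two bytes 0F 05 */

/* Simplified effect of executing SYSCALL in 64-bit mode. */
void model_syscall(struct cpu_model *cpu) {
    uint16_t kcs = (uint16_t)(cpu->msr_star >> 32);  /* bits 47:32 */

    cpu->rcx = cpu->rip + SYSCALL_LEN;   /* user return address */
    cpu->r11 = cpu->rflags;              /* user flags, saved before masking */
    cpu->rflags &= ~cpu->msr_fmask;      /* e.g. clear IF (0x200) */
    cpu->rip = cpu->msr_lstar;           /* fixed entry chosen by the kernel */
    cpu->cs = (uint16_t)(kcs & 0xFFFC);  /* low bits forced to 00: CPL 0 */
    cpu->ss = (uint16_t)(kcs + 8);       /* kernel stack segment selector */
    /* %rsp is untouched: the kernel's entry stub must switch stacks. */
}
```

Nothing user code controls is consulted when the model picks the new %rip or %cs: both come from MSRs that only ring 0 can write.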

Step 3: Kernel Entry (trapasm.S → syscall_entry)

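The kernel entry stub syscall_entry in trapasm.S switches from the user stack to the process's kernel stack, saves the user registers into a struct trapframe, and calls syscall() with a pointer to that trap frame.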

Step 4: Dispatch (syscall.c)

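// syscall.c (simplified)
void syscall(struct trapframe *tf) {
    int num = tf->rax;                          // syscall number from %rax
    if (num > 0 && num < NELEM(syscalls) && syscalls[num]) {
        tf->rax = syscalls[num]();              // dispatch; return value goes back in %rax
    } else {
        cprintf("unknown sys call %d\n", num);  // log the bad number
        tf->rax = -1;                           // report failure to the caller
    }
}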

The syscalls[] array maps numbers to handler functions. A number that is out of range, or that has no handler, makes the call fail with -1; there is no path to arbitrary kernel code.
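The table-plus-bounds-check pattern can be tried outside the kernel. In this sketch the numbers, the handler names, and dispatch() are hypothetical. xv6's syscalls[] is built the same way, with designated initializers, so any slot that is not listed is a null pointer that the check rejects.

```c
#include <assert.h>
#include <stddef.h>

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical syscall numbers for this sketch (not xv6's real values). */
enum { SYS_getpid_demo = 1, SYS_uptime_demo = 2, SYS_sleep_demo = 4 };

static int sys_getpid_demo(void) { return 42; }
static int sys_uptime_demo(void) { return 1000; }
static int sys_sleep_demo(void)  { return 0; }

/* Unlisted slots (0 and 3) are left as null pointers. */
static int (*const demo_syscalls[])(void) = {
    [SYS_getpid_demo] = sys_getpid_demo,
    [SYS_uptime_demo] = sys_uptime_demo,
    [SYS_sleep_demo]  = sys_sleep_demo,
};

/* Returns the handler's result, or -1 if num names no valid entry. */
int dispatch(long num) {
    if (num > 0 && (size_t)num < NELEM(demo_syscalls) && demo_syscalls[num])
        return demo_syscalls[num]();
    return -1;
}
```

Negative numbers, zero, holes in the table, and numbers past the end all take the same failure path.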

Step 5: Handler (sysproc.c → sys_sleep)

int sys_sleep(void) {
    int n;
    uint ticks0;

    if (argint(0, &n) < 0)           // fetch argument 0 through a checked helper
        return -1;
    acquire(&tickslock);
    ticks0 = ticks;
    while (ticks - ticks0 < n) {
        sleep(&ticks, &tickslock);   // kernel sleep(chan, lock): releases tickslock while blocked
    }
    release(&tickslock);
    return 0;
}

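Note argint(): the kernel never trusts user-supplied addresses. Helpers such as argint(), argptr(), and fetchint() first check that the address lies inside the process's user memory (below the process size, sz) and return -1 otherwise, so a pointer aimed at kernel memory makes the system call fail instead of leaking or corrupting kernel data.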
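The kind of check these helpers perform can be sketched as follows. This models only the bounds test: struct user_mem and fetchint_model are hypothetical names, and user memory is simulated with an ordinary buffer. The comparison is written as addr > sz - sizeof(int) rather than addr + 4 > sz so that a huge addr cannot wrap around and slip past the check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a process's user memory: valid addresses [0, sz). */
struct user_mem {
    const unsigned char *base;  /* where the kernel sees user memory */
    uint64_t sz;                /* size of the user address space */
};

/* Copy an int from user address addr into *ip, or return -1 if any byte
 * of it lies outside [0, sz). */
int fetchint_model(const struct user_mem *m, uint64_t addr, int *ip) {
    if (m->sz < sizeof(int) || addr > m->sz - sizeof(int))
        return -1;                               /* reject; never dereference */
    memcpy(ip, m->base + addr, sizeof(int));     /* no alignment assumptions */
    return 0;
}
```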

Step 6: Return (sysretq)

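After the handler returns, syscall_trapret in trapasm.S restores the user registers from the trap frame, switches back to the user stack, and executes sysretq. That instruction reloads %rip from %rcx and %rflags from %r11 and returns the CPU to ring 3.

The user program resumes at the instruction after syscall (the ret in the usys.S stub), with the handler's return value in %rax, never aware of how much kernel machinery ran on its behalf.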
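The return path can be modeled the same way. Again this is a hypothetical sketch (struct cpu_model and model_sysretq are illustrative names) following Intel's description of SYSRET returning to 64-bit mode: the user selectors are derived from bits 63:48 of the STAR MSR with their low bits forced to 11, and executing SYSRET at any CPL other than 0 raises #GP, modeled here as a return value of -1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the CPU state that SYSRET reads or writes. */
struct cpu_model {
    uint64_t rip, rflags, rcx, r11;
    uint16_t cs, ss;
    uint64_t msr_star;  /* bits 63:48: base selector for returning to user mode */
};

/* Simplified effect of SYSRETQ (return to 64-bit user mode).
 * Returns -1 to stand for the #GP fault raised when CPL is not 0. */
int model_sysretq(struct cpu_model *cpu) {
    uint16_t base = (uint16_t)(cpu->msr_star >> 48);

    if ((cpu->cs & 3) != 0)
        return -1;                          /* only ring 0 may execute it */
    cpu->rip = cpu->rcx;                    /* resume after the syscall */
    cpu->rflags = cpu->r11;                 /* restore user flags (real CPU masks some bits) */
    cpu->cs = (uint16_t)((base + 16) | 3);  /* 64-bit user code selector, RPL 3 */
    cpu->ss = (uint16_t)((base + 8) | 3);   /* user stack selector, RPL 3 */
    return 0;
}
```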

Visual Summary

User space                        Kernel space
──────────────────────────────────────────────────────
sleep(100)           %cs bits: 11 (ring 3)
  │
  └─► usys.S                      ┌───────────────────┐
      mov $SYS_sleep, %rax        │  syscall_entry    │ trapasm.S
      syscall ────────────────►   │  (switch stack,   │
                     %cs → ring 0 │   save regs)      │
                                  │        │          │
                                  │  syscall()        │ syscall.c
                                  │  dispatch →       │
                                  │   sys_sleep()     │ sysproc.c
                                  │  (sleep logic)    │
                                  │        │          │
                                  │  syscall_trapret  │
      ◄────────────────  sysretq  │  (restore regs)   │
      %cs → ring 3                └───────────────────┘
  ret

Isolation Mechanisms: The Big Picture

System calls are one piece of a layered isolation strategy:

Mechanism                Status in this lecture
CPU privilege rings      ✅ Covered here
System call interface    ✅ Covered here
Address spaces           Coming soon
Time-slicing             Coming soon

Each layer is necessary; none alone is sufficient.


Key Takeaways

  1. CPL lives in bits 1:0 of %cs (ring 0 for the kernel, ring 3 for user code). Hardware enforces this; software cannot forge it.
  2. Ring 0 protects the protectors: control registers, I/O ports, and %cs itself.
  3. System calls are the only sanctioned way for user code to request kernel services; they combine a fixed entry point, a hardware privilege switch, and argument validation.
  4. The syscall/sysretq pair switches CPL atomically, so ring-0 execution always begins at the kernel's own entry code, never at an address user code chooses.
  5. xv6's dispatch table (syscalls[]) maps numbers to handlers, making the interface explicit and auditable.
  6. The kernel fetches arguments through checked helpers (argint, argptr) rather than dereferencing user-supplied pointers directly.

Practice

  1. Which bits of the %cs register hold the Current Privilege Level (CPL) on x86-64?
  2. At a certain point during execution you observe that %cs = 0x2b. What is the Current Privilege Level (CPL), and which GDT index is being used?
  3. What is the primary purpose of system calls in an operating system?
  4. In xv6, where is the system call number stored when a user program invokes a system call?
  5. Give an example of an operation that is NOT protected by ring 0 on x86.
  6. Describe, step by step, what happens in both hardware and software when a user program executes the syscall instruction in xv6 (64-bit). Start from the moment the instruction executes and end when the kernel handler begins running.
  7. Explain why the xv6 kernel uses helper functions like argint() to fetch system-call arguments instead of directly dereferencing the user-supplied pointer or reading from the user stack.
  8. In xv6's syscall.c, the dispatcher does tf->rax = syscalls[num](). What is the purpose of assigning the return value back to tf->rax?
  9. A student claims: 'Since sysretq sets CPL back to 3, a user program could call sysretq itself to skip the privilege drop and stay in ring 0.' Is this claim correct?
  10. xv6 identifies four high-level isolation mechanisms: (1) CPU privilege rings, (2) the system-call interface, (3) address spaces, and (4) time-slicing. Briefly explain why all four are necessary: why doesn't ring-0 protection alone suffice to achieve full process isolation?