Processes

Why This Matters

Your computer runs dozens of programs simultaneously even though it has a limited number of CPU cores. The operating system creates the illusion that each program has its own dedicated CPU and private memory. The abstraction that makes this possible is the process. Understanding how a process is represented and created is foundational to everything else in OS design: scheduling, memory management, file I/O, and security all revolve around the process.


What Is a Process?

A process is an instance of a program in execution; the same program can run as several processes at once. It is the OS's unit of execution and isolation. A process is composed of:

Component          Description
CPU registers      The current values of the program counter, stack pointer, general-purpose registers, etc.
Text section       The compiled program code loaded into memory
Memory segments    Data segment (globals), heap (dynamic allocation), stack (local variables, call frames)
Kernel resources   Open file descriptors, current working directory, signal handlers, etc.

A process is the OS's virtualization of both the processor and memory: each process thinks it owns the CPU and a large, contiguous address space, even though neither is truly the case.


The xv6 User-Space View: fork, exec, wait

From a user program's perspective, three system calls define the life cycle of a process:

fork()

pid_t fork(void);

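Creates a new process by duplicating the calling process. The child starts as a copy of the parent, with the same memory contents, register values, and open files. fork() is unusual: it is called once but returns twice, once in the parent (returning the child's PID) and once in the child (returning 0). If no new process can be created, it returns -1 in the parent and no child exists.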

exec()

int exec(const char *path, const char *argv[]);

Replaces the current process image with a new one loaded from path. The process keeps its PID but gets a fresh text segment, stack, and heap. It does not return on success.

wait()

pid_t wait(void);

Blocks the calling process until one of its children exits, then returns that child's PID (or -1 if the caller has no children). Reaping the child this way lets the kernel free the child's process-table slot and prevents zombie processes from accumulating.
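As a rough illustration of fork() and wait working together, here is a minimal sketch for a POSIX host such as Linux, not for xv6 itself. It assumes the POSIX calls _exit(status) and waitpid(), which differ from the argument-free exit() and wait() shown in these notes, and the helper name run_child_and_reap is made up for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with `code`, then reap it.
 * Returns the child's exit status, or -1 on any failure.
 * POSIX sketch; the helper name is hypothetical. */
int run_child_and_reap(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed: no child was created */

    if (pid == 0)
        _exit(code);            /* child: fork() returned 0 here */

    /* parent: fork() returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Between the child's _exit() and the parent's waitpid(), the child is a zombie: it no longer runs, but its table entry persists until the parent reaps it.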

Putting It Together: the init Process

"init" process
     │
     ├── fork()
     │        └── child
     │               ├── exec("sh", argv)   ← replaces itself with the shell
     │               └── exit()
     └── wait()

The init.c code from xv6 illustrates this pattern:

pid = fork();
if (pid == 0) {           // Child
    exec("sh", argv);
    printf(1, "init: exec sh failed\n");
    exit();
}
wait();                   // Parent waits for shell to exit

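exec returns only on failure, so the printf and exit() lines run only if the shell could not be loaded. The parent calls wait() to reap the child when the shell exits; in the full init.c this sequence sits inside an infinite loop, so init then starts a new shell.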
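The same fork/exec/wait pattern can be tried outside xv6. The sketch below is a POSIX approximation, assuming execv() and waitpid() in place of xv6's exec() and wait(); the helper name spawn_and_wait and the exit code 127 for a failed exec are choices made for this example, not part of xv6.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` in a child process and wait for it.
 * Returns the child's exit status, 127 if exec failed in the child,
 * or -1 if fork/waitpid failed. POSIX sketch with a hypothetical name. */
int spawn_and_wait(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execv(path, argv);      /* returns only if the exec failed */
        _exit(127);             /* plays the role of the error print + exit() */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, passing "/bin/sh" with the arguments {"sh", "-c", "exit 3", NULL} should make the helper return 3, while a path that does not exist takes the fall-through branch and yields 127.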


The Process Descriptor: struct proc

Every process in xv6 is represented by a struct proc defined in proc.h. Think of it as the OS's "dossier" for a running program — everything the kernel needs to know about and manage a process lives here.

struct proc {
    addr_t sz;                  // Size of process memory (bytes)
    pde_t* pgdir;               // Page table
    char *kstack;               // Bottom of kernel stack for this process
    enum procstate state;       // Process state (UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE)
    int pid;                    // Process ID
    struct proc *parent;        // Parent process
    struct trapframe *tf;       // Trap frame for current syscall
    struct context *context;    // Saved registers for context switch
    void *chan;                 // Sleep channel (non-zero means sleeping on chan)
    int killed;                 // Non-zero if process has been killed
    struct file *ofile[NOFILE]; // Open files
    struct inode *cwd;          // Current working directory
    char name[16];              // Process name (for debugging)
};

Key fields to know:

Field     Role
pid       Unique numeric identifier for this process
pgdir     Pointer to the page table; defines the process's virtual address space
kstack    Each process has its own kernel stack used during system calls and interrupts
state     The scheduling state (RUNNABLE, RUNNING, SLEEPING, ZOMBIE, etc.)
tf        Trap frame: saves user-space register state when a trap/syscall is taken
context   Saved kernel-mode registers used by the scheduler to switch between processes
ofile     Array of open file pointers (file descriptors)
parent    Pointer to the creating process, needed for wait()

The Process Table

xv6 uses a fixed-size process table holding at most NPROC (64) entries:

struct {
    struct spinlock lock;
    struct proc proc[NPROC];
} ptable;

This is simple and predictable but limits the system to 64 concurrent processes. A production OS would use a more dynamic structure.
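To make the fixed-size limit concrete, here is a small user-space model of such a table. It is only a sketch: struct toy_proc, toy_alloc, and toy_free are hypothetical names, the struct carries just two fields, and the locking that the real ptable needs is left out.

```c
#include <stddef.h>

#define NPROC 64                         /* same limit as in the text */

enum procstate { UNUSED, EMBRYO, RUNNABLE };

struct toy_proc {                        /* hypothetical, far smaller than struct proc */
    enum procstate state;
    int pid;
};

static struct toy_proc toy_table[NPROC]; /* zero-initialized: every slot starts UNUSED */
static int next_pid = 1;

/* Linear scan for a free slot; returns NULL once all NPROC slots are taken. */
struct toy_proc *toy_alloc(void) {
    for (struct toy_proc *p = toy_table; p < &toy_table[NPROC]; p++) {
        if (p->state == UNUSED) {
            p->state = EMBRYO;
            p->pid = next_pid++;
            return p;
        }
    }
    return NULL;
}

/* Mark a slot free again so a later toy_alloc() can reuse it. */
void toy_free(struct toy_proc *p) {
    p->state = UNUSED;
    p->pid = 0;
}
```

The 65th allocation fails no matter how much memory is free, which is the trade-off of a static table: no allocator is needed, but the cap is hard. Slots are reused after toy_free(), while PIDs keep increasing.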


Process Creation: Inside fork()

Step 1 — Allocate a new process slot: allocproc()

fork() delegates the low-level setup to allocproc(), which:

  1. Scans ptable for an UNUSED slot.
  2. Allocates a kernel stack (kalloc()).
  3. Sets up the trap frame pointer at the top of the kernel stack.
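  4. Sets context->rip to forkret, so that the first time the scheduler runs this process it executes forkret, which returns to user space via syscall_trapret.

static struct proc* allocproc(void) {
    struct proc *p;
    char *sp;

    // Find an UNUSED slot (locking omitted here)
    for (p = ptable.proc; p < &ptable.proc[NPROC]; p++)
        if (p->state == UNUSED) goto found;
    return 0;                       // Table full: no process can be created

found:
    // Allocate the kernel stack; sp starts at its top and moves downward
    if ((p->kstack = kalloc()) == 0) return 0;
    sp = p->kstack + KSTACKSIZE;

    // Reserve room for the trap frame at the top of the stack
    sp -= sizeof *p->tf;
    p->tf = (struct trapframe*)sp;

    // Return address that forkret will return to
    sp -= sizeof(addr_t);
    *(addr_t*)sp = (addr_t)syscall_trapret;

    // Saved context: the first switch into this process starts at forkret
    sp -= sizeof *p->context;
    p->context = (struct context*)sp;
    memset(p->context, 0, sizeof *p->context);
    p->context->rip = (addr_t)forkret;
    ...
}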
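The pointer arithmetic in allocproc() is easier to follow as a picture of the resulting stack layout. The sketch below reproduces only that arithmetic in user space. Everything in it is an assumption made for illustration: KSTACKSIZE is taken to be 4096, and the toy_trapframe and toy_context structs are placeholders whose sizes do not match the real xv6 definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define KSTACKSIZE 4096                 /* assumed: one 4 KiB page */

/* Placeholder layouts: sizes are illustrative, not the real xv6 structs. */
struct toy_trapframe { uint64_t regs[24]; };
struct toy_context   { uint64_t callee_saved[6]; uint64_t rip; };

struct kstack_layout {
    struct toy_trapframe *tf;           /* highest region of the stack */
    uint64_t *ret_addr;                 /* slot holding the return address */
    struct toy_context *context;        /* lowest region set up so far */
};

/* Carve the three regions from the top of `kstack`, in the same order
 * in which allocproc() moves sp downward. Nothing is written to the stack. */
struct kstack_layout carve_kstack(char *kstack) {
    struct kstack_layout l;
    char *sp = kstack + KSTACKSIZE;     /* one past the highest address */

    sp -= sizeof *l.tf;
    l.tf = (struct toy_trapframe *)sp;

    sp -= sizeof *l.ret_addr;
    l.ret_addr = (uint64_t *)sp;

    sp -= sizeof *l.context;
    l.context = (struct toy_context *)sp;
    return l;
}
```

Reading upward from the context: the return-address slot sits immediately above it, so when forkret returns, the address popped is the one stored in that slot, and the trap frame is directly above that, ready to be restored.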

Step 2 — Copy the parent's state

Back in fork():

int fork(void) {
    int i;
    struct proc *np;

    // Allocate a process slot and kernel stack for the child
    if ((np = allocproc()) == 0) return -1;

    np->pgdir = copyuvm(proc->pgdir, proc->sz);  // Copy address space
    np->sz    = proc->sz;
    np->parent = proc;
    *np->tf   = *proc->tf;                        // Copy trap frame (registers)
    np->tf->rax = 0;                              // Child sees fork() return 0

    // Share the parent's open files by taking a new reference to each
    for (i = 0; i < NOFILE; i++)
        if (proc->ofile[i])
            np->ofile[i] = filedup(proc->ofile[i]);
    ...
}

Why Does fork() Return Twice?

This is a classic exam question. The mechanism:

  1. Parent: fork() calls allocproc(), which sets up the child and returns the child's struct proc*. The parent returns the child's PID normally through the call stack.
  2. Child: The child is marked RUNNABLE but has not run yet. Its trap frame is a copy of the parent's, except tf->rax is set to 0. When the scheduler eventually runs the child and syscall_trapret restores registers from the trap frame, the child's return value register holds 0. So from the child's point of view, fork() returned 0.

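The key insight: the child never executes the body of fork(). Its first kernel instructions are in forkret (the address allocproc() stored in context->rip), which returns into syscall_trapret. From there the copied trap frame is restored, making it look as if fork() had just returned 0.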
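The trap-frame trick can be modeled in a few lines of ordinary C. This is a toy under stated assumptions, not kernel code: struct toy_trapframe keeps only three registers, and toy_fork is a hypothetical stand-in for the two trap-frame lines in fork().

```c
#include <stdint.h>

/* Hypothetical three-register trap frame; the real one holds many more. */
struct toy_trapframe {
    uint64_t rax;   /* syscall return value as seen by user code */
    uint64_t rip;   /* user instruction to resume at */
    uint64_t rsp;   /* user stack pointer */
};

/* Model of the trap-frame handling in fork(): the child's frame is a copy
 * of the parent's with rax forced to 0. The return value is what the
 * parent's syscall path would deliver to the parent: the child's PID. */
int toy_fork(const struct toy_trapframe *parent_tf,
             struct toy_trapframe *child_tf, int child_pid) {
    *child_tf = *parent_tf;   /* same rip and rsp: both resume at the same place */
    child_tf->rax = 0;        /* the child "sees" fork() return 0 */
    return child_pid;
}
```

Both frames end up with the same rip and rsp, so parent and child resume at the same user instruction; the only difference is the value each finds in rax.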


Trapframe vs. Context

Students often confuse these two:

Field            When used                                                    What it saves
tf (trapframe)   Entering/exiting the kernel (syscall, interrupt)             User-space register state
context          Switching between processes inside the kernel (scheduler)   Kernel-space saved registers (callee-saved + rip)

The trapframe is the bridge between user space and kernel space. The context is the bridge between one kernel execution and another.


Key Takeaways

Practice

  1. Which of the following best describes what a process is in an operating system?
  2. In xv6, what value does fork() return in the child process?
  3. When fork() is called in xv6, what happens to the child's trapframe->rax field?
  4. What is the purpose of the trapframe field (tf) in struct proc?
  5. In xv6, what is the maximum number of processes that can exist simultaneously?
  6. Explain why fork() is said to 'return twice'. What is the mechanism in xv6 that makes this possible?
  7. What is the difference between the 'trapframe' and the 'context' fields in struct proc? When is each used?
  8. A student writes a program that calls fork() but never calls wait(). What problem can this cause?
  9. Trace through what happens when xv6's init process calls fork() and the child calls exec("sh", argv). What does each system call do to the process's state?