Processes
Why This Matters
Your computer runs dozens of programs simultaneously even though it has a limited number of CPU cores. The operating system creates the illusion that each program has its own dedicated CPU and private memory. The abstraction that makes this possible is the process. Understanding how a process is represented and created is foundational to everything else in OS design: scheduling, memory management, file I/O, and security all revolve around the process.
What Is a Process?
A process is a program currently executing in the system. It is the OS's unit of execution and isolation. A process is composed of:
| Component | Description |
|---|---|
| CPU registers | The current values of the program counter, stack pointer, general-purpose registers, etc. |
| Text section | The compiled program code loaded into memory |
| Memory segments | Data segment (globals), heap (dynamic allocation), stack (local variables, call frames) |
| Kernel resources | Open file descriptors, current working directory, signal handlers, etc. |
A process is the OS's virtualization of both the processor and memory: each process thinks it owns the CPU and a large, contiguous address space, even though neither is truly the case.
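You can observe the memory half of this illusion from user space. The sketch below assumes a POSIX system such as Linux or macOS, not xv6, and the helper name private_address_spaces is hypothetical. A forked child writes to a global through the same virtual address the parent uses, but the parent's value does not change.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_looking = 1;   /* same virtual address in parent and child */

/* Returns 1 if the child's write was invisible to the parent even though
   both processes used the same virtual address, else 0. */
int private_address_spaces(void) {
    int fds[2];
    assert(pipe(fds) == 0);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {                               /* child */
        shared_looking = 99;                      /* writes the child's private copy */
        uintptr_t addr = (uintptr_t)&shared_looking;
        ssize_t w = write(fds[1], &addr, sizeof addr);  /* report the address used */
        _exit(w == (ssize_t)sizeof addr ? 0 : 1);
    }

    close(fds[1]);
    uintptr_t child_addr = 0;
    ssize_t n = read(fds[0], &child_addr, sizeof child_addr);
    close(fds[0]);
    waitpid(pid, NULL, 0);                        /* reap the child */

    return n == (ssize_t)sizeof child_addr
        && child_addr == (uintptr_t)&shared_looking   /* same address ... */
        && shared_looking == 1;                        /* ... different value */
}
```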
The xv6 User-Space View: fork, exec, wait
From a user program's perspective, three system calls define the life cycle of a process:
fork()
pid_t fork(void);
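Creates a new process by duplicating the calling process. The child starts as a copy of the parent (memory, registers, and open files) but receives its own PID. fork() is unusual because it is called once but returns twice: once in the parent, where it returns the child's PID, and once in the child, where it returns 0. If the kernel cannot create the child, fork() returns -1 in the parent and no child exists.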
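Here is a minimal POSIX sketch that checks both returns of a single fork() call. It is not xv6 code, and fork_returns_twice is a hypothetical helper. The child sends its fork() return value and its own PID back through a pipe. The parent then compares them with the value it received.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child saw 0 and the parent saw the child's PID. */
int fork_returns_twice(void) {
    int fds[2];
    assert(pipe(fds) == 0);

    pid_t r = fork();                      /* called once ... */
    assert(r >= 0);
    if (r == 0) {                          /* ... returns in the child */
        pid_t report[2] = { r, getpid() };
        ssize_t w = write(fds[1], report, sizeof report);
        _exit(w == (ssize_t)sizeof report ? 0 : 1);
    }

    /* ... and returns in the parent, where r is the child's PID */
    close(fds[1]);
    pid_t report[2] = { -1, -1 };
    ssize_t n = read(fds[0], report, sizeof report);
    close(fds[0]);
    waitpid(r, NULL, 0);

    return n == (ssize_t)sizeof report
        && report[0] == 0                  /* child's fork() returned 0 */
        && report[1] == r;                 /* parent's fork() returned the child's PID */
}
```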
exec()
int exec(const char *path, const char *argv[]);
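Replaces the current process image with a new program loaded from path. The process keeps its PID and its open file descriptors but gets fresh text, data, stack, and heap segments. exec() does not return on success. If it returns at all, it returns -1 and the old image is still in place.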
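The sketch below checks the PID claim on a POSIX system and assumes /bin/sh is available. The child points its standard output at a pipe and then execs a shell that prints its own PID ($$). Because the redirected descriptor survives the exec, the example also shows that open files carry over. exec_keeps_pid is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the exec'd shell reports the same PID fork() gave the parent. */
int exec_keeps_pid(void) {
    int fds[2];
    assert(pipe(fds) == 0);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        dup2(fds[1], STDOUT_FILENO);       /* this descriptor survives exec */
        close(fds[0]);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        _exit(127);                        /* reached only if exec fails */
    }

    close(fds[1]);
    char buf[32] = {0};
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);

    return n > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0
        && strtol(buf, NULL, 10) == (long)pid;
}
```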
wait()
pid_t wait(void);
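Blocks the caller until one of its children exits, then returns that child's PID, or -1 if the caller has no children. Reaping the child frees its remaining kernel resources (its kernel stack, page table, and process-table slot), so terminated children do not accumulate as zombie processes. Unlike POSIX wait(), xv6's version reports no exit status.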
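Here is a POSIX sketch of reaping, with two caveats. First, POSIX reports an exit status that xv6's wait() does not. Second, the code uses waitpid(), which waits for one specific child, so other children of the calling process cannot interfere. reap_one_child is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks one child that exits with `code`, reaps it, and returns the
   exit code collected (or -1 if the child did not exit normally). */
int reap_one_child(int code) {
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0)
        _exit(code);                       /* child: a zombie until reaped */

    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);  /* like wait(), for this child */
    assert(reaped == pid);

    /* Once reaped, the PID no longer names a child: waiting again fails. */
    assert(waitpid(pid, NULL, 0) == -1 && errno == ECHILD);

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, reap_one_child(7) forks a child that exits with status 7 and returns 7 after reaping it.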
Putting It Together: the init Process
"init" process
│
├── fork()
│   └── child
│       ├── exec("sh", argv)   ← replaces itself with the shell
│       └── exit()             ← reached only if exec fails
└── wait()                     ← reaps the child; init then loops back to fork()
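A simplified excerpt of the main loop in xv6's init.c illustrates this pattern:
for (;;) {                                 // init restarts the shell forever
  pid = fork();
  if (pid < 0) { printf(1, "init: fork failed\n"); exit(); }
  if (pid == 0) {                          // Child
    exec("sh", argv);
    printf(1, "init: exec sh failed\n");   // reached only if exec fails
    exit();
  }
  // Parent: reap children until the shell itself exits, then loop
  while ((wpid = wait()) >= 0 && wpid != pid)
    printf(1, "zombie!\n");                // reaped an orphan reparented to init
}
If exec fails, the child falls through to the error message and exits. The parent calls wait() in a loop because init also becomes the parent of orphaned processes. Each wait() reaps one exited child, and any child other than the shell is reported and discarded. Once the shell itself exits, the outer loop forks a fresh one.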
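To see the loop's control flow outside xv6, here is a bounded POSIX imitation. It makes two assumptions: the "shell" is a child that exits immediately instead of calling exec, and the loop runs a fixed number of rounds, whereas the real init loops forever. supervise is a hypothetical name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns how many times the stand-in shell was started and reaped,
   or -1 if fork() fails. */
int supervise(int rounds) {
    int reaped_shells = 0;
    for (int i = 0; i < rounds; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;                     /* fork failed */
        if (pid == 0)
            _exit(0);                      /* stand-in for exec("sh", argv) */

        /* Reap children until the "shell" itself exits; any other child
           reaped here plays the role of an orphan inherited by init. */
        pid_t w;
        while ((w = wait(NULL)) >= 0 && w != pid)
            ;
        if (w == pid)
            reaped_shells++;
    }
    return reaped_shells;
}
```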
The Process Descriptor: struct proc
Every process in xv6 is represented by a struct proc defined in proc.h. Think of it as the OS's "dossier" for a running program — everything the kernel needs to know about and manage a process lives here.
struct proc {
addr_t sz; // Size of process memory (bytes)
pde_t* pgdir; // Page table
char *kstack; // Bottom of kernel stack for this process
enum procstate state; // Process state (UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE)
int pid; // Process ID
struct proc *parent; // Parent process
struct trapframe *tf; // Trap frame for current syscall
struct context *context; // Saved registers for context switch
void *chan; // Sleep channel (non-zero means sleeping on chan)
int killed; // Non-zero if process has been killed
struct file *ofile[NOFILE]; // Open files
struct inode *cwd; // Current working directory
char name[16]; // Process name (for debugging)
};
Key fields to know:
| Field | Role |
|---|---|
| pid | Unique numeric identifier for this process |
| pgdir | Pointer to the page table, which defines the process's virtual address space |
| kstack | The process's own kernel stack, used while it executes system calls and handles interrupts |
| state | The scheduling state (RUNNABLE, RUNNING, SLEEPING, ZOMBIE, etc.) |
| tf | Trap frame: saves user-space register state when a trap or system call enters the kernel |
| context | Saved kernel-mode registers used by the scheduler to switch between processes |
| ofile | Array of open-file pointers, indexed by file descriptor |
| parent | Pointer to the creating process, needed for wait() |
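The state field follows a small state machine. The helper below is a hypothetical check written for illustration, since xv6 contains no such function. The transitions it accepts are the ones xv6's process code performs, and each is annotated with where it happens.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the procstate list in struct proc. */
enum procstate { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

/* Returns true if xv6 ever moves a process directly from `from` to `to`. */
bool valid_transition(enum procstate from, enum procstate to) {
    switch (from) {
    case UNUSED:   return to == EMBRYO;                    /* allocproc claims slot */
    case EMBRYO:   return to == RUNNABLE                   /* fork/userinit done */
                       || to == UNUSED;                    /* setup failed */
    case RUNNABLE: return to == RUNNING;                   /* scheduler picks it */
    case RUNNING:  return to == RUNNABLE                   /* yield on timer tick */
                       || to == SLEEPING                   /* sleep on a channel */
                       || to == ZOMBIE;                    /* exit */
    case SLEEPING: return to == RUNNABLE;                  /* wakeup or kill */
    case ZOMBIE:   return to == UNUSED;                    /* parent's wait reaps it */
    }
    return false;
}
```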
The Process Table
xv6 uses a fixed-size process table holding at most NPROC (64) entries:
struct {
struct spinlock lock;
struct proc proc[NPROC];
} ptable;
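The spinlock serializes access to the table. Any code that searches for a slot or changes a process's state (allocproc(), the scheduler, exit(), wait()) holds ptable.lock while doing so. The fixed-size array is simple and predictable but caps the system at 64 concurrent processes. Production kernels such as Linux allocate process descriptors dynamically instead.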
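A user-space model of the table makes the slot discipline concrete. The model rests on a few assumptions: a pthread mutex stands in for the spinlock, each slot holds only a state and a PID, and all names (toy_ptable, toy_alloc, toy_free) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NPROC 64                           /* same limit as xv6's ptable */

enum slot_state { SLOT_UNUSED, SLOT_EMBRYO };

struct toy_proc { enum slot_state state; int pid; };

/* Model of ptable: a mutex in place of the spinlock, plus a fixed array. */
static struct {
    pthread_mutex_t lock;
    struct toy_proc proc[NPROC];
} toy_ptable = { .lock = PTHREAD_MUTEX_INITIALIZER };

static int next_pid = 1;

/* Claims the first unused slot under the lock, like allocproc; NULL if full. */
struct toy_proc *toy_alloc(void) {
    pthread_mutex_lock(&toy_ptable.lock);
    for (struct toy_proc *p = toy_ptable.proc; p < &toy_ptable.proc[NPROC]; p++) {
        if (p->state == SLOT_UNUSED) {
            p->state = SLOT_EMBRYO;        /* claim it before dropping the lock */
            p->pid = next_pid++;
            pthread_mutex_unlock(&toy_ptable.lock);
            return p;
        }
    }
    pthread_mutex_unlock(&toy_ptable.lock);
    return NULL;
}

/* Returns a slot to the pool, as wait() does when it reaps a zombie. */
void toy_free(struct toy_proc *p) {
    pthread_mutex_lock(&toy_ptable.lock);
    p->state = SLOT_UNUSED;
    p->pid = 0;
    pthread_mutex_unlock(&toy_ptable.lock);
}
```

Allocating 64 slots succeeds, a 65th call returns NULL, and freeing any slot lets the next allocation reuse it.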
Process Creation: Inside fork()
Step 1 — Allocate a new process slot: allocproc()
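fork() delegates the low-level setup to allocproc(), which:
- Scans ptable for an UNUSED slot (returning 0 if the table is full), marks it EMBRYO, and assigns it the next PID.
- Allocates a kernel stack with kalloc().
- Reserves space for the trap frame at the top of the kernel stack and points p->tf at it.
- Pushes the address of syscall_trapret just below the trap frame, to serve as forkret's return address.
- Sets context->rip to forkret. When the scheduler first switches to this process, it executes forkret, which then returns into syscall_trapret and from there to user space.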
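static struct proc* allocproc(void) {
  struct proc *p;
  char *sp;

  acquire(&ptable.lock);
  for (p = ptable.proc; p < &ptable.proc[NPROC]; p++)  // find an UNUSED slot
    if (p->state == UNUSED)
      goto found;
  release(&ptable.lock);
  return 0;                                // process table is full

found:
  p->state = EMBRYO;                       // claim the slot; not yet runnable
  p->pid = nextpid++;
  release(&ptable.lock);

  if ((p->kstack = kalloc()) == 0) {       // allocate the kernel stack
    p->state = UNUSED;
    return 0;
  }
  sp = p->kstack + KSTACKSIZE;             // start at the top of the stack

  sp -= sizeof *p->tf;                     // reserve the trap frame
  p->tf = (struct trapframe*)sp;

  sp -= sizeof(addr_t);                    // fake return address for forkret
  *(addr_t*)sp = (addr_t)syscall_trapret;

  sp -= sizeof *p->context;                // context the scheduler switches to
  p->context = (struct context*)sp;
  memset(p->context, 0, sizeof *p->context);
  p->context->rip = (addr_t)forkret;       // first run begins in forkret
  ...
  return p;
}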
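You can replay the pointer arithmetic on an ordinary heap buffer. This sketch assumes simplified trapframe and context structs (the field lists are illustrative, not xv6's exact definitions) and a KSTACKSIZE of 4096. Integers stand in for the addresses of syscall_trapret and forkret. The sketch also shows why the order matters. The context ends with rip (as xv6's does), so the fake return address sits directly above it. After the first switch lands in forkret, forkret's own return pops syscall_trapret.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KSTACKSIZE 4096                    /* assumed value */

/* Simplified stand-ins; field lists are illustrative only. */
struct trapframe { uint64_t regs[16]; uint64_t rip, cs, rflags, rsp, ss; };
struct context   { uint64_t r15, r14, r13, r12, rbx, rbp, rip; };

struct layout { struct trapframe *tf; uint64_t *ret; struct context *ctx; };

/* Carves `kstack` (KSTACKSIZE bytes) in the same order as allocproc. */
struct layout carve_kstack(char *kstack, uint64_t fake_ret, uint64_t entry) {
    struct layout l;
    char *sp = kstack + KSTACKSIZE;        /* stacks grow down from the top */

    sp -= sizeof *l.tf;                    /* 1. trap frame at the very top */
    l.tf = (struct trapframe *)sp;

    sp -= sizeof(uint64_t);                /* 2. return address for forkret */
    l.ret = (uint64_t *)sp;
    *l.ret = fake_ret;                     /* stands in for syscall_trapret */

    sp -= sizeof *l.ctx;                   /* 3. context just below it */
    l.ctx = (struct context *)sp;
    memset(l.ctx, 0, sizeof *l.ctx);
    l.ctx->rip = entry;                    /* stands in for forkret */
    return l;
}
```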
Step 2 — Copy the parent's state
Back in fork():
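int fork(void) {
  int i, pid;
  struct proc *np;
  if ((np = allocproc()) == 0)              // no free slot or no memory
    return -1;
  if ((np->pgdir = copyuvm(proc->pgdir, proc->sz)) == 0) {  // copy address space
    kfree(np->kstack);                      // undo allocproc on failure
    np->kstack = 0;
    np->state = UNUSED;
    return -1;
  }
  np->sz = proc->sz;
  np->parent = proc;
  *np->tf = *proc->tf;                      // copy trap frame (user registers)
  np->tf->rax = 0;                          // child sees fork() return 0
  for (i = 0; i < NOFILE; i++)
    if (proc->ofile[i])
      np->ofile[i] = filedup(proc->ofile[i]);  // share open files
  ...
  pid = np->pid;
  acquire(&ptable.lock);
  np->state = RUNNABLE;                     // child may now be scheduled
  release(&ptable.lock);
  return pid;                               // parent sees the child's PID
}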
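filedup() shares an open file rather than copying it, and POSIX fork() behaves the same way: the parent's and child's descriptors refer to one open file with one offset. The hypothetical helper below (POSIX, not xv6) has the child write six bytes, then the parent write six more through the same descriptor. If the offset were private, the parent's write would overwrite the child's at offset 0 instead of following it.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the shared file offset after both processes have written. */
long shared_offset_after_fork(void) {
    FILE *tmp = tmpfile();
    assert(tmp != NULL);
    int fd = fileno(tmp);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        ssize_t w = write(fd, "child ", 6);    /* advances the shared offset */
        _exit(w == 6 ? 0 : 1);
    }
    waitpid(pid, NULL, 0);                     /* let the child write first */
    ssize_t w = write(fd, "parent", 6);        /* continues at offset 6 */
    assert(w == 6);

    long end = (long)lseek(fd, 0, SEEK_CUR);   /* 12 because the offset is shared */
    fclose(tmp);
    return end;
}
```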
Why Does fork() Return Twice?
This is a classic exam question. The mechanism:
- Parent: fork() calls allocproc(), which sets up the child and returns the child's struct proc*. fork() then copies the parent's state into it and returns the child's PID normally through the call stack.
- Child: The child is marked RUNNABLE but has not run yet. Its trap frame is a copy of the parent's, except that tf->rax is set to 0. When the scheduler eventually switches to the child, it starts in forkret, which returns into syscall_trapret. syscall_trapret restores the user registers from the trap frame, so the child's return-value register holds 0. From the child's point of view, fork() returned 0.
The key insight: the child never executes the body of fork(). It begins running in forkret, returns into syscall_trapret, and reaches user space with a trap frame that makes it look as if fork() had just returned 0.
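You can model the trap-frame trick with plain structs. This is an illustrative model only. The trap frame is cut down to rax and rip, and toy_fork is hypothetical. The parent's result is written into its saved rax, modeling how the system-call return path delivers fork()'s return value, the child's PID.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only: a trap frame cut down to two registers. */
struct trapframe { uint64_t rax; uint64_t rip; };
struct toy_proc  { int pid; struct trapframe tf; };

/* Applies the two trap-frame updates that make fork() "return twice". */
void toy_fork(struct toy_proc *parent, struct toy_proc *child, int child_pid) {
    child->pid = child_pid;
    child->tf = parent->tf;                  /* like *np->tf = *proc->tf */
    child->tf.rax = 0;                       /* like np->tf->rax = 0 */
    parent->tf.rax = (uint64_t)child_pid;    /* parent's return value: child's PID */
}
```

After toy_fork, both frames hold the same rip, so both processes resume at the same user instruction. Only rax, the return-value register, differs.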
Trapframe vs. Context
Students often confuse these two:
| Field | When used | What it saves |
|---|---|---|
| tf (trapframe) | Entering/exiting the kernel (syscall, interrupt) | User-space register state |
| context | Switching between processes inside the kernel (scheduler) | Kernel-space saved registers (callee-saved + rip) |
The trapframe is the bridge between user space and kernel space. The context is the bridge between one kernel execution and another.
Key Takeaways
- A process = program code + memory state + CPU register state + kernel resources.
- xv6 represents each process with struct proc; up to NPROC (64) of them live in ptable.
- fork() creates a child by duplicating the parent's address space and trap frame, then sets tf->rax = 0 so the child sees a return value of 0.
- fork() "returns twice" because the parent returns normally while the child resumes execution via syscall_trapret with a prefabricated trap frame.
- The trapframe saves user-space registers on a kernel-boundary crossing; the context saves kernel registers during a scheduler switch. They serve different purposes.
- exec() replaces the process image; wait() reaps a terminated child.