Process Management in the Linux Kernel
Why this matters
Every system call you write, every kernel module you build, and every driver you debug runs in the context of a process. Understanding how the kernel tracks, schedules, creates, and destroys processes is foundational — it explains why current works, why threads behave the way they do, and what really happens when a program crashes or exits.
What Is a Process?
A process is a program currently executing in the system. The kernel tracks four kinds of per-process state:
| Component | Examples |
|---|---|
| CPU registers | instruction pointer, stack pointer, general-purpose regs |
| Program code | the text section mapped from the ELF binary |
| Memory segments | data, BSS, heap, stack |
| Kernel resources | open file descriptors, pending signals, address space descriptor |
Processes also own one or more threads — concurrent flows of execution within the same address space. From the kernel's perspective (more on this below), threads and processes are the same thing.
From user space, three calls drive the process lifecycle:
- fork(): duplicates the calling process; the child is an almost-exact copy of the parent.
- execve() (and libc wrappers such as execv()): replaces the current process image with a new program.
- wait() / waitpid(): blocks the caller until a child changes state (by default, until it terminates; waitpid() can also report children that were stopped or resumed by a signal).
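Here is a minimal user-space sketch of the three calls working together. The function forks, runs a program in the child with execv(), and collects the child's exit status in the parent. It uses waitpid(), the PID-specific variant of wait(). The helper name run_and_wait and the use of /bin/sh in the usage note are illustrative assumptions.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, replace its image with the program at `path`, and wait for it.
 * Returns the child's exit status, or -1 on error or abnormal termination. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();                   /* duplicate the calling process */
    if (pid < 0)
        return -1;                        /* fork failed (e.g., process limit hit) */

    if (pid == 0) {                       /* child */
        execv(path, argv);                /* only returns on failure */
        _exit(127);                       /* exec failed: leave without running atexit handlers */
    }

    int status;                           /* parent */
    if (waitpid(pid, &status, 0) != pid)  /* block until the child terminates */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_and_wait("/bin/sh", (char *[]){"sh", "-c", "exit 7", NULL})` should return 7. After a failed exec, the child calls _exit() rather than exit(). This keeps it from flushing stdio buffers it inherited from the parent.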
The Process Descriptor: task_struct
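Every process is represented by a struct task_struct (defined in include/linux/sched.h). It is one of the largest structures in the kernel and holds nearly all the state the kernel keeps about a process: identifiers, scheduling state, credentials, open files, signal handling, and a pointer to its address space.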
Allocation and access
task_struct is allocated dynamically from a dedicated slab cache rather than being placed on the kernel stack. Keeping it off the stack means a kernel stack overflow cannot silently overwrite the process descriptor.
To access the current process efficiently, the kernel uses a per-CPU variable:
```c
/* arch/x86/include/asm/current.h */
DECLARE_PER_CPU(struct task_struct *, current_task);

static __always_inline struct task_struct *get_current(void)
{
	return this_cpu_read_stable(current_task);
}

#define current get_current()
```
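The current macro is meaningful only when the kernel runs in process context (e.g., inside a system call made on behalf of a process). In interrupt context, current still points to whichever task happened to be interrupted. That task has nothing to do with the interrupt, so interrupt handlers must not rely on current.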
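User space cannot use current directly, but some system calls do little more than read a field of the calling task. In the sketch below, the wrapper names current_tgid and current_tid are hypothetical. getpid() returns the calling task's thread-group ID, and the raw gettid system call returns the task's own PID, so in a single-threaded process the two values match.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread-group ID of the calling task (what user space calls the "PID"). */
pid_t current_tgid(void)
{
    return getpid();
}

/* Per-task ID of the calling task (what user space calls the "TID"). */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```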
Process ID (PID)
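Each process gets a pid_t identifier:
- Default pid_max: 32,768, so the largest PID is 32,767. That is the maximum value of a signed 16-bit short, kept for compatibility with older UNIX and Linux systems. On machines with many CPUs the kernel raises this default.
- Configurable via /proc/sys/kernel/pid_max, up to 4,194,304 (2^22) on 64-bit systems
- PIDs wrap around when pid_max is reached; after a wrap, allocation resumes with free PIDs above 300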
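The sketch below checks the limit on a running system by reading /proc/sys/kernel/pid_max. The function name read_pid_max is an assumption for illustration. Raising the limit requires root, for example by writing a new value to the same file.

```c
#include <stdio.h>

/* Return the current PID limit from /proc/sys/kernel/pid_max,
 * or -1 if the file cannot be read or parsed. */
long read_pid_max(void)
{
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");
    if (f == NULL)
        return -1;
    long value;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

PIDs handed out are always strictly below this value.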
Process states (task->__state)
| State | Meaning |
|---|---|
| TASK_RUNNING | Runnable: either executing on a CPU or waiting in the scheduler run queue. Can be in user space or kernel space. |
| TASK_INTERRUPTIBLE | Sleeping, waiting for a condition. Wakes when the condition becomes true or when a signal arrives. |
| TASK_UNINTERRUPTIBLE | Sleeping, waiting for a condition. Does not wake on signals; used for waits that must not be interrupted (e.g., disk reads). |
| __TASK_TRACED | Being traced by another process (e.g., a debugger via ptrace). |
| __TASK_STOPPED | Execution paused by a signal such as SIGSTOP. Neither running nor waiting to run. |
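These states show up in user space as the one-letter codes printed by ps. The same letter appears in the third field of /proc/<pid>/stat. The sketch below extracts that letter and maps it back to a kernel state; the function names are hypothetical. It searches from the last ')' because the command name in parentheses may itself contain spaces or parentheses.

```c
#include <stdio.h>
#include <string.h>

/* Extract the state letter from a /proc/<pid>/stat line.
 * Returns '?' if the line is malformed. */
char stat_state_letter(const char *line)
{
    const char *p = strrchr(line, ')');   /* end of the "(comm)" field */
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* Map a state letter (as printed by ps and /proc) to the kernel state. */
const char *state_name(char letter)
{
    switch (letter) {
    case 'R': return "TASK_RUNNING";
    case 'S': return "TASK_INTERRUPTIBLE";
    case 'D': return "TASK_UNINTERRUPTIBLE";
    case 'T': return "__TASK_STOPPED";
    case 't': return "__TASK_TRACED";
    case 'Z': return "EXIT_ZOMBIE";   /* reported from exit_state, not __state */
    default:  return "other";
    }
}

/* State of the calling process: it is running while it reads the file. */
char self_state(void)
{
    char buf[512];
    FILE *f = fopen("/proc/self/stat", "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return stat_state_letter(buf);
}
```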
The Process Family Tree
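The Linux process tree is rooted at init (PID 1), which the kernel launches as the final step of booting. On most modern distributions, PID 1 is systemd. Above init sits the global init_task, the statically allocated task_struct of PID 0 (the idle or "swapper" task). init_task is the ancestor of both init and kthreadd (PID 2).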
Every task_struct carries links for navigating the tree:
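```c
current->parent    // struct task_struct *: my parent
current->children  // struct list_head: head of the list of my children
current->sibling   // struct list_head: my node in my parent's children list
current->tasks     // struct list_head: my node in the global list of all tasks
```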
Helper macros such as next_task(t) and for_each_process(t) walk the global task list. Callers must hold rcu_read_lock() (or tasklist_lock) while iterating.
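Kernel code can follow current->parent until it reaches init_task. The sketch below is a user-space approximation and assumes /proc is mounted. It follows the parent PID recorded in /proc/<pid>/stat until it reaches a process whose parent PID is 0, which is init or the root of the current PID namespace. The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Parent PID of `pid` from /proc/<pid>/stat, or -1 if it cannot be read. */
static pid_t read_ppid(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name is in parentheses and may contain spaces,
     * so parse from the last ')': " <state> <ppid> ..." */
    char *p = strrchr(buf, ')');
    char state;
    int ppid;
    if (p == NULL || sscanf(p + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return (pid_t)ppid;
}

/* Walk parent links upwards, like following task->parent, until reaching
 * a process whose parent PID is 0. Returns the number of steps, or -1. */
int ancestry_depth(pid_t pid)
{
    for (int depth = 0; depth < 4096; depth++) {
        pid_t parent = read_ppid(pid);
        if (parent < 0)
            return -1;      /* process vanished or /proc unavailable */
        if (parent == 0)
            return depth;   /* reached the root of this PID namespace */
        pid = parent;
    }
    return -1;              /* defensive cap against unexpected cycles */
}
```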
Process Creation
Linux has no primitive to create a process from nothing (no spawn or CreateProcess). Instead:
- fork() creates a child that is a near-copy of the parent. It differs in PID, parent PID, and a few resource counters and statistics.
- exec() replaces the copied address space with a freshly loaded program image.
Copy-on-Write (CoW)
Naively copying every parent page on fork() would be expensive. Linux instead uses Copy-on-Write:
- fork() duplicates the page tables (not the pages themselves) and marks the private writable pages read-only in both parent and child.
- When either process writes to such a page, a page fault fires.
- The kernel copies just that page, maps the copy read-write in the faulting process, and resumes it.

Result: fork() is fast (no data is copied up front) and memory-efficient (pages stay shared until one side writes to them).
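The observable consequence of Copy-on-Write is that parent and child never see each other's writes, even though they start out sharing the same physical pages. The sketch below checks that property; cow_isolation_demo is a hypothetical name. It demonstrates the semantics only. The lazy copying itself is invisible to the program, apart from the extra minor page faults it causes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child saw its own write while the parent's copy
 * stayed unchanged, 0 otherwise. */
int cow_isolation_demo(void)
{
    static int value = 1;            /* data page shared by both processes after fork */

    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        value = 2;                   /* write fault: the child gets a private copy */
        _exit(value);                /* report what the child now sees */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return 0;
    return WEXITSTATUS(status) == 2 && value == 1;
}
```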
Inside fork() / clone()
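fork() is implemented on top of clone(): glibc's fork() wrapper issues the clone() system call, and the kernel's fork and clone entry points both end up in kernel_clone(). The path is:
fork() → clone() → kernel_clone() → copy_process() → wake_up_new_task()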
copy_process() does the heavy lifting:
- dup_task_struct() allocates a new kernel stack and duplicates the parent's task_struct and thread_info
- Checks process count limits (e.g., RLIMIT_NPROC)
- Clears fields that must not be inherited (e.g., pending signals)
- sched_fork() sets the child's state to TASK_NEW, so it cannot run yet
- Copies or shares the parent's open files, filesystem information, signal handlers, address space, etc., depending on the clone flags
- alloc_pid() assigns a new PID
- Returns a pointer to the child's task_struct
wake_up_new_task() then transitions the child to TASK_RUNNING.
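To make this ordering concrete, here is a deliberately simplified user-space model of copy_process() and wake_up_new_task(). Everything in it is hypothetical, including struct toy_task, its fields, and the PID counter, and it stands in for far more complex kernel structures. It only mirrors the sequence described above: duplicate, clear non-inherited fields, mark as new, assign a PID, then make runnable.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
enum toy_state { TOY_NEW, TOY_RUNNING };

struct toy_task {
    int pid;
    int ppid;
    enum toy_state state;
    unsigned long pending_signals;   /* bitmask of pending signals */
    int open_files;                  /* stands in for the files table */
};

static int toy_next_pid = 2;         /* hypothetical PID counter */

/* Follows the order of copy_process(): duplicate the parent, clear what
 * must not be inherited, mark the child as new, then assign a PID. */
struct toy_task *toy_copy_process(const struct toy_task *parent)
{
    struct toy_task *child = malloc(sizeof *child);  /* like dup_task_struct() */
    if (child == NULL)
        return NULL;
    *child = *parent;                  /* start as an exact copy */
    child->pending_signals = 0;        /* pending signals are not inherited */
    child->state = TOY_NEW;            /* like sched_fork(): not runnable yet */
    child->ppid = parent->pid;
    child->pid = toy_next_pid++;       /* like alloc_pid() */
    return child;
}

/* Like wake_up_new_task(): the child becomes runnable. */
void toy_wake_up_new_task(struct toy_task *child)
{
    child->state = TOY_RUNNING;
}
```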
Threads
In many operating systems, threads and processes are distinct kernel objects. The Linux kernel has no separate thread structure.
A thread is simply another task that shares resources with its creator. The clone() flags control what is shared:
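```c
/* Simplified thread-creation flags (raw clone(flags, newsp) form); glibc's
 * pthread_create() also passes CLONE_THREAD, CLONE_SETTLS and others. */
clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);
```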
| Flag | What is shared |
|---|---|
| CLONE_VM | Virtual address space (same page tables) |
| CLONE_FS | Filesystem context (root, cwd, umask) |
| CLONE_FILES | File descriptor table |
| CLONE_SIGHAND | Signal handlers |
Each thread still has its own task_struct and is scheduled independently.
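You can observe the effect of CLONE_VM with glibc's clone() wrapper. In the sketch below, a child created with the thread-style flags from the table writes to a global variable; the function and variable names are illustrative. With CLONE_VM the parent sees the write. Without it, the child only modifies its own Copy-on-Write copy. SIGCHLD is added to the flags so the parent can reap the child with waitpid(). A real thread library also uses CLONE_THREAD and sets up thread-local storage, which this sketch does not.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static int shared_value;             /* global in the parent's data segment */

/* Runs in the new task; writes the value passed by the parent. */
static int child_fn(void *arg)
{
    shared_value = *(const int *)arg;
    return 0;                        /* the task exits when fn returns */
}

/* Create a task with clone(flags), let it write shared_value, wait for it,
 * and return what the parent sees afterwards (-1 on error). */
int clone_and_observe(int flags, int new_value)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);    /* the child needs its own stack */
    if (stack == NULL)
        return -1;

    shared_value = 0;
    /* Stacks grow down on common architectures, so pass the top address. */
    pid_t pid = clone(child_fn, stack + stack_size, flags | SIGCHLD, &new_value);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);               /* SIGCHLD lets waitpid() reap it */
    free(stack);
    return shared_value;
}
```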
Kernel Threads
Kernel threads perform background kernel work (e.g., kworker threads run work queue items; migration/N threads move tasks between CPUs). They differ from user processes in one key way: they have no user-space address space (the mm field in their task_struct is NULL).
Kernel threads are spawned by kthreadd (PID 2), so they appear as its children. You can list them with:
```sh
ps --ppid 2
```
Kernel thread API
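```c
/* Thread body: keep working until kthread_stop() is called on this thread */
static int threadfn(void *data)
{
	while (!kthread_should_stop())
		msleep(100);             /* placeholder for the real work */
	return 0;                    /* passed back to kthread_stop() */
}

/* Create a stopped thread (errors are returned as ERR_PTR values) */
struct task_struct *t = kthread_create(threadfn, data, "name-%d", i);
if (!IS_ERR(t))
	wake_up_process(t);          /* make it runnable */
/* Alternatively, create and start a thread in one step */
t = kthread_run(threadfn, data, "name-%d", i);
/* Ask the thread to stop and wait for threadfn() to return */
kthread_stop(t);
```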
The threadfn should periodically call kthread_should_stop() and return when it is true.
Process Termination
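A process terminates when it calls exit() (explicitly, or implicitly when main() returns), or when it is killed by a signal (for example SIGKILL, or an unhandled SIGSEGV after a crash). For a normal exit, the kernel path is:
exit() → exit_group() system call → do_group_exit() → do_exit() (kernel/exit.c)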
do_exit() steps:
- exit_signals() sets the PF_EXITING flag; do_exit() records the exit code in task_struct.exit_code
- exit_mm() drops the task's reference to its mm_struct (address space), freeing it if no other task shares it
- exit_sem() cleans up System V semaphore state (e.g., applying pending SEM_UNDO adjustments)
- exit_files() / exit_fs() decrement reference counts on the file descriptor table and filesystem information, freeing any that reach zero
- exit_notify() sends signals to the parent, reparents children, and sets exit_state = EXIT_ZOMBIE
- do_task_dead() sets the state to TASK_DEAD and calls schedule(); the process never returns
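As seen by the parent, the exit code recorded by do_exit() is only 8 bits wide. The exit_group path stores (code & 0xff) << 8, and WEXITSTATUS() shifts it back. The sketch below makes the truncation visible; child_exit_status is a hypothetical helper.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code` and return the status the parent
 * recovers with WEXITSTATUS(), or -1 on error. */
int child_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                  /* only the low 8 bits reach the parent */

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);       /* equivalent to (status >> 8) & 0xff */
}
```

For instance, child_exit_status(300) returns 44, because 300 & 0xff == 44.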
Zombie state and cleanup
After do_exit(), the process is a zombie: its task_struct, kernel stack, and thread_info persist so the parent can call wait() to retrieve the exit code. Once the parent calls wait(), release_task() removes the task from the task list and frees the remaining memory.
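You can observe the zombie state from user space. The sketch below forks a child that exits immediately, polls /proc/<pid>/stat until the child shows state Z, and only then reaps it with waitpid(). Polling is needed because the child exits asynchronously. The function names, the 10 ms interval, and the 2 s timeout are arbitrary choices. The sketch assumes SIGCHLD has its default disposition; if it were set to SIG_IGN, the kernel would reap the child automatically and no zombie would appear.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* One-letter state of `pid` from /proc/<pid>/stat ('?' if unavailable). */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    char *p = strrchr(buf, ')');        /* skip the "(comm)" field */
    return (p != NULL && p[1] == ' ') ? p[2] : '?';
}

/* Returns 1 if the child was seen as a zombie and then reaped with status 3. */
int zombie_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0)
        _exit(3);                       /* child: terminate immediately */

    struct timespec delay = {0, 10 * 1000 * 1000};   /* 10 ms */
    int seen_zombie = 0;
    for (int i = 0; i < 200 && !seen_zombie; i++) {  /* poll for up to ~2 s */
        seen_zombie = (proc_state(pid) == 'Z');
        if (!seen_zombie)
            nanosleep(&delay, NULL);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)  /* reaping lets the kernel free the zombie */
        return 0;
    return seen_zombie && WIFEXITED(status) && WEXITSTATUS(status) == 3;
}
```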
Orphan / zombie prevention
If a parent exits before its children, those children must be reparented:
- exit_notify() calls forget_original_parent(), which calls find_new_reaper()
- find_new_reaper() returns another live thread in the exiting thread group if one exists; otherwise the nearest ancestor registered as a child subreaper, if any; otherwise init (PID 1 of the PID namespace)
- All children are reparented to that reaper, which will call wait() for them
This guarantees no orphaned zombie can accumulate indefinitely.
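You can observe reparenting the same way. In the sketch below, a middle process forks a grandchild and exits at once; the names are illustrative. The grandchild waits until getppid() changes and reports its new parent through a pipe. The result is usually 1. It may be a different PID when a child subreaper is present, such as a per-user service manager.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Create a grandchild, let its parent exit first, and return the PID of the
 * process that adopted it (as seen via getppid()), or -1 on error. */
pid_t orphan_adopter(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (middle == 0) {                          /* middle process */
        pid_t me = getpid();
        pid_t grandchild = fork();
        if (grandchild == 0) {                  /* grandchild */
            struct timespec delay = {0, 1000000};    /* 1 ms */
            while (getppid() == me)             /* wait until the middle process is gone */
                nanosleep(&delay, NULL);
            pid_t adopter = getppid();          /* the reaper chosen by the kernel */
            _exit(write(fds[1], &adopter, sizeof adopter) == (ssize_t)sizeof adopter ? 0 : 1);
        }
        _exit(grandchild < 0 ? 1 : 0);          /* exit, orphaning the grandchild */
    }

    close(fds[1]);                              /* keep only the read end */
    waitpid(middle, NULL, 0);                   /* reap the middle process */
    pid_t adopter = -1;
    if (read(fds[0], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
        adopter = -1;
    close(fds[0]);
    return adopter;
}
```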
Key Takeaways
- Every process is described by a task_struct; the current macro gives fast per-CPU access to it.
- pid_max defaults to 32,768 (largest PID 32,767) and can be raised to about 4 million on 64-bit systems.
- Process states distinguish runnable (TASK_RUNNING), interruptibly sleeping, uninterruptibly sleeping, stopped, and traced tasks.
- fork() uses Copy-on-Write: page tables are copied, but pages are shared until written.
- Linux threads are ordinary tasks that share resources via clone() flags; no separate thread structure exists.
- Kernel threads (children of kthreadd, PID 2) have mm = NULL and are managed with kthread_create / kthread_run / kthread_stop.
- Process termination proceeds through do_exit(), leaving a zombie until the parent calls wait(); orphaned children are reparented to a reaper, usually init.