Process Management in the Linux Kernel

Why this matters

Every system call you write, every kernel module you build, and every driver you debug runs in the context of a process. Understanding how the kernel tracks, schedules, creates, and destroys processes is foundational — it explains why current works, why threads behave the way they do, and what really happens when a program crashes or exits.

What Is a Process?

A process is a program currently executing in the system. The kernel tracks four kinds of per-process state:

Component	Examples
CPU registers	instruction pointer, stack pointer, general-purpose regs
Program code	the text section mapped from the ELF binary
Memory segments	data, BSS, heap, stack
Kernel resources	open file descriptors, pending signals, address space descriptor

Processes also own one or more threads — concurrent flows of execution within the same address space. From the kernel's perspective (more on this below), threads and processes are the same thing.

From user space, three system calls drive the process lifecycle:

fork(): duplicates the calling process; the child is an almost-exact copy
execve() (and C library wrappers such as execv()): replaces the current process image with a new program
wait() / waitpid(): blocks the caller until a child terminates; waitpid() with WUNTRACED or WCONTINUED also reports children that were stopped or resumed by a signal
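Together, the three calls form the classic fork-exec-wait pattern. The following is a minimal user-space sketch. The helper name run_program() is hypothetical, and the usage example assumes /bin/sh exists.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with arguments `argv` in a child process.
 * Returns the child's exit status (127 if execv() failed in the child),
 * or -1 if fork()/waitpid() failed or the child did not exit normally. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();              /* duplicate the calling process */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execv(path, argv);
        _exit(127);                  /* reached only if execv() failed */
    }

    /* Parent: block until the child terminates, then decode its status. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running "/bin/sh" with the argument vector {"sh", "-c", "exit 7", NULL} should return 7. Using 127 for a failed exec simply follows the convention shells use.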

The Process Descriptor: `task_struct`

Every process is represented by a struct task_struct (defined in include/linux/sched.h). This is one of the largest structs in the kernel — it holds everything about a process.

Allocation and access

task_struct is allocated dynamically from a dedicated slab cache rather than placed on the kernel stack. One benefit of this design is that an overflow of the small, fixed-size kernel stack cannot silently corrupt the process descriptor.

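To reach the currently running task efficiently, x86 keeps a pointer to it in a per-CPU variable. Other architectures use different mechanisms; arm64, for example, keeps it in the sp_el0 register.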

/* arch/x86/include/asm/current.h */
DECLARE_PER_CPU(struct task_struct *, current_task);

static __always_inline struct task_struct *get_current(void)
{
    return this_cpu_read_stable(current_task);
}

#define current get_current()

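The current macro is meaningful only when the kernel runs in process context (e.g., inside a system call made by a process). In interrupt context, current still points to whichever task happened to be interrupted. That task is unrelated to the interrupt, so interrupt handlers must not rely on it.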

Process ID (PID)

Each process gets a pid_t identifier:

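Default maximum: 32,768 (PIDs 1 to 32,767). This legacy limit keeps PIDs within the range of a 16-bit signed short for compatibility with older software. The kernel raises the default automatically on machines with many CPUs.
Configurable up to 4,194,304 (2^22, about 4 million) on 64-bit systems via /proc/sys/kernel/pid_max
PIDs wrap around when the maximum is reached: allocation restarts from a low number and reuses PIDs that are no longer in use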
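The current limit can be read like any other sysctl. The sketch below parses the file and checks the calling process's PID against it. The helper names are hypothetical, and it assumes procfs is mounted at /proc.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the PID limit from /proc/sys/kernel/pid_max, or -1 if the file
 * cannot be read. Every allocated PID is smaller than this value. */
long read_pid_max(void)
{
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");
    if (!f)
        return -1;
    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Report whether the calling process's PID lies below the current limit. */
int own_pid_within_limit(void)
{
    long max = read_pid_max();
    return max > 0 && (long)getpid() < max;
}
```

Raising the limit requires root privileges, for example by writing a new value to the same file.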

Process states (`task->__state`)

State	Meaning
`TASK_RUNNING`	Runnable — either executing on a CPU or waiting in the scheduler run queue. Can be in user- or kernel-space.
`TASK_INTERRUPTIBLE`	Sleeping, waiting for a condition. Wakes on the condition or on a signal.
`TASK_UNINTERRUPTIBLE`	Sleeping, waiting for a condition. Does not wake on signals — used for I/O that must not be interrupted (e.g., disk reads).
`__TASK_TRACED`	Being traced by another process (e.g., a debugger via `ptrace`).
`__TASK_STOPPED`	Execution paused by a signal such as `SIGSTOP`. Neither running nor waiting to run.
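User space sees these states through /proc/<pid>/stat, which reports each one as a single letter. The sketch below reads that letter and uses SIGSTOP to put a child into __TASK_STOPPED. The helper names are hypothetical, and it assumes procfs is mounted at /proc.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the state letter of `pid` from /proc/<pid>/stat: 'R' running or
 * runnable, 'S' interruptible sleep, 'D' uninterruptible sleep, 'T' stopped,
 * 't' traced, 'Z' zombie. Returns '?' on error. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 is the command name in parentheses and may contain spaces,
     * so the state is the character two positions after the last ')'. */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* Fork a child that sleeps in pause(), stop it with SIGSTOP, and return
 * the state reported while it is stopped. The child is then killed and
 * reaped. */
char stopped_child_state(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return '?';
    if (pid == 0) {
        pause();                          /* sleep until a signal arrives */
        _exit(0);
    }

    int status;
    kill(pid, SIGSTOP);
    waitpid(pid, &status, WUNTRACED);     /* returns once the child stops */
    char state = proc_state(pid);

    kill(pid, SIGKILL);                   /* SIGKILL also ends a stopped task */
    waitpid(pid, &status, 0);             /* reap it so no zombie remains */
    return state;
}
```

A process reading its own stat file is running at that moment, so proc_state(getpid()) reports 'R'. The stopped child should report 'T'.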

The Process Family Tree

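The Linux process tree is rooted at init (PID 1), the first user-space process, which the kernel launches as the final step of booting. On most modern distributions, including Debian, the program running as PID 1 is systemd. Note that the kernel's statically allocated init_task is not init. It is the boot (idle, "swapper") task with PID 0, whose children are init (PID 1) and kthreadd (PID 2).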

Every task_struct carries links for navigating the tree:

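current->parent       // my parent (the tracer, while I am being ptraced)
current->real_parent  // my actual parent, unaffected by ptrace
current->children     // list head: my children
current->sibling      // my node in my parent's children list
current->tasks        // my node in the global list of all tasks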

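The helper macros next_task(p) and for_each_process(p) walk the global task list. Callers must hold rcu_read_lock() (or tasklist_lock) while iterating, because tasks can be created and destroyed concurrently.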
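Inside the kernel, climbing to the root means following task->parent until reaching &init_task. User space can do the same walk through the PPID field of /proc/<pid>/stat. The sketch below assumes procfs is mounted at /proc, and its helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the parent PID of `pid` (field 4 of /proc/<pid>/stat), or -1. */
pid_t proc_ppid(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    char *p = strrchr(buf, ')');        /* skip "pid (comm)" safely */
    char state;
    int ppid;
    if (!p || sscanf(p + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return (pid_t)ppid;
}

/* Follow parent links upward from `pid`, storing ancestors in `out`
 * (nearest first). Stops at PPID 0, the top of the tree visible in this
 * PID namespace, and returns the number of ancestors stored. */
int ancestors(pid_t pid, pid_t *out, int max)
{
    int n = 0;
    while (n < max) {
        pid_t parent = proc_ppid(pid);
        if (parent <= 0)
            break;
        out[n++] = parent;
        pid = parent;
    }
    return n;
}

/* Convenience wrapper: the calling process's own ancestry. */
int my_ancestry(pid_t *out, int max)
{
    return ancestors(getpid(), out, max);
}
```

On a typical host, the last entry of the chain is 1 (init).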

Process Creation

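Linux has no system call that creates a new process from scratch and runs a program in one step, unlike spawn-style APIs such as Windows' CreateProcess. Instead, process creation is split into two steps: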

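fork() creates a child that is a near-copy of the parent. It differs only in its PID, its parent PID (PPID), and certain resources and statistics, such as pending signals, which are not inherited.
exec() (the execve() system call and its library wrappers) loads a new binary into the child, replacing the copied address space.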

Copy-on-Write (CoW)

Naively copying all parent pages on fork() would be expensive. Linux uses Copy-on-Write:

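fork() duplicates the parent's page tables and marks every writable private page read-only in both parent and child; no page contents are copied.
When either process later writes to one of these pages, the CPU raises a page fault.
The kernel copies just that page, maps the copy read-write in the faulting process, and resumes it. If the page is no longer shared, the kernel simply makes it writable again without copying.

Result: fork() is fast, because its main upfront cost is duplicating the page tables and the process descriptor. It is also memory-efficient, because pages stay shared until one side writes to them; pages that are never written, such as program text, are never copied at all.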
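Copy-on-write can be observed from user space by counting minor page faults with getrusage(). In the sketch below, the parent fills a private anonymous buffer, forks a child that writes one byte of its copy, and then rewrites every page itself. All pages are already resident, yet each first write after fork() should still take a fault, because fork() write-protected the pages. The child's write should never become visible to the parent. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Minor page faults taken so far by the calling process. */
static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Populate `npages` private pages, fork a child that writes to the first
 * page, then rewrite every page in the parent. Returns the number of minor
 * faults the parent's rewrite caused, or -1 on error (including the case
 * where the child's write leaked into the parent, which CoW must prevent). */
long cow_write_faults(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    madvise(buf, len, MADV_NOHUGEPAGE);  /* keep base-size pages */
    memset(buf, 1, len);                 /* fault in every page before fork */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        buf[0] = 9;                      /* child's write gets a private copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    volatile char *v = buf;
    long result = -1;
    if (v[0] == 1) {                     /* parent still sees its own data */
        long before = minor_faults();
        for (size_t i = 0; i < len; i += page)
            v[i] = 2;                    /* first write to each page faults */
        result = minor_faults() - before;
    }
    munmap(buf, len);
    return result;
}
```

Calling cow_write_faults(64) should report roughly one fault per page. Without the fork, rewriting already-populated pages would take none. The madvise(MADV_NOHUGEPAGE) call keeps transparent huge pages from backing the buffer with a single large page. Where THP is unavailable, the call fails harmlessly.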

Inside `fork()` / `clone()`

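glibc's fork() wrapper is implemented with the clone() system call. Inside the kernel, fork(), vfork(), clone(), and clone3() all funnel into kernel_clone() (called _do_fork() in older kernels). The path is: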

fork()  →  clone()  →  kernel_clone()  →  copy_process()  →  wake_up_new_task()

copy_process() does the heavy lifting:

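dup_task_struct(): allocates a new kernel stack, thread_info, and task_struct for the child, initialized as copies of the parent's
Checks process-count limits (the per-user RLIMIT_NPROC and the system-wide threads-max)
Clears or resets fields that must not be inherited (e.g., pending signals, CPU-time statistics)
sched_fork(): initializes scheduler state and sets the child's state to TASK_NEW
Copies or shares, depending on the clone flags, the parent's open files, filesystem context, signal handlers, address space, and namespaces (copy_files(), copy_fs(), copy_sighand(), copy_mm(), copy_namespaces())
alloc_pid(): assigns a new PID
Returns a pointer to the child's task_struct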

wake_up_new_task() then sets the child to TASK_RUNNING and places it on a run queue, and kernel_clone() returns the child's PID to the parent.
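Because fork() is essentially clone() with no sharing flags, calling the raw system call with SIGCHLD as its only flag should behave like fork(). The sketch below assumes an architecture such as x86-64 or arm64, where the flags are the first clone argument. The order of the remaining arguments varies by architecture, but all of them are zero here. The name fork_via_raw_clone() is hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Invoke the raw clone system call with only SIGCHLD (the exit signal
 * sent to the parent), giving fork() semantics. Returns the child's exit
 * status, or -1 on error. */
int fork_via_raw_clone(void)
{
    pid_t parent = getpid();

    /* No new stack (0): like fork(), the child continues on a CoW copy
     * of the parent's stack. The remaining arguments are unused here. */
    long ret = syscall(SYS_clone, (long)SIGCHLD, 0L, NULL, NULL, 0L);
    if (ret < 0)
        return -1;
    if (ret == 0)
        _exit(getppid() == parent ? 42 : 1);   /* child: check our parent */

    int status;
    if (waitpid((pid_t)ret, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Real programs should call fork() instead. The glibc wrapper also runs pthread_atfork() handlers and resets internal library state in the child.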

Threads

In most operating systems, threads and processes are distinct kernel objects. Linux has no separate thread concept.

A thread is simply another process that shares resources with its creator. The clone() flags control what is shared:

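// Core sharing flags for a thread (simplified: glibc's pthread_create()
// also passes CLONE_THREAD, CLONE_SETTLS, and other flags)
clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);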

Flag	What is shared
`CLONE_VM`	Virtual address space (same page tables)
`CLONE_FS`	Filesystem context (root, cwd, umask)
`CLONE_FILES`	File descriptor table
`CLONE_SIGHAND`	Signal handlers

Each thread still has its own task_struct and is scheduled independently.
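The effect of CLONE_VM can be seen with glibc's clone() wrapper. A child created with the flag writes into the parent's memory, while a child created without it writes into its own copy-on-write copy. This is a sketch, not a thread library: a real pthread_create() also sets up thread-local storage and passes further flags. The names value_after_clone() and child_fn() are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

static volatile int shared_value;        /* lives in the parent's data segment */

/* Child entry point: write to the global and exit. */
static int child_fn(void *arg)
{
    (void)arg;
    shared_value = 1;
    return 0;                            /* becomes the child's exit status */
}

/* Create a child with clone() using `extra_flags`, wait for it, and return
 * the value the parent then sees in shared_value (-1 on error). */
int value_after_clone(int extra_flags)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (!stack)
        return -1;
    shared_value = 0;

    /* The stack grows downward on common architectures, so pass the top
     * of the allocation. SIGCHLD lets waitpid() reap the child normally. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      extra_flags | SIGCHLD, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    int status;
    waitpid(pid, &status, 0);
    free(stack);
    return shared_value;
}
```

value_after_clone(CLONE_VM) should return 1, because the child wrote into the shared address space. value_after_clone(0) should return 0, because the child modified only its private copy.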

Kernel Threads

Kernel threads perform background kernel work (e.g., kworker for work queues, migration for CPU load balancing). They differ from user processes in one key way: they have no user-space address space (mm in task_struct is NULL).

Apart from kthreadd itself, whose parent is the PID 0 boot task, all kernel threads are children of kthreadd (PID 2). You can list them with:

ps --ppid 2

Kernel thread API

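// Example thread function: do work until asked to stop
static int threadfn(void *data)
{
    while (!kthread_should_stop()) {
        /* ... one unit of work ... */
        msleep_interruptible(100);
    }
    return 0;    // handed back to the caller of kthread_stop()
}

// Create the thread in a stopped state (returns ERR_PTR() on failure)
struct task_struct *t = kthread_create(threadfn, data, "name-%d", i);
if (IS_ERR(t))
    return PTR_ERR(t);

// Make it runnable
wake_up_process(t);

// Or create and start in one step: t = kthread_run(threadfn, data, "name-%d", i);

// Stop it: sets kthread_should_stop(), wakes it, waits for threadfn's return value
int ret = kthread_stop(t);

The threadfn should check kthread_should_stop() regularly and return once it becomes true; otherwise kthread_stop() blocks indefinitely.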

Process Termination

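A process terminates when it calls exit() (explicitly, or implicitly when main() returns), or when it receives a signal or exception that it can neither handle nor ignore. For a voluntary exit, the C library's exit() runs atexit() handlers and then issues the exit_group() system call, so the kernel path is:

exit()  →  sys_exit_group()  →  do_group_exit()  →  do_exit()   (kernel/exit.c)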

do_exit() steps:

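exit_signals(): sets the PF_EXITING flag (do_exit() also records the exit code in task_struct.exit_code)
exit_mm(): drops the task's reference to its mm_struct; the address space is freed if no other task shares it
exit_sem(): cleans up System V semaphore state (e.g., applies SEM_UNDO adjustments)
exit_files() / exit_fs(): drop the references to the file descriptor table and filesystem context; objects whose reference count reaches zero are freed
exit_notify(): notifies the parent (normally with SIGCHLD), reparents the task's children, and sets exit_state = EXIT_ZOMBIE (or EXIT_DEAD if the parent does not need to reap it, e.g., because it ignores SIGCHLD)
do_task_dead(): sets the state to TASK_DEAD and calls schedule() for the last time; the task never runs again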

Zombie state and cleanup

After do_exit(), the process is a zombie. It no longer holds an address space or open files, but its task_struct (and, on some configurations, its kernel stack and thread_info) persists so the parent can retrieve the exit status with wait(). Once the parent calls wait(), release_task() removes the task from the task list and frees the remaining memory.

Orphan / zombie prevention

If a parent exits before its children, those children must be reparented:

exit_notify() calls forget_original_parent() → find_new_reaper()
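find_new_reaper() returns another live thread in the exiting task's thread group if one exists. Otherwise it returns the nearest ancestor that has marked itself a child subreaper with prctl(PR_SET_CHILD_SUBREAPER), and failing that, the init process (PID 1) of the PID namespace.
All children are reparented to the reaper, which will call wait() for them

This guarantees that every process always has a parent able to reap it, so orphans that exit do not linger as zombies indefinitely.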
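Besides another thread in the group and init, find_new_reaper() also considers child subreapers: ancestors that marked themselves with prctl(PR_SET_CHILD_SUBREAPER). That branch is directly observable. In the sketch below, the caller becomes a subreaper and forks a middle process, which forks a grandchild and exits at once. The orphaned grandchild then reports its new parent through a pipe. Without the prctl() call, the grandchild would be reparented to the nearest other subreaper ancestor or, failing that, to the namespace's init. The function name is hypothetical.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Make the caller a child subreaper, create an orphaned grandchild, and
 * return the PID it reports as its new parent (-1 on error). */
pid_t new_parent_of_orphan(void)
{
    int fds[2];
    if (prctl(PR_SET_CHILD_SUBREAPER, 1UL, 0UL, 0UL, 0UL) < 0 || pipe(fds) < 0)
        return -1;

    pid_t middle = fork();
    if (middle < 0)
        return -1;
    if (middle == 0) {
        pid_t me = getpid();
        if (fork() == 0) {
            /* Grandchild: wait (at most ~5 s) until the middle process has
             * exited and we have been reparented, then report. */
            struct timespec ts = { 0, 1000000 };      /* 1 ms */
            for (int i = 0; i < 5000 && getppid() == me; i++)
                nanosleep(&ts, NULL);
            pid_t report[2] = { getpid(), getppid() };
            if (write(fds[1], report, sizeof report) < 0)
                _exit(1);
            _exit(0);
        }
        _exit(0);                 /* middle exits, orphaning the grandchild */
    }

    close(fds[1]);
    waitpid(middle, NULL, 0);     /* reap the middle process */

    pid_t report[2];
    ssize_t n = read(fds[0], report, sizeof report);
    close(fds[0]);
    if (n != (ssize_t)sizeof report)
        return -1;
    waitpid(report[0], NULL, 0);  /* the grandchild is now ours to reap */
    return report[1];
}
```

The function should return the caller's own PID. The subreaper attribute remains set for the rest of the calling process's life.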

Key Takeaways

Every process is described by a task_struct; the current macro gives fast per-CPU access to it.
PIDs are capped at 32,768 by default (tunable to ~4 million).
Process states distinguish between runnable (TASK_RUNNING), interruptibly sleeping, and uninterruptibly sleeping.
fork() uses Copy-on-Write: page tables are copied but pages are shared until written.
Linux threads are ordinary processes sharing resources via clone() flags — no separate thread struct exists.
Kernel threads (children of kthreadd, PID 2) have mm = NULL and are managed with kthread_create/kthread_run/kthread_stop.
Process termination proceeds through do_exit(), leaving a zombie until the parent calls wait(); orphaned children are reparented to a surviving thread in the group, a subreaper, or init.
