Introduction to Linux Kernel & Developer Tools
Why This Matters
The Linux kernel is one of the largest and most actively developed software projects in history: over 27 million lines of C code, with roughly 1,600 contributors adding ~7,500 lines every single day. Before you can write a single line of kernel code, you need a working mental model of what you're modifying and how to work in it efficiently. Professional kernel developers do not just open files and start typing. They use a specific, battle-tested toolchain, and mastering it early saves enormous time later.
This module covers the kernel's architecture and development model, then walks through every tool you'll use throughout the course.
What Is the Linux Kernel?
An operating system kernel is the bridge between applications and hardware. The kernel's job is to:
| Responsibility | Example |
|---|---|
| Abstract hardware | Expose a file descriptor instead of raw disk sectors |
| Multiplex resources | Schedule 1,000 processes across 8 CPU cores |
| Isolate processes | Prevent process A from reading process B's memory |
| Enable sharing | Let two processes open the same file |
User-space programs communicate with the kernel through the system call interface:
fd = open("out", O_WRONLY); // kernel opens the file, returns a handle
write(fd, "hello\n", 6); // kernel writes to the underlying storage
pid = fork(); // kernel duplicates the calling process
The CPU enforces this boundary in hardware. On x86, user-space runs at ring 3 and the kernel at ring 0. Only ring-0 code may touch I/O devices, modify page tables, or execute privileged instructions.
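The three calls shown earlier can be fleshed out into a small user-space sketch with error handling. The function names write_message and fork_and_wait, the O_CREAT | O_TRUNC flags, and the 0644 mode are illustrative assumptions, not part of the original snippet. Each libc call here is a thin wrapper that traps into the kernel.

```c
#include <assert.h>
#include <fcntl.h>	/* open, O_* flags */
#include <sys/types.h>	/* pid_t, ssize_t */
#include <sys/wait.h>	/* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>	/* write, close, fork, _exit */

/*
 * Open path for writing, write len bytes of msg, then close.
 * Returns the byte count reported by write(), or -1 on failure.
 * A short write is possible in general; this sketch does not retry.
 */
static ssize_t write_message(const char *path, const char *msg, size_t len)
{
	ssize_t n;
	int fd;

	assert(path && msg);	/* caller bug, not a runtime condition */

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	n = write(fd, msg, len);
	if (close(fd) < 0)
		return -1;
	return n;
}

/*
 * Duplicate the calling process. The child exits at once with
 * child_status; the parent waits and returns the status it collected,
 * or -1 if fork/waitpid failed or the child did not exit normally.
 */
static int fork_and_wait(int child_status)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(child_status);	/* child: never returns */

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling write_message("out", "hello\n", 6) should return 6 when the current directory is writable, and fork_and_wait(7) should return 7. Both only ask the kernel to do the work; none of the privileged operations happen in ring 3.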
Monolithic vs. Micro-Kernel Design
Linux uses a monolithic kernel design: the entire OS (scheduler, file systems, networking, device drivers) runs in kernel space, sharing one address space.
User space: application A application B
──────────────────────────────── (system call boundary)
Kernel space: scheduler | VFS | net stack | drivers ← all one binary
Hardware: CPU, RAM, disk, NIC
Trade-offs:
| | Monolithic | Micro-kernel |
|---|---|---|
| Performance | Fast (direct function calls between subsystems) | Slower (IPC between servers) |
| Isolation | Weak (one bug can crash everything) | Strong (servers in user space crash safely) |
| Complexity | Lower interface count | More IPC plumbing |
| Examples | Linux, FreeBSD | Minix, seL4 |
The Tanenbaum-Torvalds debate (1992) argued this exact question. In practice the line is blurry: macOS and Windows are usually described as hybrid kernels, while Linux remains monolithic but can load drivers and other code at run time as modules.
Linux History and Release Cycle
| Year | Milestone |
|---|---|
| 1991 | First release by Linus Torvalds |
| 1992 | GPL license; first distros |
| 1996 | v2.0: SMP (multiprocessor) support |
| 2003 | v2.6: PAE, many new architectures |
| 2015 | v4.0: live patching |
| Today | Releases ~every 70 days, 13,000 patches/release |
Version numbering: (major).(minor).(stable), e.g., 6.1.71
- Mainline: Linus's tree, where all new features land
- Stable: bug fixes backported after a mainline release
- LTS: a stable release maintained for several years (e.g., 6.1, 5.15)
- RC: release candidates for testing before a mainline release
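Code that has to compare versions usually packs the three fields into one integer. The sketch below is modelled on the kernel's KERNEL_VERSION macro from linux/version.h, but it is a user-space illustration: struct kver, kver_parse, and kver_code are hypothetical names, and the exact packing (including clamping the stable field to 255) is an assumption about recent kernels.

```c
#include <assert.h>
#include <stdio.h>	/* sscanf */

/* One (major).(minor).(stable) triple, e.g. 6.1.71. */
struct kver {
	unsigned int major;
	unsigned int minor;
	unsigned int stable;
};

/*
 * Parse "6.1.71" or "6.1" (stable defaults to 0).
 * Returns 0 on success, -1 if major and minor cannot both be read.
 */
static int kver_parse(const char *s, struct kver *v)
{
	int fields;

	assert(s && v);		/* caller bug, not a runtime condition */

	v->stable = 0;
	fields = sscanf(s, "%u.%u.%u", &v->major, &v->minor, &v->stable);
	return fields >= 2 ? 0 : -1;
}

/*
 * Pack a version into one integer so that versions can be compared
 * with <, == and >. The stable field is clamped to fit in 8 bits.
 */
static unsigned int kver_code(const struct kver *v)
{
	unsigned int stable = v->stable > 255 ? 255 : v->stable;

	return (v->major << 16) + (v->minor << 8) + stable;
}
```

With this packing, 5.15 compares lower than 6.1.71, and a tag-style string such as "v6.1" is rejected because it does not start with a number.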
Linux is licensed under GPLv2: if you distribute a modified kernel, you must release your changes under the GPLv2 as well, together with the source code and the scripts needed to build it.
Version Control: git
Git was invented by Linus Torvalds to manage Linux kernel development. It is a distributed VCS: every clone is a full repository with complete history.
Getting the kernel source
git clone https://github.com/torvalds/linux.git # GitHub mirror
cd linux
git checkout v6.1 # pin to a release tag
Essential daily commands
# History and blame
git log # full commit history
git log <file> # history for one file
git blame <file> # who changed each line and when
# Local state
git status # what has changed
git diff # exact line-level diff
git add <file> # stage a file
git commit # commit staged changes locally
# Sync
git push # send commits to remote
git pull # fetch and merge from remote
# Tags
git tag # list all tags
git checkout v6.1 # check out a tagged version
Use tig for a prettier ncurses log viewer. Useful aliases to add to ~/.gitconfig:
[alias]
lg = log --graph
lp = log --graph --pretty=oneline
st = status
co = checkout
The Kernel Source Tree
linux/
├── arch/ # Architecture-specific code (x86, arm, …)
├── block/ # Block device layer
├── Documentation/
├── drivers/ # Device drivers (largest directory)
├── fs/ # File systems (ext4, btrfs, proc, …)
├── include/ # Kernel headers
├── init/ # Early boot code (start_kernel lives here)
├── kernel/ # Core kernel: scheduler, signals, timers
├── mm/ # Memory management
├── net/ # Network stack
└── virt/ # Virtualization (KVM)
There are over 630 directories. You need tools to navigate this.
Building the Kernel
Building happens in three distinct phases.
Step 1: Configure
The .config file at the repo root controls ~3,700 compilation flags for x86. Common approaches:
| Command | What it does |
|---|---|
| make menuconfig | Interactive ncurses menu; requires libncurses, flex, bison |
| make defconfig | Default config for the current architecture |
| make oldconfig | Update an existing .config (for example, one copied from the running kernel's /boot/config-$(uname -r)); prompts only for new options |
| make localmodconfig | Config based on currently loaded modules (usually the smallest build) |
Step 2: Compile
make -j$(nproc) # build kernel image (bzImage)
make modules -j$(nproc) # build loadable modules (.ko files)
The -j flag parallelizes across CPU cores. On a 16-core machine, make -j16 cuts build time dramatically. The output kernel image lands at arch/x86/boot/bzImage.
Step 3: Install
sudo make modules_install # installs .ko files to /lib/modules/
sudo make install # copies bzImage and updates bootloader
sudo reboot # boots into the new kernel
uname -a # verify the version
dmesg # inspect kernel log
Alternative (package-based): Generate .deb or .rpm packages for safer, reversible installation:
make deb-pkg # Debian/Ubuntu
sudo dpkg -i ../linux-image-*.deb ../linux-headers-*.deb # packages land in the parent directory; exact file names vary
Exploring the Code
Linux Cross Reference (LXR)
The fastest way to explore the kernel without installing anything is a web-based cross-referencer. Visit elixir.bootlin.com (an LXR-style source browser) to:
- Browse any kernel version's source
- Search for any identifier (function, variable, struct)
- Follow cross-references to see every place a symbol is defined or used
cscope
A terminal-based C code browser. Build its database:
sudo apt install cscope
cd linux
ARCH=x86 make cscope # x86-only (faster, smaller DB)
# or
make cscope # all architectures
Inside cscope you can search for: C identifiers, function definitions, functions calling/called-by a given function, and text strings. Press Ctrl-d to quit.
ctags + vim
sudo apt install exuberant-ctags
cd linux; ARCH=x86 make tags -j2
In vim:
- :tag start_kernel jumps to the definition of start_kernel
- Ctrl-] follows the tag under the cursor
- Ctrl-t jumps back
- :bp / :bn navigate between open files
Terminal Multiplexer: tmux
tmux lets you run multiple terminal sessions inside one SSH connection and detach/reattach without losing state, which is essential for long kernel builds on a remote machine.
| Command | Action |
|---|---|
| tmux | Start a new session |
| Ctrl-b % | Split pane vertically (panes side by side) |
| Ctrl-b " | Split pane horizontally (panes stacked) |
| Ctrl-b z | Zoom/unzoom current pane |
| Ctrl-b c | Create a new window |
| Ctrl-b d | Detach (session keeps running) |
| tmux a | Reattach to existing session |
Kernel vs. User Programming
Writing kernel code feels different from application programming. Key differences:
No standard library
The kernel cannot link against libc. It ships its own equivalents:
| User space | Kernel space |
|---|---|
| #include <string.h> | #include <linux/string.h> |
| printf("Hello!") | printk(KERN_INFO "Hello!") |
| malloc(64) | kmalloc(64, GFP_KERNEL) |
GCC extensions
The kernel relies heavily on GCC-specific extensions:
static inline void func(void) { ... } // inlined function
asm volatile("rdtsc" : "=a" (l), "=d" (h)); // inline assembly
if (unlikely(error)) { ... } // tell the compiler this branch is rare
if (likely(success)) { ... } // tell the compiler this is the hot path
likely()/unlikely() are hints to the compiler, not the CPU: they wrap GCC's __builtin_expect, and the compiler uses them to lay out the expected path as straight-line code. Use them only when you have profiling evidence or strong domain knowledge.
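As a minimal user-space sketch of how these hints are built and used, the macros below are defined locally with the same shape as the kernel's. The function checked_div is a hypothetical example, and the code assumes GCC or Clang, since __builtin_expect is a compiler builtin.

```c
#include <assert.h>

/* Local definitions for user space; the kernel provides its own. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/*
 * Divide a by b and store the quotient in *out.
 * Returns 0 on success, -1 on division by zero (*out is left untouched).
 */
static int checked_div(unsigned int a, unsigned int b, unsigned int *out)
{
	assert(out);		/* caller bug, not a runtime condition */

	if (unlikely(b == 0))	/* rare error path, kept off the hot path */
		return -1;

	*out = a / b;
	return 0;
}
```

The hint changes only code layout and never the result: checked_div(42, 6, &q) stores 7 in q either way. That is why a wrong hint costs performance rather than correctness.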
Constrained environment
- No floating-point: the kernel does not save and restore FPU state for its own code, so the FPU registers still hold user-process state
- Tiny stack: 8 KB (2 pages) on 32-bit x86 and 16 KB on x86-64; deep recursion or large local arrays will overflow it and can corrupt memory or crash the kernel
- No memory protection: a bad pointer doesn't segfault; it triggers a kernel oops (often leading to a full kernel panic)
- Concurrency everywhere: kernel code can run on multiple CPUs simultaneously, be preempted at any time, and be interrupted by hardware interrupt handlers. You must reason carefully about every shared data structure.
Linux Kernel Coding Style
The kernel enforces a specific style (see Documentation/process/coding-style.rst):
- Indentation: 1 tab = 8 characters (not spaces, not 4-space tabs)
- Naming: snake_case only, never CamelCase (spin_lock, not SpinLock)
- Comments: C-style only, /* like this */, never // like this
- Line length: 80 columns is the preferred limit
- No typedef for structs by default
/*
 * Multi-line comment: always C-style.
 */
struct foo {
	int member1;	/* 1 tab = 8 chars */
	double member2;
};			/* no typedef! */

void my_function(int the_param, char *string,
		 int another_long_parameter)
{
	int x = the_param % 42;

	if (!the_param)
		do_stuff();

	switch (x % 3) {
	case 0:
		cool_function();
		break;
	default:
		do_other_stuff();
	}
}
Writing code that matches the surrounding style is not optional: patches with style violations are rejected during code review.
Key Takeaways
- The Linux kernel is huge and fast-moving. With 27 million lines and 13,000 patches per release, you cannot read it all. You must use tools.
- git is foundational. Every kernel patch, every version, every blame trace flows through git. Master log, blame, diff, checkout.
- Building the kernel is a three-step process: configure (.config), compile (make -j), install (make install or .deb/.rpm packages).
- Use LXR or cscope to navigate. You will constantly need to trace "who calls this?" and "where is this defined?"; those questions are what cscope was built for.
- Kernel programming is not user programming. No libc, no FP, tiny stack, no memory protection, mandatory concurrency awareness. These constraints affect every design decision.
- Style is not optional. Patches are reviewed by humans; style violations get you ignored or rejected before anyone reads your logic.