Introduction to Linux Kernel & Developer Tools

Why This Matters

The Linux kernel is one of the largest and most actively developed software projects in history: over 27 million lines of C code, with roughly 1,600 contributors adding ~7,500 lines every single day. Before you can write a single line of kernel code, you need a working mental model of what you're modifying and how to work in it efficiently. Professional kernel developers do not just open files and start typing. They use a specific, battle-tested toolchain, and mastering it early saves enormous time later.

This module covers the kernel's architecture and development model, then walks through every tool you'll use throughout the course.


What Is the Linux Kernel?

An operating system kernel is the bridge between applications and hardware. The kernel's job is to:

Responsibility        Example
Abstract hardware     Expose a file descriptor instead of raw disk sectors
Multiplex resources   Schedule 1,000 processes across 8 CPU cores
Isolate processes     Prevent process A from reading process B's memory
Enable sharing        Let two processes open the same file

User-space programs communicate with the kernel through the system call interface:

fd = open("out", 1);      // kernel opens the file, returns a handle
write(fd, "hello\n", 6);  // kernel writes to the underlying storage
pid = fork();             // kernel duplicates the calling process

The CPU enforces this boundary in hardware. On x86, user-space runs at ring 3 and the kernel at ring 0. Only ring-0 code may touch I/O devices, modify page tables, or execute privileged instructions.


Monolithic vs. Micro-Kernel Design

Linux uses a monolithic kernel design: the entire OS (scheduler, file systems, networking, device drivers) runs in kernel space, sharing one address space.

User space:   application A    application B
              ─────────────────────────────── (system call boundary)
Kernel space: scheduler | VFS | net stack | drivers  ← all one binary
Hardware:     CPU, RAM, disk, NIC

Trade-offs:

              Monolithic                                        Micro-kernel
Performance   Fast (direct function calls between subsystems)   Slower (IPC between servers)
Isolation     Weak (one bug can crash everything)               Strong (servers in user space crash safely)
Complexity    Lower interface count                             More IPC plumbing
Examples      Linux, FreeBSD                                    Minix, seL4

The Tanenbaum–Torvalds debate (1992) argued this exact question. In practice, several production kernels (macOS, Windows) are usually classed as hybrids, while Linux remains monolithic at its core.


Linux History and Release Cycle

Year    Milestone
1991    First release by Linus Torvalds
1992    GPL license; first distros
1996    v2.0 – SMP (multiprocessor) support
2003    v2.6 – PAE, many new architectures
2015    v4.0 – live patching
Today   Releases ~every 70 days, 13,000 patches/release

Version numbering: (major).(minor).(stable), e.g., 6.1.71

Linux is licensed under GPLv2: anyone who distributes a modified kernel must release the modifications under the GPL as well, together with the source code and the scripts needed to build it.


Version Control: git

Git was invented by Linus Torvalds to manage Linux kernel development. It is a distributed VCS: every clone is a full repository with complete history.

Getting the kernel source

git clone https://github.com/torvalds/linux.git   # GitHub mirror
cd linux
git checkout v6.1                                  # pin to a stable tag

Essential daily commands

# History and blame
git log                  # full commit history
git log <file>           # history for one file
git blame <file>         # who changed each line and when

# Local state
git status               # what has changed
git diff                 # exact line-level diff
git add <file>           # stage a file
git commit               # commit staged changes locally

# Sync
git push                 # send commits to remote
git pull                 # fetch and merge from remote

# Tags
git tag                  # list all tags
git checkout v6.1        # check out a tagged version

Use tig for a prettier ncurses log viewer. Useful aliases to add to ~/.gitconfig:

[alias]
    lg = log --graph
    lp = log --graph --pretty=oneline
    st = status
    co = checkout

The Kernel Source Tree

linux/
├── arch/        # Architecture-specific code (x86, arm, …)
├── block/       # Block device layer
├── Documentation/
├── drivers/     # Device drivers (largest directory)
├── fs/          # File systems (ext4, btrfs, proc, …)
├── include/     # Kernel headers
├── init/        # Early boot code (start_kernel lives here)
├── kernel/      # Core kernel: scheduler, signals, timers
├── mm/          # Memory management
├── net/         # Network stack
└── virt/        # Virtualization (KVM)

There are over 630 directories. You need tools to navigate this.


Building the Kernel

Building happens in three distinct phases.

Step 1: Configure

The .config file at the repo root controls ~3,700 compilation flags for x86. Common approaches:

Command               What it does
make menuconfig       Interactive ncurses menu; requires libncurses, flex, bison
make defconfig        Default config for the current architecture
make oldconfig        Update an existing .config (e.g., one copied from the running kernel); prompts only for new options
make localmodconfig   Config based on currently loaded modules (smallest build)

Step 2: Compile

make -j$(nproc)          # default target: kernel image (bzImage) and modules
make modules -j$(nproc)  # build only the loadable modules (.ko files)

The -j flag parallelizes across CPU cores. On a 16-core machine, make -j16 cuts build time dramatically. The output kernel image lands at arch/x86/boot/bzImage.

Step 3: Install

sudo make modules_install   # installs .ko files to /lib/modules/
sudo make install           # copies bzImage and updates bootloader
sudo reboot                 # boots into the new kernel
uname -a                    # verify the version
dmesg                       # inspect kernel log

Alternative (package-based): Generate .deb or .rpm packages for safer, reversible installation:

make deb-pkg          # Debian/Ubuntu; packages are written to the parent directory
sudo dpkg -i ../linux-image-*.deb ../linux-headers-*.deb

Exploring the Code

Linux Cross Reference (LXR)

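This is the fastest way to explore the kernel without installing anything. Visit elixir.bootlin.com to browse the source of any tagged kernel version in a web browser, with identifiers cross-linked to their definitions and uses.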

cscope

A terminal-based C code browser. Build its database:

sudo apt install cscope
cd linux
ARCH=x86 make cscope       # x86-only (faster, smaller DB)
# or
make cscope                # all architectures

Inside cscope you can search for: C identifiers, function definitions, functions calling/called-by a given function, and text strings. Press Ctrl-d to quit.

ctags + vim

sudo apt install exuberant-ctags
cd linux; ARCH=x86 make tags -j2

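In vim, press Ctrl-] on an identifier to jump to its definition and Ctrl-t to jump back; :tag <name> looks up a tag by name.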


Terminal Multiplexer: tmux

tmux lets you run multiple terminal sessions inside one SSH connection and detach/reattach without losing state, which is essential for long kernel builds on a remote machine.

Command    Action
tmux       Start a new session
Ctrl-b %   Split pane vertically (left/right)
Ctrl-b "   Split pane horizontally (top/bottom)
Ctrl-b z   Zoom/unzoom current pane
Ctrl-b c   Create a new window
Ctrl-b d   Detach (session keeps running)
tmux a     Reattach to existing session

Kernel vs. User Programming

Writing kernel code feels different from application programming. Key differences:

No standard library

The kernel cannot link against libc. It ships its own equivalents:

User space            Kernel space
#include <string.h>   #include <linux/string.h>
printf("Hello!")      printk(KERN_INFO "Hello!")
malloc(64)            kmalloc(64, GFP_KERNEL)

GCC extensions

The kernel relies heavily on GCC-specific extensions:

static inline void func(void) { ... }    // inlined function

asm volatile("rdtsc" : "=a" (l), "=d" (h));  // inline assembly

if (unlikely(error)) { ... }   // tell the compiler this branch is rarely taken
if (likely(success)) { ... }   // tell the compiler this is the hot path

likely()/unlikely() are hints to the compiler (they wrap GCC's __builtin_expect), which uses them to lay out code so the expected path runs straight through. Use them only when you have profiling evidence or strong domain knowledge.

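Constrained environment

Kernel code runs under constraints that user programs rarely face: no libc, no floating point (FP), a small fixed-size stack (8 KB on 32-bit x86, 16 KB on x86-64), no memory protection between kernel components, and mandatory awareness of concurrency.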


Linux Kernel Coding Style

The kernel enforces a specific style (see Documentation/process/coding-style.rst):

/*
 * Multi-line comment: always C-style.
 */
struct foo {
        int member1;      /* 1 tab = 8 chars */
        double member2;
};  /* no typedef! */

void my_function(int the_param, char *string,
        int another_long_parameter)
{
        int x = the_param % 42;
        if (!the_param)
                do_stuff();
        switch (x % 3) {
        case 0:
                cool_function();
                break;
        default:
                do_other_stuff();
        }
}

Writing code that matches the surrounding style is not optional: patches with style violations are rejected during code review.


Key Takeaways

  1. The Linux kernel is huge and fast-moving. With 27 million lines and 13,000 patches per release, you cannot read it all. You must use tools.
  2. git is foundational. Every kernel patch, every version, every blame trace flows through git. Master log, blame, diff, checkout.
  3. Building the kernel is a three-step process: configure (.config), compile (make -j), install (make install or .deb/.rpm packages).
  4. Use LXR or cscope to navigate. You will constantly need to ask "who calls this?" and "where is this defined?", and those questions are what cscope was built for.
  5. Kernel programming is not user programming. No libc, no FP, tiny stack, no memory protection, mandatory concurrency awareness. These constraints affect every design decision.
  6. Style is not optional. Patches are reviewed by humans; style violations get you ignored or rejected before anyone reads your logic.

Practice

  1. Git was originally created by Linus Torvalds. What was its primary purpose?
  2. Which make target generates an interactive, menu-driven configuration interface for the Linux kernel?
  3. What is the default kernel stack size per process on x86 Linux?
  4. Which of the following correctly describes the Linux kernel's monolithic design?
  5. A kernel developer writes if (unlikely(error)) { handle_error(); }. Why?
  6. You want to search the Linux kernel source for every function that calls schedule(). Which cscope query should you use?
  7. List the three high-level steps required to build and install a custom Linux kernel from source. For each step, give the key make command(s).
  8. Explain two concrete ways that writing a Linux kernel module differs from writing a user-space C program. For each, state the constraint and its practical implication.
  9. Scenario: You inherit a kernel patch that uses typedef struct { int x; } my_point_t; and // single-line comments, with 4-space indentation. A senior maintainer rejects the patch before reading the logic. Why? Rewrite the struct definition so it conforms to the Linux kernel coding style.
  10. You SSH into a remote build server and start a 45-minute kernel compilation. Your SSH connection drops halfway through. Which tmux command should you have run before starting the build, and what does it do?