Kernel Debugging Techniques

Why This Matters

Kernel development follows a tight loop: write code → build → deploy → test → debug. Unlike userspace programs, you cannot attach a debugger trivially, symbols may be stripped, and a crash halts the entire machine rather than just one process. Even experienced kernel developers identify debugging as the real bottleneck. The earlier you internalize these techniques, the less time you will spend staring at a blank screen after a panic.


1. Print Debug Messages with printk()

printk() is the kernel's equivalent of printf(). It writes a formatted string to the kernel's ring buffer, where it can be retrieved with dmesg or read from /proc/kmsg.

Log Levels

Every printk() call should specify a log level prefix. Levels run from 0 (most urgent) to 7 (most verbose):

Macro         Level  Meaning
KERN_EMERG    0      System is unusable
KERN_ALERT    1      Action must be taken immediately
KERN_CRIT     2      Critical conditions
KERN_ERR      3      Error conditions
KERN_WARNING  4      Warning conditions
KERN_NOTICE   5      Normal but significant
KERN_INFO     6      Informational
KERN_DEBUG    7      Debug-level messages

If you omit the level, the kernel applies the default message level, which is set at build time by CONFIG_MESSAGE_LOGLEVEL_DEFAULT and is normally 4 (KERN_WARNING).

printk(KERN_DEBUG "debug message from %s:%d\n", __func__, __LINE__);

Controlling Which Messages Appear

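The console only shows messages whose level is numerically lower (higher priority) than the current console log level. Every message is still recorded in the ring buffer and remains visible through dmesg. You can inspect and change the console level: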

# Shows: current  default  minimum  boot-time-default
$ cat /proc/sys/kernel/printk
4       4      1       7

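# Show all levels (0–7) on the console during development (needs root).
# The value must be 8, because a message is shown only if its level is below it.
$ sudo sh -c 'echo 8 > /proc/sys/kernel/printk'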
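
The console filtering rule can be sketched in plain userspace C. This is a simplified model rather than kernel code: the helper name console_would_print and the LVL_* constants are hypothetical, and the model ignores boot options such as ignore_loglevel.

```c
#include <stdbool.h>

/* Numeric values of the KERN_* levels (names are local to this sketch). */
enum { LVL_EMERG = 0, LVL_ALERT, LVL_CRIT, LVL_ERR,
       LVL_WARNING, LVL_NOTICE, LVL_INFO, LVL_DEBUG };

/*
 * Hypothetical helper modelling the console filter: a message reaches
 * the console only if its level is strictly less than the console log
 * level. The ring buffer stores the message either way.
 */
static bool console_would_print(int msg_level, int console_loglevel)
{
    return msg_level < console_loglevel;
}
```

Under this rule, a console level of 4 shows KERN_ERR (3) but hides KERN_WARNING (4), and KERN_DEBUG (7) appears only once the console level is raised to 8.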

The Ring Buffer

The kernel message buffer is a fixed-size circular buffer. When it fills up it wraps around, discarding the oldest messages. If you are generating a lot of output, increase the buffer size by adding log_buf_len=1M (must be a power of 2) to the kernel boot parameters.
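
The wrap-around behavior can be illustrated with a toy ring in userspace C. This is only a sketch under simplifying assumptions: the real kernel buffer stores variable-length records in a byte buffer, whereas this model keeps a fixed number of message pointers, and every name here (log_ring, ring_push, ring_oldest, LOG_SLOTS) is hypothetical.

```c
#include <stddef.h>

#define LOG_SLOTS 4   /* tiny on purpose; the real buffer is far larger */

struct log_ring {
    const char *slot[LOG_SLOTS];
    size_t head;    /* index of the next slot to write */
    size_t count;   /* number of valid messages, at most LOG_SLOTS */
};

/* Store a message; once the ring is full this overwrites the oldest one. */
static void ring_push(struct log_ring *r, const char *msg)
{
    r->slot[r->head] = msg;
    r->head = (r->head + 1) % LOG_SLOTS;
    if (r->count < LOG_SLOTS)
        r->count++;
}

/* Return the oldest message still held, or NULL if the ring is empty. */
static const char *ring_oldest(const struct log_ring *r)
{
    if (r->count == 0)
        return NULL;
    return r->slot[(r->head + LOG_SLOTS - r->count) % LOG_SLOTS];
}
```

After five pushes into the four-slot ring, the first message is gone and ring_oldest() returns the second one. The same thing happens to early boot messages when a chatty driver floods the log.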

Special Format Specifiers

printk() supports extra format specifiers beyond standard printf:

/* Print a symbol name + offset from a function pointer */
printk("Calling: %pS\n", p->func);       // "versatile_init+0x0/0x110"
printk("Faulted at %pS\n", (void *)regs->ip);

/* Print a symbol from a stack return address */
printk(" %s%pB\n", reliable ? "" : "? ", (void *)*stack);

These are invaluable for decoding raw addresses in panic output.

Convenience Wrappers

Rather than writing the log level prefix by hand, use the pr_* family:

pr_info("Module loaded, version %d\n", VERSION);
pr_debug("Entering %s\n", __func__);
pr_err("Failed to allocate buffer: %d\n", ret);

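For driver code that has a struct device *dev, use dev_info(), dev_err(), etc., which automatically prefix the device name. Note that pr_debug() and dev_dbg() produce no output unless DEBUG is defined for the file or dynamic debug (CONFIG_DYNAMIC_DEBUG) is enabled. When generating the contents of a /proc file, use seq_printf() instead; it writes into the file being read rather than into the kernel log.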


2. Assertions: BUG_ON() and WARN_ON()

These macros are the kernel's equivalent of assert().

BUG_ON(ptr == NULL);   // panics if ptr is NULL
WARN_ON(len > MAX);    // prints backtrace but keeps running

Macro       Condition true →                          Use when
BUG_ON(c)   Kernel panic + full call stack            The invariant violation is unrecoverable
WARN_ON(c)  Call stack printed, execution continues   The invariant violation is suspicious but survivable

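BUG_ON is a hard stop: use it only where continuing would corrupt data or produce nonsensical results. Strictly speaking it raises an oops that kills the current task, which becomes a full panic in interrupt context or when panic_on_oops is set. WARN_ON is for "this should not happen, but if it does, log it and limp on."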
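
The "log it and limp on" pattern relies on the fact that the kernel's WARN_ON() evaluates to the value of its condition, so it can sit directly inside an if. The following userspace sketch imitates that behavior. SKETCH_WARN_ON, warn_on_impl, set_length, and MAX_LEN are hypothetical stand-ins, and the real macro prints a backtrace rather than a single line.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace stand-in for WARN_ON(): report the failed check and hand
 * the condition back to the caller so it can bail out gracefully.
 */
static bool warn_on_impl(bool cond, const char *expr,
                         const char *file, int line)
{
    if (cond)
        fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
    return cond;
}
#define SKETCH_WARN_ON(c) warn_on_impl((c), #c, __FILE__, __LINE__)

#define MAX_LEN 16

/* Hypothetical setter: reject an oversized length but keep running. */
static int set_length(int *dst, int len)
{
    if (SKETCH_WARN_ON(len > MAX_LEN))
        return -1;   /* kernel code would typically return -EINVAL */
    *dst = len;
    return 0;
}
```

Calling set_length(&n, 99) prints one warning to stderr, leaves n untouched, and returns an error to the caller instead of taking the whole system down.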


3. Analyzing Kernel Panic Messages

When the kernel panics it prints a message like:

RIP: 0010:lkp_init+0x41/0x1000

lkp_init+0x41/0x1000 means the faulting instruction sits 0x41 bytes into lkp_init, a function that is 0x1000 bytes long. To find the corresponding source line you have two methods.
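
To make the symbol+offset/size notation concrete, here is a small userspace sketch that splits such a token into its three parts. The function name parse_sym_offset and the 63-character symbol limit are choices made for this illustration only; the token passed in is the text after the "0010:" segment prefix.

```c
#include <stdio.h>

/*
 * Split a "symbol+0xOFF/0xSIZE" token into its parts.
 * Returns 0 on success, -1 if the text does not have that shape.
 * %lx accepts an optional 0x prefix, so none is spelled out here.
 */
static int parse_sym_offset(const char *text, char sym[64],
                            unsigned long *offset, unsigned long *size)
{
    if (sscanf(text, "%63[^+]+%lx/%lx", sym, offset, size) != 3)
        return -1;
    return 0;
}
```

For "lkp_init+0x41/0x1000" this yields the symbol lkp_init, an offset of 0x41 (65 bytes), and a function size of 0x1000 (4096 bytes).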

Method 1: objdump

objdump -S lkp.o | less

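-S interleaves source with disassembly (the object must have been built with debug info). Find the function, then locate the instruction at its start address plus 0x41; the source lines printed just above that instruction are the ones responsible. Add -l if you also want file names and line numbers in the output.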

Method 2: gdb

gdb lkp.o
(gdb) list *(lkp_init+0x41)

gdb decodes the address directly and prints the surrounding source lines. This is faster once you know the syntax.


4. Interactive Debugging with QEMU and GDB

For stepping through kernel code interactively, the gold standard is running the kernel inside QEMU and attaching GDB over QEMU's built-in GDB stub.

Architecture Overview

+------------------+   TCP :1234   +------------------+
|  GDB (host)      | <-----------> |  QEMU GDB stub   |
|  (your terminal) |               |  (controls VM)   |
+------------------+               +------------------+
                                            |
                                   +------------------+
                                   |  Linux kernel    |
                                   |  (guest VM)      |
                                   +------------------+

Because the GDB stub is wired directly into QEMU's emulation logic, GDB has full control: it can halt execution, set breakpoints anywhere (even in boot code), inspect registers, and walk kernel data structures.

Step 1: Build the Kernel with Debug Info

Enable these options in .config (or via make menuconfig → Kernel hacking → Compile-time checks and compiler options):

CONFIG_DEBUG_INFO=y
CONFIG_GDB_SCRIPTS=y

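CONFIG_DEBUG_INFO builds the kernel with DWARF debug info. On recent kernels it is not set directly; it is turned on by choosing one of the DWARF options (for example CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT) under "Debug information". CONFIG_GDB_SCRIPTS makes the Python helpers under scripts/gdb/ available to GDB, adding Linux-aware lx-* commands.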

Step 2: Launch QEMU with the GDB Stub

sudo qemu-system-x86_64 \
   -s -nographic -smp 2 -m 2G \
   -nic user,host=10.0.2.10,hostfwd=tcp:127.0.0.1:2200-:22 \
   -net nic,model=e1000 \
   -drive file=alpine.qcow2,format=qcow2 \
   -kernel ${BZIMAGE} -append "nokaslr console=ttyS0 root=/dev/sda3"

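Key flags:

-s              Shorthand for -gdb tcp::1234; starts the GDB stub on TCP port 1234
-nographic      No graphical window; the guest's serial console goes to your terminal
-smp 2, -m 2G   Give the guest two CPUs and 2 GiB of RAM
-kernel         Boot the given kernel image directly, bypassing the guest's bootloader
-append         Pass the quoted string as the kernel command line
nokaslr         Kernel option: disable address randomization so addresses match vmlinux
console=ttyS0   Kernel option: send console output to the first serial port

Adding -S (not used above) makes QEMU freeze the CPU at startup until GDB tells it to continue, which is useful for debugging early boot.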

Step 3: Connect GDB

cd /path/to/linux-build
gdb vmlinux
(gdb) target remote :1234

You now have a live GDB session attached to the running kernel.

Useful GDB Commands for Kernel Debugging

(gdb) b lkp_init          # breakpoint at function
(gdb) hbreak start_kernel # hardware breakpoint (needed for very early boot)
(gdb) d 1                 # delete breakpoint 1
(gdb) c                   # continue
(gdb) bt                  # backtrace
(gdb) n                   # next (step over)
(gdb) s                   # step (step into)
(gdb) p variable          # print variable
(gdb) p *ptr              # print dereferenced pointer
(gdb) info registers      # dump registers

Linux-Provided GDB Helpers (lx-*)

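With CONFIG_GDB_SCRIPTS=y, GDB loads the helper scripts when it opens vmlinux from the build directory. If it refuses to auto-load them, add "add-auto-load-safe-path /path/to/linux-build" to ~/.gdbinit. Then load the symbols for the kernel and any loaded modules with: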

(gdb) lx-symbols

Then use Linux-specific helpers:

(gdb) lx-dmesg                  # print kernel log buffer of the target
(gdb) p $lx_current().pid       # inspect current task's PID
(gdb) apropos lx                # list all lx-* helpers

You can set breakpoints on kernel module functions before the module is loaded:

(gdb) b btrfs_init_sysfs
# GDB will ask: "Make breakpoint pending on future shared library load? (y or [n]) y"
(gdb) c                         # continue; GDB fires when the module loads


Key Takeaways

  1. printk() with explicit log levels is your first line of defense. Set the console log level in /proc/sys/kernel/printk to 8 during development to see all messages, including KERN_DEBUG.
  2. BUG_ON() halts the kernel on an invariant violation; WARN_ON() logs and continues. Choose based on whether the violation is recoverable.
  3. Panic messages encode the fault address as function+offset. Decode with objdump -S or gdb list *(func+offset).
  4. QEMU + GDB gives you a full interactive debugger attached to a live kernel. The combination of CONFIG_DEBUG_INFO, -s (QEMU stub), and lx-symbols (GDB helper) is the standard kernel debugging setup.
  5. nokaslr is essential for QEMU/GDB debugging — without it, runtime addresses won't match the symbols in vmlinux.

Practice

  1. Which printk() log level has the highest urgency (lowest numeric value)?
  2. What is the key behavioral difference between BUG_ON(c) and WARN_ON(c) when condition c is true?
  3. You run cat /proc/sys/kernel/printk and see 4 4 1 7. What does the first 4 represent?
  4. Name two convenience wrapper families that replace raw printk() calls, and describe when you would prefer one over the other.
  5. A kernel panic message contains the line RIP: 0010:my_driver_init+0x58/0x200. Which command lets you identify the exact source line responsible?
  6. When launching QEMU for kernel debugging, what is the purpose of the -s flag and the nokaslr kernel command-line option?
  7. Explain why the printk() ring buffer wrapping around is a problem during debugging and describe two ways to mitigate it.
  8. After connecting GDB to a running QEMU kernel, you want to set a breakpoint on btrfs_init_sysfs, but the btrfs module is not yet loaded. What GDB commands do you use, and what does GDB require from you to make this work?
  9. You are debugging a kernel module that intermittently corrupts a shared data structure. BUG_ON in the corruption path is never triggered. Which printk() format specifier would be most useful for logging the exact function that last wrote to the structure, given you store the return address of the caller in the struct?
  10. You need to debug a crash that occurs very early in kernel boot — before the init process starts — using QEMU and GDB. Walk through the specific steps and flag(s) needed to pause at the very first kernel instruction so you can set breakpoints before the crash site.