Tools, Code Exploration, and Kernel Debugging
Why This Matters
The Linux kernel has roughly 27 million lines of code, touched by more than 1,600 developers per release. No one can hold that in their head, so professionals rely on tools to navigate, version, manage, and debug it. The xv6 kernel is small by comparison, but the same discipline applies: without good tool habits, you will spend more time fighting your environment than understanding the OS. This lecture lays the tool foundation you will use for every assignment in the course.
Version Control with Git
Git is a distributed version control system originally written by Linus Torvalds for Linux kernel development. "Distributed" means every clone is a full repository with complete history, so you never depend on a single server.
Essential daily commands
| Category | Command | What it does |
|---|---|---|
| Setup | `git config --global user.name "..."` | Set your identity in the commit log |
| Start | `git init` | Create a new empty repo |
| Start | `git clone <url>` | Copy a remote repo locally |
| Status | `git status` | Show changed / untracked files |
| Status | `git diff` | Show exact line-level changes not yet staged |
| Stage | `git add <file>` | Mark file for next commit |
| Commit | `git commit -m "msg"` | Save staged snapshot locally |
| Sync | `git push` | Upload commits to remote |
| Sync | `git pull` | Download and merge remote commits |
| History | `git log` | Full commit history |
| History | `git blame <file>` | Who changed which line, and when |
The commit workflow
Working directory → `git add` → Staging area → `git commit` → Local repo → `git push` → Remote
Changes flow left to right. `git status` and `git diff` report on the first two zones; `git log` shows you the local repo.
Git aliases (.gitconfig tricks)
```ini
# in ~/.gitconfig
[alias]
    br = branch
    co = checkout
    st = status
    lg = log --graph
    lp = log --graph --pretty=oneline
```
Aliases save keystrokes and encourage frequent commits.
Submitting homework via GitHub Classroom
For this course, homework is submitted by pushing to a GitHub Classroom repo:
```shell
git remote add turn-in https://github.com/sysec-teaching/xv6-s26-<your-github-id>.git
git push turn-in main
```
Creating and applying patches
When you need to share a specific commit as a standalone patch file:
```shell
# Create a patch from the most recent commit
git format-patch -1
# Apply someone else's patch
git am 0001-my-change.patch
# Undo an applied patch (this also discards any uncommitted changes)
git reset --hard HEAD~1
```
If you have multiple messy commits to clean up before patching, squash them first:
```shell
git rebase -i <base_commit>
# In the editor, change 'pick' to 's' (squash) on all but the first commit
```
Editors
There is no single correct editor: vim, emacs, and VS Code all have loyal users among kernel developers. The important thing is to choose one and become fluent in it. Switching editors mid-project costs more time than learning the "wrong" one deeply.
For this course, any editor works. If you want a setup already tuned for kernel work (vim-plug plugins, zsh via oh-my-zsh), a setup script is provided in the course materials.
Exploring Large Codebases with cscope
grep works for small projects. For a kernel with hundreds of .c files, cscope builds a cross-reference database and lets you jump directly to definitions, callers, and callees. grep only matches text, so it cannot tell a definition from a call site.
Build the database
```shell
cscope      # index the current directory and open the query UI
cscope -R   # also recurse into subdirectories (use this for xv6)
```
Interactive queries
After launching cscope, you can search for:
- C identifiers: any symbol name
- Global definitions: where a function or variable is defined
- Functions called by `f`: what `f` calls
- Functions calling `f`: who calls `f`
- Text strings: arbitrary literal text
Press Ctrl-d to exit.
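To make the caller/callee queries concrete, here is a tiny self-contained C file. Every function name in it is hypothetical (none come from xv6); it exists only so you can index it with `cscope` and compare the query results with the comments, which describe what the two call-graph queries are expected to list.

```c
/* callgraph.c: hypothetical functions (not from xv6) used only to show what
 * cscope's caller/callee queries report once this file is indexed. */

int memory_ready = 0;   /* a file-scope definition cscope can locate */

/* "Functions calling init_memory" should list boot_sequence and reinit. */
int init_memory(void) {
    memory_ready = 1;
    return memory_ready;
}

/* Called only by boot_sequence. */
int start_shell(void) {
    return memory_ready ? 0 : -1;
}

/* "Functions called by boot_sequence" should list init_memory and start_shell. */
int boot_sequence(void) {
    init_memory();
    return start_shell();
}

/* A second caller of init_memory. */
int reinit(void) {
    memory_ready = 0;
    return init_memory();
}
```

The same two queries on real xv6 functions are how you trace, say, everything the kernel's `main` calls during startup.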
Using cscope inside vim
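Place your cursor on a symbol and press `Ctrl-]` to jump to its definition (with `:set cscopetag`, this consults the cscope database), and `Ctrl-t` to jump back. Switch between open buffers with `:bp` (previous) and `:bn` (next). After loading the database with `:cs add cscope.out`, you can also issue queries directly: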
```vim
:cs find g userinit   " jump to the global definition of userinit
:cs find s mpinit     " find all uses of the symbol mpinit
:cs find f main.c     " find the file main.c
```
Exercise: Find where userinit() is defined in xv6. Write down the file and line number.
Managing Your Terminal with tmux
When you debug xv6, you need at least two terminals open at once (one for QEMU, one for GDB). tmux lets you do this inside a single SSH connection, and, crucially, your session survives disconnection.
Key concepts
- Session: a collection of windows; persists even when you detach
- Window: a full-screen tab inside a session
- Pane: a split region inside a window
Must-know commands
| Command | Effect |
|---|---|
| `tmux` | Start a new session |
| `tmux ls` | List all sessions |
| `tmux a` | Attach to the most recent session |
| `Ctrl-b %` | Split the current pane into left and right halves |
| `Ctrl-b "` | Split the current pane into top and bottom halves |
| `Ctrl-b z` | Zoom the current pane (toggle fullscreen) |
| `Ctrl-b` + arrow key | Move between panes |
| `Ctrl-b d` | Detach (the session stays alive) |
The prefix for all tmux commands is Ctrl-b. Press it, release it, then press the command key.
Debugging the xv6 Kernel with GDB
Setup
xv6 runs inside QEMU. GDB connects to QEMU's built-in GDB stub over a TCP port on localhost:
```shell
# Terminal 1 (or tmux pane 1)
make qemu-nox-gdb
# Terminal 2 (or tmux pane 2), from the same xv6 directory
gdb
```
The file .gdbinit.tmpl (in the xv6 repo root) is a GDB init template that tells GDB which binary to load and which port to connect to. When you run make qemu-nox-gdb, xv6's Makefile generates a .gdbinit from it.
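The port is not fixed. Assuming your course Makefile keeps the rule from upstream xv6-public, `GDBPORT` is derived from your Unix user id so that students sharing a lab machine rarely collide, and `sed` rewrites `localhost:1234` in `.gdbinit.tmpl` to that port. The C sketch below only mirrors that arithmetic; the function names are hypothetical, and the rule itself should be checked against your own Makefile.

```c
/* gdbport.c: sketch of how xv6-public's Makefile picks a per-user GDB port.
 * Assumption: your Makefile keeps the upstream rule
 *   GDBPORT = $(shell expr `id -u` % 5000 + 25000)
 * and substitutes it for localhost:1234 when generating .gdbinit. */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a Unix user id to a port in [25000, 29999] so that users sharing
 * one machine usually get different QEMU gdb stubs. */
unsigned gdb_port_for_uid(uid_t uid) {
    return (unsigned)(uid % 5000) + 25000;
}

/* Print the line the generated .gdbinit would use for the current user. */
void print_my_gdb_port(void) {
    printf("target remote localhost:%u\n", gdb_port_for_uid(getuid()));
}
```

If GDB cannot connect, compare the port in the QEMU command line that `make` prints with the `target remote` line in the generated `.gdbinit`.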
Useful GDB layout commands
Once inside GDB, use TUI mode to see source code or registers alongside the command prompt:
```
(gdb) layout src     # show C source
(gdb) layout asm     # show disassembly
(gdb) layout regs    # show CPU registers
```
These views update as you step through code, making it far easier to correlate high-level C with what the CPU is actually doing.
How xv6 Boots
Understanding the boot sequence helps you know where to set breakpoints when debugging early kernel code.
The boot chain
```text
Hardware powers on
   ↓
Firmware (BIOS or UEFI) runs
   ↓
BIOS scans bootable devices (HDD, USB, CD-ROM, …)
   ↓
Reads sector 0 (512 bytes) → loads it at physical address 0x7C00
   ↓
CPU jumps to 0x7C00: this is the bootloader (bootblock)
   ↓
Bootloader reads the kernel ELF starting at sector 1 → loads it at 0x100000
   ↓
Jumps to the kernel entry point
   ↓
Kernel initializes, eventually spawns init and sh
```
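One detail the chain glosses over: a legacy BIOS conventionally treats a device as bootable only if bytes 510 and 511 of its first sector are 0x55 and 0xAA (the MBR boot signature). xv6's `sign.pl` pads `bootblock` to 510 bytes and appends exactly these two bytes. The sketch below checks a sector buffer for the signature; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Return 1 if a boot sector ends with the MBR signature 0x55 0xAA at
 * offsets 510 and 511, which is what a legacy BIOS checks before copying
 * the sector to physical address 0x7C00 and jumping there. */
int has_boot_signature(const uint8_t *sector, size_t len) {
    if (sector == NULL || len < SECTOR_SIZE)
        return 0;                        /* too short to be a boot sector */
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

/* Bootloader code and data must fit before the two signature bytes. */
size_t max_bootloader_bytes(void) {
    return SECTOR_SIZE - 2;
}
```

On a built image, `hexdump -C xv6.img` shows these two bytes as `55 aa` at offset `0x1fe`, the last line of the first sector.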
xv6 disk layout
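| Sector (byte offset) | Contents |
|---|---|
| 0 (0x000000) | Bootloader (bootblock, exactly 512 bytes) |
| 1+ (0x000200) | Kernel (ELF binary), spanning as many sectors as it needs |
The file system is not stored in xv6.img: it lives in a separate image, fs.img, which QEMU attaches as a second disk.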
You can inspect this directly:
```shell
hexdump -C xv6.img | less
```
The first 512 bytes are the bootloader; starting at byte 512 (offset 0x200), you will see the ELF magic bytes `7f 45 4c 46` (`\x7fELF`).
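The last two steps of the boot chain amount to: find the kernel at byte offset 1 × 512, confirm the ELF magic, and read the entry point to jump to. The host-side sketch below does the same on a disk image held in memory. It assumes a 32-bit little-endian ELF header (true for the x86 xv6 kernel); the function names are hypothetical, and xv6's own `bootmain.c` does this through its `struct elfhdr` rather than raw byte offsets.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE   512
#define KERNEL_SECTOR 1      /* kernel starts at sector 1, per the table above */

/* Byte offset of a sector inside a raw disk image. */
size_t sector_to_offset(size_t sector) {
    return sector * SECTOR_SIZE;
}

/* Check the four ELF magic bytes 0x7F 'E' 'L' 'F'. */
int is_elf(const uint8_t *hdr, size_t len) {
    return len >= 4 && hdr[0] == 0x7F && hdr[1] == 'E' &&
           hdr[2] == 'L' && hdr[3] == 'F';
}

/* Read e_entry from a 32-bit little-endian ELF header. It is the 4-byte
 * field at offset 24, after e_ident[16], e_type, e_machine, e_version.
 * Returns 0 if the header is too short, not ELF, not 32-bit (e_ident[4]
 * must be 1), or not little-endian (e_ident[5] must be 1). */
uint32_t elf32_entry(const uint8_t *hdr, size_t len) {
    if (len < 28 || !is_elf(hdr, len) || hdr[4] != 1 || hdr[5] != 1)
        return 0;
    return (uint32_t)hdr[24] | ((uint32_t)hdr[25] << 8) |
           ((uint32_t)hdr[26] << 16) | ((uint32_t)hdr[27] << 24);
}

/* Given a whole disk image in memory, return the kernel's entry point. */
uint32_t kernel_entry_from_image(const uint8_t *img, size_t len) {
    size_t off = sector_to_offset(KERNEL_SECTOR);
    if (img == NULL || len <= off)
        return 0;
    return elf32_entry(img + off, len - off);
}
```

To try it on a real image, read xv6.img into a buffer with `fread` and print the result in hex; `readelf -h kernel` reports the same field as "Entry point address".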
BIOS vs UEFI
- BIOS (Basic Input/Output System): legacy firmware; uses the 512-byte Master Boot Record (MBR) convention and hands control to the boot sector in 16-bit real mode.
- UEFI (Unified Extensible Firmware Interface): the modern replacement; supports larger disks, Secure Boot, and 64-bit mode from the start.
xv6 uses the BIOS/MBR boot path.
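In 16-bit real mode the CPU forms a physical address as segment × 16 + offset, which is why the boot address can be written as 0x0000:0x7C00 or 0x07C0:0x0000. When the A20 address line is disabled, bit 20 is forced to zero and addresses wrap at 1 MiB; xv6's `bootasm.S` enables A20 before switching to 32-bit protected mode. The helper below is a hypothetical sketch of that arithmetic.

```c
#include <stdint.h>

/* Real-mode translation: physical = segment * 16 + offset.
 * If the A20 line is disabled, address bit 20 is forced to 0, so results
 * wrap around at 1 MiB as on the original 8086. */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset, int a20_enabled) {
    uint32_t linear = ((uint32_t)segment << 4) + offset;  /* at most 0x10FFEF */
    return a20_enabled ? linear : (linear & 0xFFFFF);
}
```

For example, 0xFFFF:0x0010 reaches 0x100000 only with A20 enabled; with it disabled the same pair wraps to 0x00000.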
Programming in xv6
xv6 is a complete OS, but its user space is intentionally minimal: there is no glibc. Instead:
- `user.h` declares the system call wrappers and libc-like helpers (e.g., `printf`, `strcpy`, `malloc`).
- You include `user.h` (after `types.h`, which defines the `uint` type it uses) instead of `<stdio.h>` or `<stdlib.h>`.
- The `Makefile` lists user programs in `UPROGS`; to add a new utility, write a `.c` file and add it there.
Example: adding a sleep utility
- Create `sleep.c` using system calls from `user.h` (specifically the `sleep` system call).
- Add `_sleep` to `UPROGS` in the `Makefile`.
- Run `make qemu` and test with `sleep 5` at the xv6 shell. The argument counts timer ticks, not seconds.
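A minimal sketch of `sleep.c` for x86 xv6, assuming your course tree keeps the xv6-public layout: `types.h` and `user.h` in the repository root, `printf` taking a file descriptor as its first argument, and `exit()` taking no argument. On the RISC-V xv6 the headers are `kernel/types.h` and `user/user.h`, errors go through `fprintf(2, ...)`, and `exit` takes a status, so adjust accordingly. The usage message wording is illustrative.

```c
// sleep.c: user program for x86 xv6 (xv6-public layout assumed).
// Usage at the xv6 shell: sleep <ticks>
#include "types.h"   // must come first: user.h uses uint
#include "user.h"    // sleep(), atoi(), printf(), exit()

int
main(int argc, char *argv[])
{
  if(argc != 2){
    printf(2, "usage: sleep ticks\n");  // fd 2 is stderr; xv6 printf takes an fd
    exit();                             // x86 xv6 exit() takes no status
  }

  sleep(atoi(argv[1]));  // system call: block for this many timer ticks
  exit();                // user programs must call exit(), not return from main
}
```

This only builds inside the xv6 tree, not on the host. After saving it, add `_sleep\` to the `UPROGS` list in the `Makefile` (the trailing backslash continues the list), rebuild with `make qemu`, and run `sleep 5`; the prompt should return after five ticks.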
Key Takeaways
- Git is non-negotiable: commit early, commit often; every commit is a save point you can roll back to.
- cscope replaces grep for kernel-scale codebases; learn the three most useful queries: find definition, find callers, find callees.
- tmux keeps your session alive across disconnects and lets you tile panes for multi-window debugging.
- GDB + QEMU is the kernel debugging stack: `make qemu-nox-gdb` in one pane, `gdb` in another; `layout src`/`asm`/`regs` for visibility.
- The boot sequence has three stages: firmware → bootloader at 0x7C00 → kernel at 0x100000.
- xv6 user programs use `user.h`, not standard library headers; understand this before you write your first utility.