Tools, Code Exploration, and Kernel Debugging

Why This Matters

The Linux kernel has roughly 27 million lines of code, touched by over 1,600 developers per release. No one can hold that in their head; professionals rely on tools to navigate, version, manage, and debug it. The xv6 kernel is small by comparison, but the same discipline applies: without good tool habits, you will spend more time fighting your environment than understanding the OS. This lecture lays the tool foundation you will use for every assignment in the course.


Version Control with Git

Git is a distributed version control system originally written by Linus Torvalds for Linux kernel development. "Distributed" means every checkout is a full repository with complete history; there is no single server you depend on.

Essential daily commands

Category  Command                              What it does
Setup     git config --global user.name "..."  Set your identity in the commit log
Start     git init                             Create a new empty repo
Start     git clone <url>                      Copy a remote repo locally
Status    git status                           Show changed / untracked files
Status    git diff                             Show exact line-level changes
Stage     git add <file>                       Mark file for next commit
Commit    git commit -m "msg"                  Save staged snapshot locally
Sync      git push                             Upload commits to remote
Sync      git pull                             Download and merge remote commits
History   git log                              Full commit history
History   git blame <file>                     Who changed which line, and when

The commit workflow

Working directory  →  git add  →  Staging area  →  git commit  →  Local repo  →  git push  →  Remote

Changes flow left to right. git status and git diff report on the working directory and staging area; git log shows the commits recorded in the local repo.
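
A minimal sketch of that flow in a throwaway repository. The directory, file name, and commit identity are placeholders chosen for illustration, and the push step is left as a comment because it needs a real remote.

```shell
# Scratch repository so nothing here touches real work (path is a placeholder)
repo=$(mktemp -d)
cd "$repo"
git init -q

# Working directory: create a change
echo "hello" > notes.txt
git status --short        # notes.txt is listed as untracked (??)

# Staging area: mark the file for the next commit
git add notes.txt
git status --short        # notes.txt is now listed as added (A)

# Local repo: record the snapshot (identity passed inline for the demo)
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "Add notes"
git log --oneline         # one commit in the local history

# Remote: 'git push' would upload this commit once a remote is configured
```

Running git status between steps is a cheap way to confirm which zone a change currently sits in.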

Git aliases (.gitconfig tricks)

[alias]
    br = branch
    co = checkout
    st = status
    lg = log --graph
    lp = log --graph --pretty=oneline

Aliases save keystrokes and encourage frequent commits.
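
The same aliases can also be set from the command line with git config rather than by editing the file by hand. The sketch below writes to a scratch config file (a placeholder) so it does not touch your real settings; replacing --file "$cfg" with --global would write to your own .gitconfig.

```shell
# Scratch config file used only for this demonstration
cfg=$(mktemp)

git config --file "$cfg" alias.br branch
git config --file "$cfg" alias.co checkout
git config --file "$cfg" alias.st status
git config --file "$cfg" alias.lg "log --graph"
git config --file "$cfg" alias.lp "log --graph --pretty=oneline"

# Read one back to confirm what was stored
git config --file "$cfg" --get alias.lg
```

Once the aliases are in your real config, git st behaves like git status and git lg like git log --graph.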

Submitting homework via GitHub Classroom

For this course, homework is submitted by pushing to a GitHub Classroom repo:

git remote add turn-in https://github.com/sysec-teaching/xv6-s26-<your-github-id>.git
git push turn-in main

Creating and applying patches

When you need to share a specific commit as a standalone patch file:

# Create patch from the most recent commit
git format-patch -1

# Apply someone else's patch
git am 0001-my-change.patch

# Undo an applied patch (drops the latest commit and discards any uncommitted changes)
git reset --hard HEAD~1

If you have multiple messy commits to clean up before patching, squash them first:

git rebase -i <base_commit>
# In the editor, change 'pick' to 's' (squash) on all but the first commit
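
Interactive rebase needs an editor, so it is awkward to demonstrate in a script. The sketch below reaches the same end state with git reset --soft, which is a swapped-in technique and not the rebase itself: it moves the branch back to the base commit while keeping every change staged, so one new commit replaces the messy ones. The repository, file names, and commit messages are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Base commit that the messy work sits on top of
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
base=$(git rev-parse HEAD)

# Three messy commits
for i in 1 2 3; do
    echo "change $i" >> file.txt
    git commit -q -am "wip $i"
done

# Collapse them: move the branch back to base but keep all changes staged
git reset --soft "$base"
git commit -q -m "Add three changes"

# Export the single clean commit as a patch file
git format-patch -1
```

The patch file name is derived from the commit subject, so a descriptive message on the squashed commit pays off twice.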

Editors

There is no single correct editor; vim, emacs, and VS Code all have loyal users among kernel developers. The important thing is to choose one and become fluent in it. Switching editors mid-project costs more time than learning the "wrong" one deeply.

For this course, any editor works. If you want a setup already tuned for kernel work (vim-plug plugins, zsh via oh-my-zsh), a setup script is provided in the course materials.


Exploring Large Codebases with cscope

grep works for small projects. For a kernel with hundreds of .c files, cscope builds a cross-reference database and lets you jump directly to definitions, callers, and callees. A plain text search like grep cannot tell these apart.

Build the database

cscope        # index current directory
cscope -R     # include all subdirectories (use this for xv6)

Interactive queries

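After launching cscope, the menu lets you search for a C symbol, a global definition, the functions a function calls, the functions that call it, a text string, or a file, among other queries.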

Press Ctrl-d to exit.

Using cscope inside vim

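Place your cursor on a symbol and press Ctrl-] to jump to its definition (vim consults the cscope database for this when its cscopetag option is set). Press Ctrl-t to jump back. Move between open buffers with :bp (previous) and :bn (next). Or issue queries directly: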

:cs find g userinit     " jump to global definition of userinit
:cs find s mpinit       " find all uses of the symbol mpinit
:cs find f main.c       " find file main.c
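
cscope can also answer queries from the shell without opening its menu, which helps in scripts. The sketch below builds a database for a two-function C file and runs two line-oriented queries. The file and function names are placeholders. The flags are -b (build the database only), -k (do not index /usr/include), -d (do not rebuild), and -L with a menu field number (1 for global definition, 3 for functions calling this function).

```shell
# Tiny stand-in source tree (placeholder code, not xv6)
src=$(mktemp -d)
cd "$src"
cat > demo.c <<'EOF'
void helper(void)
{
}

void caller(void)
{
    helper();
}
EOF

cscope -b -k -R            # build cscope.out without opening the UI
cscope -d -L -1 helper     # field 1: global definition of helper
cscope -d -L -3 helper     # field 3: functions that call helper
```

Each result line names the file, the enclosing function, and the line number, so the output can be piped into other tools.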

Exercise: Find where userinit() is defined in xv6. Write down the file and line number.


Managing Your Terminal with tmux

When you debug xv6 you need at least two terminal windows open simultaneously (one for QEMU, one for GDB). tmux lets you do this inside a single SSH connection, and crucially, your session survives disconnection.

Key concepts

Must-know commands

Command       Effect
tmux          Start a new session
tmux ls       List all sessions
tmux a        Attach to the most recent session
Ctrl-b %      Split into left and right panes (vertical divider)
Ctrl-b "      Split into top and bottom panes (horizontal divider)
Ctrl-b z      Zoom current pane (toggle fullscreen)
Ctrl-b ←→↑↓   Move between panes
Ctrl-b d      Detach (session stays alive)

The default prefix for all tmux key bindings is Ctrl-b. Press it, release it, then press the command key. The tmux, tmux ls, and tmux a entries are ordinary shell commands and need no prefix.
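
The same layout can be prepared from the shell, which is convenient once you debug often. This is a sketch under assumptions: the session name xv6-debug is a placeholder, and the commands start the session detached so they also work from a script.

```shell
# Session name is a placeholder; pick anything memorable
session=xv6-debug

tmux new-session -d -s "$session"     # start a session without attaching to it
tmux split-window -h -t "$session"    # same split as Ctrl-b %
tmux list-panes -t "$session"         # two panes: one for QEMU, one for GDB

# From an interactive terminal, attach with:
#   tmux attach -t xv6-debug
# Detach again with Ctrl-b d; the session keeps running.
```

After a dropped SSH connection, tmux ls shows that the session is still there and tmux a reattaches to it.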


Debugging the xv6 Kernel with GDB

Setup

xv6 runs inside QEMU. GDB connects to QEMU's built-in GDB stub over a local TCP port:

# Terminal 1 (or tmux pane 1)
make qemu-nox-gdb

# Terminal 2 (or tmux pane 2)
gdb

The file .gdbinit.tmpl (in the xv6 repo root) is a GDB init template that tells GDB which binary to load and which port to connect to. When you run make qemu-nox-gdb, xv6's Makefile generates a .gdbinit from it.

Useful GDB layout commands

Once inside GDB, use TUI mode to see source code or registers alongside the command prompt:

(gdb) layout src    # show C source
(gdb) layout asm    # show disassembly
(gdb) layout regs   # show CPU registers

These views update as you step through code, making it far easier to correlate high-level C with what the CPU is actually doing.


How xv6 Boots

Understanding the boot sequence helps you know where to set breakpoints when debugging early kernel code.

The boot chain

Hardware powers on
    ↓
Firmware (BIOS or UEFI) runs
    ↓
BIOS scans bootable devices (HDD, USB, CD-ROM…)
    ↓
Reads sector 0 (512 bytes) → loads it at physical address 0x7C00
    ↓
CPU jumps to 0x7C00  ← this is the bootloader (bootblock)
    ↓
Bootloader reads kernel ELF from sector 1 → loads it at 0x100000
    ↓
Jumps to kernel entry point
    ↓
Kernel initializes, eventually spawns init and sh

xv6 disk layout

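Sector     Byte offset  Contents
0          0x000000     Bootloader (bootblock, 512 bytes)
1 onward   0x000200     Kernel (ELF binary)

The kernel occupies as many sectors as it needs, starting at sector 1. In the stock xv6 Makefile the file system is built as a separate image, fs.img, rather than being stored in xv6.img.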

You can inspect this directly:

hexdump -C xv6.img | less

The first 512 bytes are the bootloader. Starting at byte 512 (offset 00000200 in the hexdump output) you will see the ELF magic bytes 7f 45 4c 46, shown as .ELF in the ASCII column.
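
To check a specific offset without paging through the whole dump, od can skip straight to it. The sketch below defines a small helper, bytes_at (a hypothetical name), and tries it on a synthetic two-sector image that stands in for xv6.img. The image contains only zeros plus the ELF magic, not a real kernel.

```shell
# Print <count> bytes at byte <offset> of <file> as hex, with no spaces
bytes_at() {  # usage: bytes_at <file> <offset> <count>
    od -An -tx1 -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# Synthetic stand-in for xv6.img: two zero-filled 512-byte sectors
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=2 2>/dev/null

# Place the ELF magic at the start of sector 1 (byte offset 1 * 512 = 512)
printf '\177ELF' | dd of="$img" bs=1 seek=512 conv=notrunc 2>/dev/null

bytes_at "$img" 512 4      # prints 7f454c46
echo
```

Against a real build, bytes_at xv6.img 512 4 should print the same four bytes if the kernel ELF starts at sector 1.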

BIOS vs UEFI

xv6 uses the BIOS/MBR boot path.


Programming in xv6

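xv6 is a complete OS, but its user space is intentionally minimal: there is no glibc. Instead, user programs include user.h, which declares the system calls and the small set of library routines that xv6 provides.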

Example: adding a sleep utility

  1. Create sleep.c using system calls from user.h (specifically the sleep syscall).
  2. Add _sleep to UPROGS in Makefile.
  3. Run make qemu and test with sleep 5 at the xv6 shell.
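
The three steps above can be sketched as follows. The C source assumes the x86 xv6 user API, where printf takes a file descriptor as its first argument, exit() takes no arguments, and types.h and stat.h are included before user.h. If your tree differs, adjust accordingly. Only step 1 is automated here; the Makefile edit is left manual.

```shell
# Step 1: create sleep.c (run this from the xv6 repo root)
cat > sleep.c <<'EOF'
// Assumes the x86 xv6 user API: printf takes a file descriptor first,
// and exit() takes no arguments. Adjust if your tree differs.
#include "types.h"
#include "stat.h"
#include "user.h"

int
main(int argc, char *argv[])
{
  if(argc != 2){
    printf(2, "usage: sleep ticks\n");
    exit();
  }
  sleep(atoi(argv[1]));   // sleep syscall; the argument is in clock ticks
  exit();
}
EOF

# Step 2: add _sleep to the UPROGS list in Makefile (edit by hand)
# Step 3: run 'make qemu', then type 'sleep 5' at the xv6 shell
```

Because the argument is in clock ticks rather than seconds, sleep 5 returns almost immediately; a larger value makes the pause easier to notice.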

Key Takeaways

  1. Git is non-negotiable: commit early, commit often; every commit is a save point you can roll back to.
  2. cscope replaces grep for kernel-scale codebases; learn the three most useful queries: find definition, find callers, find callees.
  3. tmux keeps your session alive across disconnects and lets you tile panes for multi-window debugging.
  4. GDB + QEMU is the kernel debugging stack: make qemu-nox-gdb in one pane, gdb in another; layout src/asm/regs for visibility.
  5. The boot sequence has three stages: firmware → bootloader at 0x7C00 → kernel at 0x100000.
  6. xv6 user programs use user.h, not standard library headers; understand this before you write your first utility.

Practice

  1. What does git add do in the standard git workflow?
  2. Why is the xv6 bootloader loaded at physical address 0x7C00, and what loads it there?
  3. Which cscope query would you use to find every function that calls userinit?
  4. You are working on a remote server over SSH. Your connection drops. Which tmux command will let you resume your previous session when you reconnect?
  5. When debugging xv6 with GDB, which GDB command lets you see the CPU registers update in real time as you step through code?
  6. Explain the three-stage boot sequence of xv6: what runs at each stage, and what physical memory address is significant at each transition?
  7. You have three messy commits on your branch and need to submit them as a single clean patch file. Walk through the exact git commands you would use.
  8. You want to write a new xv6 user-space utility called hello. Which header file should you include instead of <stdio.h>?
  9. A classmate says: 'I'll just use grep -r to search for function definitions across the xv6 source.' Give two concrete reasons why cscope is a better choice for this task.
  10. You run hexdump -C xv6.img | less and scroll past the first 512 bytes. What will you find starting at byte offset 512 (sector 1)?