Kernel Tracing, eBPF, and Interrupt Handling

Why This Matters

Production systems break in ways that cannot be reproduced in a lab. Tracing tools let you observe the live kernel without reboots or recompilation. Meanwhile, every device you interact with — keyboard, NIC, disk — relies on the interrupt mechanism to talk to the CPU efficiently. Understanding both topics unlocks the ability to diagnose real system behavior and to write correct, high-performance kernel code.

Part 1: Kernel Tracing

What is Kernel Tracing?

Kernel tracing is dynamic instrumentation: attaching probes to running kernel functions or instructions without modifying or recompiling the kernel. Primary use cases:

Debugging and performance analysis
Identifying bottlenecks and security threats
Observability in production systems

Kprobes and Kretprobes

Probe type	Where it fires	Typical use
kprobe	Just before the probed instruction executes (almost any kernel instruction can be probed)	Inspect arguments, count calls
kretprobe	When the probed function returns	Capture return value, measure latency

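On x86, kprobes are built on the one-byte int3 breakpoint instruction (opcode 0xCC). When the kernel registers a kprobe, it saves a copy of the probed instruction and overwrites its first byte with int3. When execution reaches that address, the CPU raises a breakpoint exception (#BP, vector 3, not the debug exception on vector 1), the kprobe pre-handler runs, the saved original instruction is executed out of line, and normal execution resumes.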
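The save-and-patch idea can be modeled in user space on a plain byte array. This is only an illustrative sketch, not how the kernel's text patching is implemented; `struct probe_model`, `arm_probe`, and `disarm_probe` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC  /* x86 one-byte breakpoint instruction */

/* Hypothetical bookkeeping for one probe in this user-space model. */
struct probe_model {
    size_t  offset;      /* position of the probed instruction */
    uint8_t saved_byte;  /* original first byte, kept for restore */
    int     armed;
};

/* Save the first byte of the instruction and overwrite it with int3. */
void arm_probe(uint8_t *text, size_t len, struct probe_model *p, size_t offset)
{
    assert(offset < len);
    p->offset = offset;
    p->saved_byte = text[offset];
    text[offset] = INT3_OPCODE;
    p->armed = 1;
}

/* Put the original byte back, as removing a probe would. */
void disarm_probe(uint8_t *text, struct probe_model *p)
{
    if (p->armed) {
        text[p->offset] = p->saved_byte;
        p->armed = 0;
    }
}
```

Only one byte is ever changed, which is why a one-byte breakpoint opcode is convenient: the patch can never spill into the following instruction.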

Writing a kprobe Kernel Module

A kprobe module fills in a struct kprobe and registers it:

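#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/uaccess.h>

static struct kprobe kp;

/* Runs just before do_sys_openat2(dfd, filename, how) executes. */
static int handler_pre(struct kprobe *p, struct pt_regs *regs) {
    char filename[256] = "";
    /* x86-64: the 2nd argument (user pointer to the path) is in rsi.
     * Handlers must not sleep, so use the non-faulting copy (kernel 5.8+). */
    if (regs->si && copy_from_user_nofault(filename,
            (const char __user *)regs->si, sizeof(filename) - 1) == 0) {
        printk(KERN_INFO "[kprobe] File opened: %s\n", filename);
    }
    return 0;
}

static int __init kprobe_init(void) {
    kp.symbol_name = "do_sys_openat2";
    kp.pre_handler  = handler_pre;
    return register_kprobe(&kp);
}

static void __exit kprobe_exit(void) {
    unregister_kprobe(&kp);
}

module_init(kprobe_init);
module_exit(kprobe_exit);
MODULE_LICENSE("GPL");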

Key points:

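symbol_name can name almost any kernel function, exported or not; a small set of functions (for example, the kprobes machinery itself) is blacklisted and cannot be probed.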
pre_handler receives a struct pt_regs *, giving access to all CPU registers at the probe site — the arguments to do_sys_openat2 live in rdi, rsi, rdx, etc.
Always call unregister_kprobe() in your module's exit function.
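The register-to-argument mapping mentioned above follows the x86-64 calling convention for ordinary functions. The sketch below models it in user space; `struct regs_model` is a hypothetical cut-down stand-in for struct pt_regs, and `nth_arg` is an illustrative helper, not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of x86-64 pt_regs holding the argument registers. */
struct regs_model {
    uint64_t di, si, dx, cx, r8, r9;
};

/* Return the n-th (0-based) integer argument of an ordinary function call. */
uint64_t nth_arg(const struct regs_model *r, unsigned n)
{
    assert(n < 6);  /* further arguments are passed on the stack */
    switch (n) {
    case 0:  return r->di;
    case 1:  return r->si;
    case 2:  return r->dx;
    case 3:  return r->cx;
    case 4:  return r->r8;
    default: return r->r9;
    }
}
```

For do_sys_openat2, argument 1 in this numbering is the user-space pointer to the file name, which is why the handler reads regs->si.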

Profiling with `perf` and kprobes

You can attach kprobes without writing a module using perf probe:

sudo perf probe -a do_sys_openat2          # register the probe
sudo cat /sys/kernel/debug/kprobes/list    # verify it's active
sudo perf record -e probe:do_sys_openat2 -aR sleep 1
sudo perf report                           # analyze call sites
sudo perf probe -d do_sys_openat2          # remove the probe

Part 2: eBPF

What is eBPF?

eBPF (Extended Berkeley Packet Filter) lets you run sandboxed programs inside the kernel in response to events — without writing a kernel module. Programs are verified by an in-kernel verifier before execution, preventing crashes or infinite loops. eBPF is now the foundation for many Linux tracing, networking, and security tools.

Key properties:

Safe: The verifier rejects programs that could corrupt kernel memory or loop forever.
Efficient: Programs are JIT-compiled to native machine code.
Widely applicable: kprobes, tracepoints, network hooks, perf events, and more.

bpftrace

bpftrace is a high-level eBPF frontend with a language similar to AWK. It compiles scripts to BPF bytecode and loads them automatically.

# List all syscall tracepoints
sudo bpftrace -l 'tracepoint:syscalls:*'

# Print every execve call with the process name and binary path
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_execve {
    printf("Process %s executed %s\n", comm, str(args->filename)); }'

# Trace malloc calls in libc with size
sudo bpftrace -e 'uprobe:/lib/x86_64-linux-gnu/libc.so.6:malloc {
    printf("PID %d (%s) called malloc(size=%llu)\n", pid, comm, arg0); }'

bpftrace supports:

tracepoint: — stable kernel tracepoints
kprobe: / kretprobe: — dynamic function entry/return probes
uprobe: — user-space function probes

Part 3: Interrupts

Why Interrupts Exist

Devices are slow compared to CPUs. A read from a spinning hard disk takes roughly 10–15 ms (4–10 ms seek plus about 5.6 ms average rotational latency at 5400 RPM). Two strategies exist for the CPU to learn when I/O finishes:

Strategy	Description	Problem
Polling	CPU repeatedly checks the device status register	Burns CPU cycles while waiting
Interrupt	Device signals the CPU on completion	CPU is free until the signal arrives

Interrupts let the CPU run other processes while I/O is in flight, making them essential for multiprogramming.
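The disk timing quoted above can be checked with a little arithmetic: on average the platter must turn half a rotation before the requested sector passes under the head. The helper names below are hypothetical, and transfer time is ignored.

```c
#include <assert.h>

/* Average rotational latency in ms: half of one full rotation. */
double avg_rotational_latency_ms(double rpm)
{
    assert(rpm > 0.0);
    double ms_per_rotation = 60000.0 / rpm;  /* 60,000 ms per minute */
    return ms_per_rotation / 2.0;
}

/* Rough access time: seek plus average rotational latency. */
double access_time_ms(double seek_ms, double rpm)
{
    return seek_ms + avg_rotational_latency_ms(rpm);
}
```

At 5400 RPM one rotation takes about 11.1 ms, so the average rotational latency is about 5.6 ms. On a CPU assumed to run at 3 GHz, a 10 ms wait corresponds to roughly 30 million clock cycles, which is the time polling would waste.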

Interrupt Controller Hardware

Devices send electrical signals to the CPU through an interrupt controller:

I/O APIC (Advanced Programmable Interrupt Controller): sits on the system chipset, receives interrupts from devices, and routes them to the appropriate processor.
Local APIC: one per CPU core. Handles the routed interrupt and can also generate a timer interrupt (the heartbeat that drives scheduling) and inter-processor interrupts (IPI) for SMP coordination.

Interrupt Request Lines (IRQ)

Each device is identified by an IRQ number. Classic 8259A examples:

IRQ	Device
0	System timer
1	Keyboard controller
3, 4	Serial ports
5	Parallel port / sound card

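Legacy PCI devices often share interrupt lines, so drivers for them must tolerate sharing. Modern PCIe devices typically use MSI or MSI-X (Message Signaled Interrupts) instead: the device signals an interrupt by writing a message to a special memory address, and each device (or each queue) gets its own vector, so sharing is largely avoided.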

Exceptions: Software Interrupts

Exceptions are interrupts raised by the CPU itself during instruction execution:

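Faults: reported before the instruction completes; the saved instruction pointer addresses the faulting instruction, so it is retried once the handler has fixed the cause. Examples: page fault, general protection fault.
Traps: reported after the instruction completes; execution resumes at the next instruction, with no retry. Examples: breakpoint (int3), overflow.
Aborts: severe errors that do not allow reliable resumption. Examples: machine check, double fault.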

The int N instruction triggers a software interrupt for vector N (0–255). int 0x80 is the classic Linux 32-bit syscall mechanism. iret returns from any interrupt or exception.
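The practical difference between a fault and a trap in the list above is which instruction pointer the CPU saves for the handler. A minimal sketch, with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

enum exc_class { EXC_FAULT, EXC_TRAP };

/*
 * Instruction pointer saved for the handler.
 * A fault saves the address of the faulting instruction, so it runs again
 * after the handler returns; a trap saves the address of the next one.
 */
uint64_t saved_ip(enum exc_class c, uint64_t insn_addr, unsigned insn_len)
{
    assert(insn_len >= 1 && insn_len <= 15);  /* x86 instruction length limits */
    return c == EXC_FAULT ? insn_addr : insn_addr + insn_len;
}
```

For a one-byte int3 at address A, the handler therefore sees A + 1 as the saved instruction pointer.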

Maskable vs. Non-Maskable Interrupts

Type	Can be disabled?	Examples
Non-Maskable (NMI)	No — always handled	Power failure, uncorrectable memory error
Maskable	Yes, via `cli` (disable) and `sti` (enable)	All normal device interrupts

The EFLAGS.IF flag controls masking. Clearing it with cli disables all maskable interrupts on the local CPU.

Interrupt Descriptor Table (IDT)

The IDT is the hardware dispatch table for interrupts and exceptions:

256 entries, each 16 bytes on x86-64
Each entry (a gate descriptor) contains:
- Offset: 64-bit destination instruction pointer (split across the entry)
- Segment selector: the kernel code segment (CS) to load
- Present flag: marks the entry as valid
The IDTR register holds the base address and size of the IDT.
The lidt instruction loads the IDTR.
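The gate descriptor fields listed above can be written out as a C struct. This is a user-space sketch of the x86-64 layout for illustration; `make_gate` and `gate_offset` are hypothetical helpers, and the selector and handler address used with them are example values only.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 IDT gate descriptor: 16 bytes, handler offset split in three. */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* kernel code segment selector */
    uint8_t  ist;          /* bits 0..2: Interrupt Stack Table index */
    uint8_t  type_attr;    /* bit 7: Present, bits 5..6: DPL, bits 0..3: type */
    uint16_t offset_mid;   /* handler address bits 16..31 */
    uint32_t offset_high;  /* handler address bits 32..63 */
    uint32_t reserved;
};

_Static_assert(sizeof(struct idt_gate) == 16, "gate must be 16 bytes");

#define GATE_PRESENT   0x80
#define GATE_INTERRUPT 0x0E  /* 64-bit interrupt gate type */

/* Build a present interrupt gate for `handler` with the given DPL. */
struct idt_gate make_gate(uint64_t handler, uint16_t selector, uint8_t dpl)
{
    assert(dpl <= 3);
    struct idt_gate g = {0};
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.selector    = selector;
    g.type_attr   = (uint8_t)(GATE_PRESENT | (dpl << 5) | GATE_INTERRUPT);
    return g;
}

/* Reassemble the 64-bit handler address from its three pieces. */
uint64_t gate_offset(const struct idt_gate *g)
{
    return (uint64_t)g->offset_low |
           ((uint64_t)g->offset_mid << 16) |
           ((uint64_t)g->offset_high << 32);
}
```

The split offset is a legacy of the 16-bit and 32-bit descriptor formats, which were extended rather than redesigned.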

Predefined Vectors

Vector	Meaning
0	Divide Error
1	Debug Exception
2	NMI
3	Breakpoint (`int3`)
13	General Protection Fault
14	Page Fault
32–255	User-defined (device interrupts)

What Happens When `int N` Executes

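CPU looks up vector N in the IDT (base address from IDTR + N × 16).
Checks CPL ≤ DPL in the gate descriptor (privilege check). This check applies to software int N only, not to hardware interrupts; if it fails, the CPU raises a general protection fault.
Temporarily saves the current SS:RSP internally.
Loads the kernel stack pointer (RSP0) from the TSS (Task State Segment), switching to the kernel stack.
Pushes the user SS, RSP, RFLAGS, CS, and RIP onto the kernel stack (SS, ESP, EFLAGS, CS, and EIP in 32-bit mode).
Clears certain RFLAGS bits (e.g., IF for interrupt gates, so maskable interrupts stay disabled on entry).
Jumps to the handler address from the IDT descriptor.
Handler returns with iret (iretq in 64-bit mode), which pops the saved state and resumes user mode.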
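The first two steps above amount to an address calculation and a comparison. A small sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDT_ENTRY_SIZE 16u  /* bytes per gate on x86-64 */

/* Address of the gate for `vector`: IDTR base + vector * 16. */
uint64_t idt_entry_address(uint64_t idtr_base, unsigned vector)
{
    assert(vector < 256);
    return idtr_base + (uint64_t)vector * IDT_ENTRY_SIZE;
}

/* A software `int N` is permitted only if CPL <= DPL of the gate. */
bool int_n_permitted(unsigned cpl, unsigned gate_dpl)
{
    return cpl <= gate_dpl;
}
```

This check is why user code (CPL 3) can execute int 0x80 when that gate has DPL 3, but cannot invoke a DPL 0 gate such as the page fault vector directly with int 14.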

Interrupt Service Routines (ISR)

An ISR is a normal C function matching the irq_handler_t prototype:

typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);
// Return IRQ_HANDLED if this device generated the interrupt,
// IRQ_NONE otherwise (important for shared lines)

int request_irq(unsigned int irq, irq_handler_t handler,
                unsigned long flags, const char *devname, void *dev_id);
void free_irq(unsigned int irq, void *dev_id);

For shared IRQ lines (IRQF_SHARED flag), dev_id must be non-NULL and unique per handler: the kernel passes it back to the handler on every invocation, and free_irq() uses it to identify which handler to remove.
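How a shared line behaves can be modeled in user space. The sketch below is a simplified illustration, not the kernel's implementation: the `model_*` functions, `struct irq_line_model`, and `struct fake_dev` are hypothetical, and the `pending` field stands in for a device's interrupt status register.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

#define MAX_SHARED 4

/* Hypothetical model of the handlers registered on one shared line. */
struct irq_line_model {
    irq_handler_t handler[MAX_SHARED];
    void         *dev_id[MAX_SHARED];
};

/* Register a handler; dev_id must be non-NULL and unique on the line. */
int model_request_irq(struct irq_line_model *l, irq_handler_t h, void *dev_id)
{
    assert(l != NULL);
    if (!h || !dev_id)
        return -1;
    for (int i = 0; i < MAX_SHARED; i++)
        if (l->dev_id[i] == dev_id)
            return -1;                 /* duplicate cookie */
    for (int i = 0; i < MAX_SHARED; i++) {
        if (!l->handler[i]) {
            l->handler[i] = h;
            l->dev_id[i] = dev_id;
            return 0;
        }
    }
    return -1;                         /* line is full */
}

/* Remove exactly the handler whose cookie matches dev_id. */
void model_free_irq(struct irq_line_model *l, void *dev_id)
{
    for (int i = 0; i < MAX_SHARED; i++) {
        if (l->handler[i] && l->dev_id[i] == dev_id) {
            l->handler[i] = NULL;
            l->dev_id[i] = NULL;
        }
    }
}

/* Call every handler on the line; return how many claimed the interrupt. */
int model_dispatch(struct irq_line_model *l, int irq)
{
    int handled = 0;
    for (int i = 0; i < MAX_SHARED; i++)
        if (l->handler[i] && l->handler[i](irq, l->dev_id[i]) == IRQ_HANDLED)
            handled++;
    return handled;
}

/* Hypothetical device state used as the dev_id cookie. */
struct fake_dev { int pending; int serviced; };

irqreturn_t fake_isr(int irq, void *dev_id)
{
    struct fake_dev *d = dev_id;
    (void)irq;
    if (!d->pending)
        return IRQ_NONE;   /* not ours: let the other handlers check */
    d->pending = 0;        /* acknowledge the device */
    d->serviced++;
    return IRQ_HANDLED;
}
```

Each handler first checks its own device and returns IRQ_NONE if that device did not raise the interrupt, which is what makes sharing workable.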

Interrupt Context Constraints

ISRs run in interrupt context (also called atomic context), not process context. This has important consequences:

Forbidden in ISR	Reason	Alternative
`kmalloc(…, GFP_KERNEL)`	May sleep waiting for memory	Use `GFP_ATOMIC`
`mutex_lock()`	May sleep	Use `spin_lock()`
`printk()` in hot paths	Too slow / unsafe on some paths	Use `trace_printk()`
Sleeping / blocking	ISR is not a schedulable entity	Defer work to bottom half

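Stack space is limited: handlers run on a small, fixed-size stack (historically a single 4 KB page on 32-bit x86 kernels built with 4 KB stacks; x86-64 uses a per-CPU IRQ stack of 16 KB by default), so avoid large local variables and deep call chains.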

Top Half vs. Bottom Half

Because ISRs must be fast but device work can be substantial, Linux splits interrupt handling:

Hardware interrupt arrives
       │
       ▼
  ┌─────────────────────────────────────┐
  │  TOP HALF (ISR — runs immediately)  │
  │  • Acknowledge hardware             │
  │  • Copy data to kernel memory       │
  │  • Re-arm the device                │
  └─────────────────┬───────────────────┘
                    │ schedules
                    ▼
  ┌─────────────────────────────────────┐
  │  BOTTOM HALF (deferred)             │
  │  • Softirq / Tasklet / Work Queue   │
  │  • Runs with interrupts enabled     │
  │  • Does the heavy processing        │
  └─────────────────────────────────────┘

Network example: The top half copies the packet from the NIC into main memory (urgent — NIC buffer is small). The bottom half parses protocol headers, routes the packet, and hands it to a socket (can wait).
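The division of labor in the network example can be sketched as a producer and a consumer around a small ring buffer. This is a single-threaded user-space model with hypothetical names (`struct rx_ring`, `top_half`, `bottom_half`); it ignores the locking and memory-ordering concerns a real driver would have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 8
#define PKT_MAX    64

/* Hypothetical queue shared between the two halves. */
struct rx_ring {
    unsigned char data[RING_SLOTS][PKT_MAX];
    size_t        len[RING_SLOTS];
    unsigned      head;             /* next slot the top half writes */
    unsigned      tail;             /* next slot the bottom half reads */
    unsigned long bytes_processed;
};

/* Top half: copy the packet out of the small device buffer and return. */
bool top_half(struct rx_ring *r, const unsigned char *nic_buf, size_t len)
{
    assert(r != NULL && nic_buf != NULL);
    if (len > PKT_MAX || r->head - r->tail == RING_SLOTS)
        return false;               /* oversized or ring full: drop */
    unsigned slot = r->head % RING_SLOTS;
    memcpy(r->data[slot], nic_buf, len);
    r->len[slot] = len;
    r->head++;
    return true;
}

/* Bottom half: runs later and does the slow per-packet work. */
unsigned bottom_half(struct rx_ring *r)
{
    assert(r != NULL);
    unsigned done = 0;
    while (r->tail != r->head) {
        unsigned slot = r->tail % RING_SLOTS;
        r->bytes_processed += r->len[slot];  /* stand-in for protocol parsing */
        r->tail++;
        done++;
    }
    return done;
}
```

The top half does a bounded amount of work per interrupt, while the bottom half may loop over many queued packets; if the bottom half falls behind, the ring fills and the top half starts dropping.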

Interrupt Control in the Kernel

Kernel code sometimes needs to run atomically with respect to interrupts:

local_irq_disable();   // cli on this CPU
/* critical section */
local_irq_enable();    // sti on this CPU

Warning: local_irq_disable() is not reference-counted. Calling it twice and then local_irq_enable() once re-enables interrupts immediately — a bug. Use local_irq_save(flags) / local_irq_restore(flags) when nesting is possible.

Disabling local interrupts does not protect against other CPU cores. Pair with spinlocks for SMP safety.
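The nesting problem can be reproduced in a small user-space model where a boolean stands in for the IF bit of one CPU. All names here are hypothetical; the real local_irq_save() is a macro that takes an unsigned long flags variable by name rather than a pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the interrupt-enable flag of one CPU. */
static bool irqs_enabled = true;

void model_irq_disable(void) { irqs_enabled = false; }
void model_irq_enable(void)  { irqs_enabled = true;  }

/* Remember the previous state, then disable. */
void model_irq_save(bool *flags)
{
    assert(flags != NULL);
    *flags = irqs_enabled;
    irqs_enabled = false;
}

/* Put back whatever state the matching save observed. */
void model_irq_restore(bool flags)
{
    irqs_enabled = flags;
}

/* Buggy inner helper: unconditionally re-enables on exit. */
void inner_unsafe(void)
{
    model_irq_disable();
    /* ... critical section ... */
    model_irq_enable();
}

/* Safe inner helper: leaves the state exactly as it found it. */
void inner_safe(void)
{
    bool flags;
    model_irq_save(&flags);
    /* ... critical section ... */
    model_irq_restore(flags);
}
```

If a caller has already disabled interrupts, inner_unsafe() returns with them enabled in the middle of the caller's critical section, while inner_safe() leaves them disabled.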

To disable a specific IRQ line (e.g., while reinitializing a device):

disable_irq(irq);      // waits for any running handler to finish
disable_irq_nosync(irq);  // returns immediately
enable_irq(irq);

Interrupt Handling Flow in Linux

Each interrupt vector has a specific entry point in the kernel that:

Saves the interrupt vector number and all registers.
Calls common_interrupt(struct pt_regs *regs, u32 vector).
common_interrupt acknowledges the interrupt and calls architecture-specific dispatch logic to invoke the registered ISR.

You can inspect interrupt counts and which handlers are registered at /proc/interrupts.
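Each row of /proc/interrupts starts with an IRQ number or label followed by one counter per CPU, then the controller and handler names. The sketch below sums the per-CPU counters of one row; `sum_irq_counts` is a hypothetical helper, and it assumes the caller already knows the number of CPU columns from the header line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts row.
 * `ncpus` is the number of CPU columns in the header line.
 * Returns -1 if the row has no "label:" prefix.
 */
long long sum_irq_counts(const char *line, int ncpus)
{
    assert(line != NULL && ncpus >= 0);
    const char *p = strchr(line, ':');
    if (!p)
        return -1;
    p++;                                 /* skip past "16:" or "NMI:" */

    long long total = 0;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        char *end;
        unsigned long long n = strtoull(p, &end, 10);
        if (end == p)
            break;                       /* fewer numeric columns than CPUs */
        total += (long long)n;
        p = end;
    }
    return total;
}
```

Reading the file twice and subtracting the totals gives the interrupt rate for a line over that interval.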

Key Takeaways

kprobes attach dynamically to any kernel instruction using the int3 breakpoint trap; kretprobes fire on function return.
eBPF programs run sandboxed in the kernel, verified before execution, and JIT-compiled for efficiency; bpftrace is the easiest entry point.
Devices are slow relative to the CPU; interrupts let the CPU work on other tasks until a device signals completion (vs. wasteful polling).
The IDT is a 256-entry hardware table mapping interrupt vectors to handler addresses; IDTR holds its base address.
Exceptions (divide-by-zero, page fault, int3) are raised by the CPU itself and dispatched through the same IDT mechanism as hardware interrupts; faults retry the instruction, traps continue with the next one.
ISRs run in interrupt context: no sleeping, no blocking locks, limited stack, GFP_ATOMIC for allocations.
Linux splits interrupt processing into a fast top half (ISR) and a deferred bottom half (softirq/tasklet/work queue) to balance latency against throughput.
local_irq_disable() is not reference-counted — save/restore flags when nesting; it does not protect against other cores.
