Memory Allocation, Advanced Data Structures, and Kernel Modules
Why This Matters
Kernel code runs without the safety net of a user-space allocator or runtime library. You cannot call malloc; you choose between two fundamentally different allocators depending on whether the hardware demands physically contiguous memory. On top of that, the kernel maintains its own high-performance data structures — radix trees, XArrays, and bitmaps — that power the page cache, IRQ tracking, and CPU hot-plug. Finally, recompiling and rebooting the entire kernel every time you change a driver is impractical, so the kernel module system lets you load and unload code while the system stays running. All three topics come together whenever you write a real driver or file-system component.
1. Memory Allocation in the Kernel
kmalloc vs vmalloc
The kernel provides two primary allocation APIs:
kmalloc(size, gfp_mask) |
vmalloc(size) |
|
|---|---|---|
| Physical layout | Physically and virtually contiguous | Virtually contiguous only |
| Max size | ~4 MB on x86 (arch-dependent) | Limited only by free RAM |
| Allocation unit | Byte-granular (slab allocator) | Page (4 KB) |
| Can sleep? | Depends on gfp_mask |
Always may sleep |
| Use case | DMA, memory-mapped I/O, small objects | Large kernel buffers not needing DMA |
| Header | <linux/slab.h> |
<linux/vmalloc.h> (also <linux/slab.h> works) |
| Free | kfree(ptr) |
vfree(ptr) |
Physical contiguity is essential for hardware that accesses memory by physical address, such as DMA controllers and memory-mapped I/O regions. If your code doesn't have that constraint, vmalloc is fine and avoids the size limit.
The gfp_mask
The second argument to kmalloc is a Get Free Pages mask that controls:
- Which page types can be used (e.g., high memory, DMA-able zones).
- Whether the allocator may sleep (wait for memory to be reclaimed).
Two flags you must know:
GFP_KERNEL— the caller is in a sleepable context (process context). The allocator may block to reclaim memory. Use this by default.GFP_ATOMIC— the caller cannot sleep (interrupt handler, spinlock held). Higher chance of failure; never blocks.
#include <linux/slab.h>
char *buf = (char *)kmalloc(128, GFP_KERNEL); /* may sleep */
my_struct *s = (my_struct *)kmalloc(sizeof(*s), GFP_ATOMIC); /* no sleep */
kfree(buf);
kfree(s);
2. Advanced Kernel Data Structures
2.1 Radix Tree (Trie)
A radix tree (or compact prefix tree) maps long-integer keys to pointer values. All descendants of a node share a key prefix, and values live only at leaves.
Linux's radix tree is tuned for page-cache lookups:
- Each internal node has 64 slots, indexed by a 6-bit chunk of the key.
- Leaf slots point to data; non-leaf slots point to child nodes.
- Nodes carry tag arrays (bits 0–2) for status flags like
PAGECACHE_TAG_DIRTY.
Since kernel 4.19, radix_tree_root is an alias for xarray (the successor API), but the radix tree API still works in older kernels.
Core API:
/* Declare or initialize */
RADIX_TREE(my_tree, GFP_KERNEL);
/* or at runtime: */
struct radix_tree_root my_tree;
INIT_RADIX_TREE(my_tree, GFP_KERNEL);
/* Insert / delete / lookup */
int radix_tree_insert(struct radix_tree_root *root, unsigned long index, void *item);
void *radix_tree_delete(struct radix_tree_root *root, unsigned long index);
void *radix_tree_lookup(const struct radix_tree_root *root, unsigned long index);
/* Tag management */
void *radix_tree_tag_set (struct radix_tree_root *root, unsigned long index, unsigned int tag);
void *radix_tree_tag_clear(struct radix_tree_root *root, unsigned long index, unsigned int tag);
int radix_tree_tag_get (const struct radix_tree_root *root, unsigned long index, unsigned int tag);
Real-world use — the page cache:
struct address_space { /* page cache of a file */
struct inode *host;
struct radix_tree_root page_tree; /* older kernels */
/* struct xarray i_pages; */ /* newer kernels */
};
Every file's pages are indexed by page offset. When the VFS needs page N, it looks it up in the radix tree; on a miss it reads from disk and inserts the page.
2.2 XArray
XArray (merged in 4.19) is the modern replacement for the radix tree. It wraps the same radix-tree mechanics with a cleaner API and embeds a spinlock so you don't have to manage locking separately.
Key properties:
- Automatically-resizing array of pointers indexed by
unsigned long. - Up to three tag bits per entry (get/set/clear).
- Built-in spinlock (use
xa_lock/xa_unlockfor multi-op transactions). - Iterators for present entries and for tagged entries.
#include <linux/xarray.h>
DEFINE_XARRAY(my_xa); /* static init */
/* or: */ xa_init(&my_xa); /* runtime init */
xa_store(&my_xa, index, ptr, GFP_KERNEL); /* store */
void *val = xa_load(&my_xa, index); /* load */
xa_erase(&my_xa, index); /* delete */
xa_set_tag(&my_xa, index, tag);
xa_clear_tag(&my_xa, index, tag);
bool present = xa_get_tag(&my_xa, index, tag);
/* iteration */
xa_for_each(&my_xa, index, entry) { /* ... */ }
xa_for_each_marked(&my_xa, index, entry, filter) { /* ... */ }
In Linux 6.8 the page cache uses XArray directly:
struct address_space {
struct inode *host;
struct xarray i_pages; /* xarray = radix tree + spinlock */
};
2.3 Data-Structure Comparison
| Structure | Ordered? | Key type | Best for |
|---|---|---|---|
| Red-black tree | Yes (sorted) | Any comparable | Sortable key/value pairs |
| Radix tree / XArray | By key value | unsigned long |
Sparse arrays, page cache |
| Hash table | No | Hashable | Fast point lookup, no ordering needed |
2.4 Bitmap
A bitmap is a flat array of bits packed into one or more unsigned long words.
Kernel uses bitmaps for:
- CPU online/offline masks (hot-plug).
- Allocated IRQ sets during boot.
- Free inode/block tracking in ext2/3/4.
#include <linux/bitmap.h>
DECLARE_BITMAP(my_bits, 256); /* 256-bit bitmap */
set_bit(nr, my_bits);
clear_bit(nr, my_bits);
change_bit(nr, my_bits);
bitmap_zero(my_bits, 256);
bitmap_fill(my_bits, 256);
unsigned long pos = find_first_bit(my_bits, 256);
unsigned long pos = find_first_zero_bit(my_bits, 256);
/* Iteration macros */
for_each_set_bit(bit, my_bits, 256) { /* bit is set */ }
for_each_clear_bit(bit, my_bits, 256) { /* bit is clear */ }
3. Kernel Modules
What Is a Kernel Module?
A kernel module is a chunk of kernel code (a .ko file) that can be dynamically loaded into and unloaded from a running kernel — no reboot required. Support appeared in Linux 1.2 (1995). Many subsystems (file systems, drivers, network protocols) ship as optional modules controlled by the .config flag =m.
CONFIG_KVM_GUEST=y # built into vmlinux
CONFIG_XFS_FS=m # compiled as xfs.ko
Why Use Modules?
- Faster iteration: load/unload instead of recompile + reboot.
- Lower memory footprint: modules loaded only when needed.
- No performance overhead vs built-in code.
- Easier bug isolation: disable a suspicious driver by not loading its module.
Writing a Module
Every module needs at minimum:
- An init function (called at
insmod): return 0 on success, negative error on failure. - An exit function (called at
rmmod): clean up resources. - Metadata macros:
MODULE_LICENSE,MODULE_AUTHOR,MODULE_DESCRIPTION.
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
static int __init lkp_init(void)
{
printk(KERN_INFO "Module loaded\n");
return 0;
}
static void __exit lkp_exit(void)
{
printk(KERN_INFO "Module unloading\n");
}
module_init(lkp_init);
module_exit(lkp_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Xiaoguang Wang <xgwang9@uic.edu>");
MODULE_DESCRIPTION("A simple kernel module");
Use static for all symbols not intentionally exported. Use EXPORT_SYMBOL(name) to expose a function or variable to other modules. All exported kernel symbols are visible in /proc/kallsyms.
Building a Module
Modules are built against the running kernel's headers using the kernel's own build system:
obj-m := lkp.o
CONFIG_MODULE_SIG=n
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
all: lkp.c
make -C $(KDIR) M=$(PWD) modules
clean:
make -C $(KDIR) M=$(PWD) clean
Result: lkp.ko. A module compiled against kernel version X will not load on version Y (the vermagic field enforces this).
Loading and Unloading
| Tool | Load | Unload | Notes |
|---|---|---|---|
insmod |
sudo insmod file.ko |
— | Low-level; no dep resolution |
rmmod |
— | sudo rmmod file |
Calls exit function |
modprobe |
sudo modprobe <name> |
sudo modprobe -r <name> |
Resolves dependencies via modules.dep |
modprobe looks for modules installed under /lib/modules/<kver>/ (put there by make modules_install). It auto-loads dependencies listed in modules.dep. Modules can be loaded at boot via /etc/modules or files under /etc/modprobe.d/.
Module Parameters
Pass arguments at load time with module_param:
static int int_param = 42;
static char *string_param = "default";
module_param(int_param, int, 0);
MODULE_PARM_DESC(int_param, "A sample integer parameter");
module_param(string_param, charp, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
MODULE_PARM_DESC(string_param, "A string parameter");
Load with: sudo insmod lkp.ko int_param=12 string_param="hello"
Parameters are also visible (and writable, if permissions allow) under /sys/module/<name>/parameters/.
Inspecting Modules
modinfo lkp.ko # show metadata: author, license, vermagic, params
lsmod # list currently loaded modules
Key Takeaways
- kmalloc gives physically contiguous memory (needed for DMA/MMIO) up to ~4 MB; use
GFP_KERNELin process context andGFP_ATOMICin interrupt/spinlock context. - vmalloc gives large, virtually contiguous allocations with no DMA guarantee; it always may sleep and allocates in page units.
- Radix trees map
unsigned longkeys to pointers with O(log n) operations and tag bits for metadata; they power the Linux page cache. - XArray is the modern, lock-embedded successor to the radix tree with a cleaner API (kernel ≥ 4.19).
- Bitmaps are compact bit arrays used for CPU masks, IRQ sets, and free-block tracking.
- Kernel modules are dynamically loadable
.kofiles;insmod/rmmodhandle a single module whilemodproberesolves dependencies automatically. - A module is compiled against a specific kernel version and will not load on a different one.
- Use
staticfor all non-exported symbols to avoid namespace pollution; useEXPORT_SYMBOLdeliberately.