Kernel Data Structures I: Lists, Hash Tables, and Red-Black Trees
Why This Matters
As Linus Torvalds put it: "Bad programmers worry about the code. Good programmers worry about data structures and their relationships." The Linux kernel is a masterclass in this philosophy. Before you can read kernel source comfortably, you must understand a handful of generic data structures that appear everywhere — in the scheduler, the VFS, device drivers, and networking. This module covers the three most common: the linked list, the hash table, and the red-black tree.
1. The Linux Linked List
The Classic Approach vs. the Kernel Approach
In CS 101 you learn to embed prev/next pointers directly in each node type:
struct car {
struct car *prev;
struct car *next;
unsigned int max_speed;
unsigned int price_in_dollars;
};
This works, but you'd need separate list-manipulation functions for every struct type. The kernel solves this with inversion: instead of putting pointers in your struct, you embed a generic list_head node inside your struct, and all list APIs operate on list_head alone.
struct list_head { /* include/linux/types.h */
struct list_head *next, *prev;
};
struct car {
struct list_head list; /* embedded list node */
unsigned int max_speed;
unsigned int price_in_dollars;
};
Two Key Design Choices
| Property | Typical doubly linked list | Linux list_head |
|---|---|---|
| Termination | NULL |
Loops back to HEAD |
| HEAD node | Contains data | Sentinel — no data |
| Traversal end | ptr == NULL |
ptr == HEAD |
| Empty test | HEAD == NULL |
HEAD->next == HEAD |
The sentinel HEAD means the list is never NULL when empty — the head node's prev and next both point back to itself. This eliminates a whole class of null-pointer bugs.
Getting Back to Your Data: container_of
Since APIs only hand you a list_head *, you need to recover the enclosing struct car *. The list_entry macro (an alias for container_of) does this with pointer arithmetic:
struct car *my_car = list_entry(car_list_ptr, struct car, list);
Under the hood:
#define container_of(ptr, type, member) ({ \
void *__mptr = (void *)(ptr); \
(type *)((char *)__mptr - offsetof(type, member)); })
It subtracts the known offset of list within struct car from the list_head address, giving the start of the parent struct.
Defining a List
Static (compile-time):
struct car my_car = {
.max_speed = 150,
.price_in_dollars = 10000,
.list = LIST_HEAD_INIT(my_car.list),
};
LIST_HEAD(my_car_list);
Dynamic (runtime — most common in the kernel):
struct car *my_car = kmalloc(sizeof(*my_car), GFP_KERNEL);
my_car->max_speed = 150;
my_car->price_in_dollars = 10000;
INIT_LIST_HEAD(&my_car->list);
Core API (all O(1))
list_add(new, head); /* insert after head (stack-like) */
list_add_tail(new, head); /* insert before head (queue-like) */
list_del(entry); /* unlink — memory not freed */
list_move(list, head); /* move to another list's head */
list_move_tail(list, head); /* move to another list's tail */
list_empty(head); /* returns non-zero if empty */
list_splice(list, head); /* merge two lists */
Iteration (O(n))
/* Using a raw list_head cursor */
struct list_head *p;
struct car *current_car;
list_for_each(p, &my_car_list) {
current_car = list_entry(p, struct car, list);
printk(KERN_INFO "Price: %u\n", current_car->price_in_dollars);
}
/* Simpler: list_for_each_entry handles the list_entry call */
list_for_each_entry(current_car, &my_car_list, list) {
printk(KERN_INFO "Price: %u\n", current_car->price_in_dollars);
}
Safe iteration while deleting — you must use the _safe variants, which pre-fetch the next pointer before you can corrupt it:
struct car *current_car, *next;
list_for_each_entry_safe(current_car, next, &my_car_list, list) {
list_del(¤t_car->list);
kfree(current_car);
}
Without _safe, deleting a node invalidates the iterator and causes a use-after-free bug.
Where Linked Lists Appear in the Kernel
- Thread group list under a parent PID
- Superblock list for a filesystem
- Network socket queues
- And dozens more
2. The Linux Hash Table
Structure
The kernel provides a fixed-size open chaining (separate chaining) hash table. Key characteristics:
- Bucket array size is always a power of two (2^N), set at initialization
- Each bucket is an
hlist_headpointing to a singly linked collision chain ofhlist_nodes - Lookup time: O(1) average
/* include/linux/hashtable.h, types.h */
struct hlist_head {
struct hlist_node *first;
};
struct hlist_node {
struct hlist_node *next;
struct hlist_node **pprev; /* points to &prev->next */
};
The pprev double-pointer trick lets you delete a node in O(1) without knowing the head, even though the list is singly linked — you modify *pprev directly.
Like list_head, hlist_node is embedded inside your data struct, and container_of recovers the parent.
API
/* Define and initialize a table with 2^bits buckets */
DEFINE_HASHTABLE(my_table, 8); /* 256 buckets */
hash_init(my_table);
/* Add an object (hash key determines bucket) */
hash_add(my_table, &obj->node, key);
/* Iterate every entry across all buckets */
hash_for_each(my_table, bkt, obj, node);
/* Iterate entries that hash to the same bucket as key */
hash_for_each_possible(my_table, obj, node, key);
/* Remove an entry */
hash_del(&obj->node);
Hash tables are used heavily in networking (connection tracking), device drivers, and filesystems.
3. The Red-Black Tree (rbtree)
Background
A red-black tree is a self-balancing binary search tree. Each node is colored red or black, and the tree enforces invariants that keep all paths from root to leaf within a factor of two of each other. This guarantees O(log N) search, insert, and delete.
Key properties maintained after every modification:
- Every path from a node to its descendant leaves contains the same number of black nodes.
- Red nodes cannot have red children.
Linux rbtree
/* include/linux/rbtree.h */
struct rb_node {
unsigned long __rb_parent_color; /* parent ptr + color packed together */
struct rb_node *rb_right;
struct rb_node *rb_left;
};
struct rb_root {
struct rb_node *rb_node;
};
#define RB_ROOT (struct rb_root) { NULL, }
Color and parent pointer are packed into a single unsigned long using the two low-order bits (always zero in aligned pointers) — a classic kernel space-saving trick.
API
/* Recover enclosing struct */
rb_entry(ptr, type, member); /* same as container_of */
/* Navigation */
rb_first(root); rb_last(root);
rb_next(node); rb_prev(node);
/* Insert (caller supplies a comparison function) */
rb_add(node, tree, less_fn);
/* Search (caller supplies a key comparison function) */
rb_find(key, tree, cmp_fn);
The kernel does not provide a built-in search that knows your key type — you supply a comparator. This is the "toolbox" pattern explained below.
Real-World Usage: The CFS Scheduler
The Completely Fair Scheduler (CFS), Linux's default process scheduler, maintains a red-black tree keyed on each task's vruntime (virtual runtime — how much CPU time the task has consumed). The scheduler always picks the leftmost node (smallest vruntime) as the next task to run, which it can find in O(log N). This is one of the most performance-critical uses of rbtree in the entire kernel.
4. Design Patterns Across Kernel Data Structures
All three structures share the same idioms:
Embedded Nodes, Not External Containers
Each structure (list_head, hlist_node, rb_node) is embedded inside the application data struct, not the other way around. Benefits:
- All list/tree APIs are type-agnostic; they work on any struct.
- You control the field layout, so you can pack hot fields together for cache efficiency.
- A single struct can participate in multiple lists simultaneously by embedding multiple
list_headfields. container_of/list_entry/rb_entryrecover the parent struct from any node pointer.
A Toolbox, Not a Complete Solution
None of the three structures provides a built-in "find by key" function. You build your own search loop using the primitives. This keeps the kernel's generic code minimal and forces callers to write exactly the logic they need.
Caller-Managed Locking
The data structures themselves are not thread-safe. The caller is responsible for acquiring the appropriate lock (spinlock, mutex, RCU, etc.) before calling any API. This separation of concerns keeps the generic code fast and simple.
Key Takeaways
- Linux's linked list is a circular doubly linked list with a sentinel HEAD and the node embedded in the data struct — opposite of the textbook design.
container_of/list_entry/rb_entryuse compile-timeoffsetofto recover a parent struct from an embedded node pointer.- Always use
list_for_each_entry_safewhen deleting elements during iteration to avoid use-after-free. - The kernel hash table uses
hlist_head/hlist_nodewith power-of-two bucket counts;pprevenables O(1) deletion from a singly linked chain. - Red-black trees provide O(log N) operations; CFS uses one to track per-task vruntime and always pick the smallest.
- All three structures follow the same design philosophy: embedded nodes, type-agnostic APIs, caller-owned locking, and no built-in search.