Side Channels, CPU Bugs & OS Security

Your algorithm might be provably correct — and still leak your secret. In 2018, the Meltdown and Spectre disclosures showed that billions of CPUs were leaking kernel memory to ordinary user programs, not through any software bug, but through the timing of their own caches. Understanding why requires three interlocking ideas: what a side channel is, how CPU caches become measurement instruments, and how modern CPUs speculatively execute across security boundaries.

What Is a Side Channel?

A side channel is any observable artifact of a computation other than its output that correlates with secret input. The algorithm itself may be perfectly correct; the leak happens in the physical or microarchitectural world alongside it.

Category	Observable	Example attack
Consumption — timing	How long an operation takes	Early-exit password compare
Consumption — power	How many transistors switch	Simple/differential power analysis on AES/RSA
Consumption — network	Packet sizes and inter-arrival times	SSH keystroke timing (Song et al.)
Emission — acoustic	Sound produced by CPU/keyboard	Keystroke classification from mic audio
Emission — EM	Electromagnetic radiation	TEMPEST-class attacks
Emission — error messages	Which branch was taken	Padding oracle attacks

Timing Attack: The Early-Exit Password Comparison

// VULNERABLE: exits as soon as a byte differs
#include <stdbool.h>

bool password_check(const char *input, const char *stored) {
    int i;
    for (i = 0; stored[i] != '\0'; i++) {
        if (input[i] != stored[i])
            return false;   // leaks how many bytes matched
    }
    return input[i] == '\0';   // reject inputs that merely start with the password
}

A longer execution time means more bytes matched. An attacker submits candidates one byte at a time, observing timing, and recovers the password character by character.
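The byte-by-byte recovery described above can be sketched as a small simulation. This is an illustrative model, not a real attack: the hypothetical leaky_check counts loop iterations as a stand-in for measured execution time, and recover_secret (also hypothetical) assumes the secret is printable ASCII.

```c
#include <stddef.h>
#include <string.h>

/* Early-exit comparison that also counts loop iterations. The count is a
   stand-in for the execution time an attacker would measure. */
static int leaky_check(const char *input, const char *stored, size_t *steps) {
    size_t i;
    *steps = 0;
    for (i = 0; stored[i] != '\0'; i++) {
        (*steps)++;
        if (input[i] != stored[i])
            return 0;
    }
    return input[i] == '\0';
}

/* Recover the secret one position at a time. guess must hold max_len + 1 bytes.
   Assumes the secret is printable ASCII and at most max_len bytes long. */
static void recover_secret(const char *stored, char *guess, size_t max_len) {
    memset(guess, 0, max_len + 1);
    for (size_t pos = 0; pos < max_len; pos++) {
        size_t best_steps = 0;
        char best_char = '\0';
        for (int c = 32; c < 127; c++) {
            size_t steps;
            guess[pos] = (char)c;
            if (leaky_check(guess, stored, &steps))
                return;                    /* exact match: done */
            if (steps > best_steps) {      /* ran longer: more bytes matched */
                best_steps = steps;
                best_char = (char)c;
            }
        }
        guess[pos] = best_char;
    }
}
```

Each position costs at most 95 guesses, so the total work grows linearly with the password length rather than exponentially. On real hardware the attacker would need many repeated measurements per guess to average out timing noise.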

Fix — constant-time comparison:

int constant_time_cmp(const unsigned char *a, const unsigned char *b, size_t len) {
    unsigned char result = 0;
    for (size_t i = 0; i < len; i++) {
        result |= a[i] ^ b[i];   // accumulate differences; never early-exit
    }
    return result == 0;           // always takes the same number of iterations
}

Well-maintained crypto libraries provide comparison routines built on this pattern (for example, CRYPTO_memcmp in OpenSSL and sodium_memcmp in libsodium). The key property: execution time is independent of secret values.
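The same idea extends from comparisons to table lookups, where a secret index would otherwise decide which memory is touched. The sketch below uses hypothetical helper names (ct_eq_mask, ct_lookup): it reads every table entry and selects the wanted one with a bit mask. It illustrates the technique only, since an optimizing compiler is not obliged to preserve the constant-time property.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 0xFF if a == b and 0x00 otherwise, without branching on the inputs. */
static uint8_t ct_eq_mask(size_t a, size_t b) {
    size_t diff = a ^ b;
    /* (diff | -diff) has its top bit set exactly when diff is nonzero. */
    size_t nonzero = (diff | (0 - diff)) >> (sizeof(size_t) * 8 - 1);
    return (uint8_t)(nonzero - 1);   /* 0 -> 0xFF, 1 -> 0x00 */
}

/* Reads every entry so the memory access pattern does not depend on secret_idx.
   Returns 0 if secret_idx is out of range. */
static uint8_t ct_lookup(const uint8_t *table, size_t len, size_t secret_idx) {
    uint8_t out = 0;
    for (size_t i = 0; i < len; i++)
        out |= table[i] & ct_eq_mask(i, secret_idx);
    return out;
}
```

The cost is a full pass over the table for every lookup, which is why this approach is usually reserved for small tables.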

CPU Cache Side Channels

The Cache Hierarchy

Modern CPUs cannot afford to serve every memory access from DRAM, which costs on the order of 100 ns (hundreds of cycles). Instead, they maintain a hierarchy of smaller, faster caches; typical load latencies are roughly 4 cycles for L1, 12 for L2, and 40 for L3, against 200 or more for DRAM. Caches hold 64-byte cache lines, the unit of transfer. When the CPU reads an address, it checks each cache level before going to DRAM.
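To make the cache-line granularity concrete, the sketch below splits an address into line offset, set index, and tag. The geometry (64-byte lines, 64 sets, as in a 32 KiB 8-way cache) and the function names are assumptions chosen for illustration; real caches differ in size and in whether they index by virtual or physical address.

```c
#include <stdint.h>

/* Hypothetical geometry: 64-byte lines and 64 sets (32 KiB, 8-way). */
enum { LINE_SIZE = 64, NUM_SETS = 64 };

/* Byte position inside the cache line. */
static uint64_t line_offset(uint64_t addr) { return addr % LINE_SIZE; }

/* Which set the line maps to. */
static uint64_t set_index(uint64_t addr) { return (addr / LINE_SIZE) % NUM_SETS; }

/* Remaining high bits, stored to tell lines in the same set apart. */
static uint64_t line_tag(uint64_t addr) { return addr / LINE_SIZE / NUM_SETS; }
```

Two addresses with the same tag and set index lie on the same line, so an access to one brings the other into the cache as well.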

The security consequence: whether a cache line is present (a hit) or absent (a miss) is observable through timing, even by unprivileged code using the rdtsc/rdtscp timestamp counter instruction.

Cache hit  → ~4–40 CPU cycles
Cache miss → ~200–400 CPU cycles

An adversary who can observe this timing learns whether a particular address was recently accessed — even if they cannot read the value stored there.
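A minimal sketch of such a measurement follows, assuming an x86 CPU and a GCC or Clang toolchain that provides x86intrin.h. The helper names time_load and hit_vs_miss are hypothetical. The timestamp counter ticks at a fixed rate that need not equal the core clock, so the results are best read as relative values.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp, _mm_clflush, _mm_mfence (GCC/Clang on x86) */

/* Time a single load of *addr in timestamp-counter ticks. */
static uint64_t time_load(const volatile uint8_t *addr) {
    unsigned int aux;
    _mm_mfence();                        /* let earlier memory operations finish */
    uint64_t start = __rdtscp(&aux);
    (void)*addr;                         /* the load being timed */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

/* Flush the line holding addr, then time two loads in a row.
   The first is expected to miss, the second to hit. */
static void hit_vs_miss(const volatile uint8_t *addr, uint64_t *miss, uint64_t *hit) {
    _mm_clflush((const void *)addr);
    _mm_mfence();                        /* make sure the flush has completed */
    *miss = time_load(addr);
    *hit  = time_load(addr);
}
```

Running hit_vs_miss many times and comparing the two distributions is one way to pick a hit/miss threshold for a particular machine.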

Flush+Reload

Flush+Reload is a cache side-channel technique that works when attacker and victim share a memory-mapped region (e.g., a shared library):

Flush — the attacker calls _mm_clflush on each probe entry array[i * 4096 + DELTA]; every call evicts the one cache line containing that address.
Victim executes — the victim accesses array[secret * 4096 + DELTA], pulling that specific line into cache.
Reload — the attacker times access to each array[i * 4096 + DELTA]. The entry with a fast access time (cache hit) reveals the secret value i.

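// Attacker's reload phase; array, N, DELTA, and CACHE_HIT_THRESHOLD are set up earlier
unsigned int junk = 0;
for (int i = 0; i < N; i++) {
    volatile uint8_t *addr = &array[i * 4096 + DELTA];
    uint64_t t1 = __rdtscp(&junk);            // timestamp before the probe
    junk = *addr;                             // the timed load
    uint64_t elapsed = __rdtscp(&junk) - t1;  // ticks taken by the load
    if (elapsed <= CACHE_HIT_THRESHOLD)       // fast: the line was already cached
        printf("Secret = %d\n", i);
}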

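The stride of 4096 bytes (one page) puts each probe entry on its own cache line and its own page. Hardware prefetchers generally do not prefetch across page boundaries, so touching one entry does not pull its neighbors into the cache and contaminate the measurement. The DELTA offset keeps the first entry off the cache line at the very start of the array, which may be cached as a side effect of accesses to adjacent variables.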

Prime+Probe

When attacker and victim do not share memory, Prime+Probe works on the shared LLC (Last-Level Cache):

Prime — attacker fills a cache set with its own data.
Victim executes — a secret-dependent access may evict one of the attacker's lines.
Probe — attacker re-reads its own data; a slow read indicates its line was evicted, revealing which cache set (and therefore which memory address range) the victim accessed.
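The three steps above can be illustrated with a toy software model of a single cache set. Everything here is hypothetical: the 4-way LRU set, the tag values, and the function names are assumptions made for the sketch, and a real attack measures load latency instead of reading a hit/miss flag.

```c
#include <stdbool.h>
#include <stdint.h>

enum { WAYS = 4 };

/* One set of a toy LRU cache: tags[0] is the most recently used line. */
typedef struct { uint64_t tags[WAYS]; int used; } cache_set;

/* Access a tag. Returns true on a hit; on a miss the LRU line is evicted. */
static bool set_access(cache_set *s, uint64_t tag) {
    int pos = -1;
    for (int i = 0; i < s->used; i++)
        if (s->tags[i] == tag) pos = i;
    bool hit = (pos >= 0);
    if (!hit) {
        if (s->used < WAYS) s->used++;
        pos = s->used - 1;               /* free slot, or the LRU slot if full */
    }
    for (int i = pos; i > 0; i--)        /* shift newer lines back by one */
        s->tags[i] = s->tags[i - 1];
    s->tags[0] = tag;
    return hit;
}

/* Returns true if the probe detects that the victim touched this set. */
static bool prime_probe(cache_set *s, bool victim_accesses) {
    for (uint64_t t = 0; t < WAYS; t++)  /* prime: fill the set with attacker lines */
        set_access(s, t);
    if (victim_accesses)
        set_access(s, 1000);             /* victim's secret-dependent access */
    bool evicted = false;
    for (uint64_t t = 0; t < WAYS; t++)  /* probe: any miss means an eviction happened */
        if (!set_access(s, t)) evicted = true;
    return evicted;
}
```

In this model a single victim access makes every probe access miss, because each reloaded line evicts the next one to be probed. Real implementations often probe in the reverse order to limit that effect.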

Speculative Execution and the Spectre/Meltdown Attacks

Modern CPUs are out-of-order and speculative: they execute instructions ahead of time while waiting for memory fetches or branch resolution, then either commit or discard the results. Architectural state (registers, memory) is rolled back on a mis-speculation — but microarchitectural state (cache contents) is not.

Meltdown — Breaking the Kernel/User Boundary

The kernel maps its own memory into every process's virtual address space (for fast system-call handling) but marks those pages as supervisor-only in the page table. Architecturally, a user-mode read of a kernel address should fault immediately.

With out-of-order execution, the CPU races: it starts executing instructions that use the kernel value before the permission check completes:

1  kernel_data = *(char*)0xfb61b000;  // will fault — but OOO executes line 2 first
2  array[kernel_data * 4096 + DELTA] += 1;  // runs speculatively, pollutes cache
   // permission check fires → exception → architectural state rolled back
   // but cache line for array[kernel_data*4096] remains hot!

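Having flushed array beforehand and arranged to survive the fault (for example, with a SIGSEGV handler), the attacker then runs the reload phase of Flush+Reload over array to identify which index is in cache; that index is the secret kernel byte.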

Key insight: Meltdown exploits a race between out-of-order execution and an access-permission check. The CPU discards the register result but cannot un-warm the cache.

Mitigation — Kernel Page Table Isolation (KPTI): The OS maintains two separate page tables per process. In user mode, the page table maps almost no kernel memory (only the tiny syscall trampolines required by x86). Because kernel addresses are not mapped in user space, they cannot be resolved even speculatively.

Spectre v1 — Abusing Branch Prediction

Spectre is subtler. The CPU's branch predictor learns from past branches to speculatively execute the most likely path. An attacker can train the predictor, then trick it into speculatively executing across a software bounds check:

// Victim code (e.g., in a kernel or sandbox)
uint8_t restrictedAccess(size_t x) {
    if (x < buffer_size) {       // bounds check (access protection)
        return buffer[x];
    }
    return 0;
}

Attack steps:

Train — call restrictedAccess(valid_x) many times so the branch predictor learns "condition is true."
Flush — evict buffer_size and the probe array from cache, so the check stalls waiting for DRAM and every probe entry starts out uncached.
Attack — call restrictedAccess(larger_x) where larger_x indexes past the end of buffer into the secret region. The predictor speculatively takes the true branch, loading buffer[larger_x] (a secret byte); the code consuming the return value then uses it to index a probe array, all before the actual condition resolves to false.
Reload — the reload phase of Flush+Reload on the probe array recovers the secret.

Spectre vs. Meltdown — the key distinction:

	Meltdown	Spectre v1
Root cause	Race between OOO execution and permission check	Mistraining the branch predictor
Boundary crossed	Kernel/user (page-table permission)	Software bounds check
Mitigation	KPTI (OS software fix)	No easy fix; requires compiler/ISA support (lfence barriers, index masking)
Affected hardware	Primarily Intel (older); fixed in new silicon	Intel, AMD, ARM — fundamentally hard to fix in hardware
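Index masking, one of the Spectre v1 mitigations named in the table, can be sketched as follows. The names index_mask and restricted_access_masked are hypothetical. The sketch assumes GCC or Clang (for the empty inline-asm statement that stops the optimizer from removing the mask) and assumes both index and size are at most SIZE_MAX / 2. It illustrates the idea and is not a vetted hardening primitive.

```c
#include <stddef.h>
#include <stdint.h>

/* All-ones if idx < size, zero otherwise, computed without a branch.
   Assumes idx and size are both at most SIZE_MAX / 2. */
static size_t index_mask(size_t idx, size_t size) {
    /* The top bit is set when idx is out of range: size - 1 - idx wraps around. */
    size_t out_of_range = (idx | (size - 1 - idx)) >> (sizeof(size_t) * 8 - 1);
    return out_of_range - 1;   /* 0 -> all ones, 1 -> zero */
}

uint8_t restricted_access_masked(const uint8_t *buffer, size_t buffer_size, size_t x) {
    size_t mask = index_mask(x, buffer_size);
    __asm__ volatile("" : "+r"(mask));   /* GCC/Clang: hide the value from the optimizer */
    if (x < buffer_size)
        return buffer[x & mask];         /* a mispredicted path reads buffer[0] at worst */
    return 0;
}
```

Because the mask is computed from data and not from the branch outcome, a mispredicted branch still sees a zero mask for an out-of-range x. The alternative named in the table, an lfence after the bounds check, instead stops speculation at that point at a higher performance cost.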

Operating System Security Fundamentals

The attacks above highlight why the OS must be a trustworthy enforcer of boundaries.

CPU Privilege Modes

The CPU provides at least two modes:

Kernel (system/privileged) mode — can execute any instruction, access any memory location, modify page tables, enable/disable interrupts.
User mode — restricted; cannot directly touch hardware, cannot disable interrupts, cannot access memory outside its page-table mapping.

The only way for user code to deliberately enter kernel mode is a system call: a well-defined gate (the syscall instruction on x86-64) that raises the privilege level and transfers control to a fixed kernel entry point in a single step, so user code cannot choose where kernel execution begins. Interrupts and exceptions, such as page faults, enter the kernel through similarly fixed entry points. Superuser (root) privileges are a software-level notion, separate from the hardware kernel-mode right.
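As a small illustration of crossing that gate, the sketch below asks the kernel for the process ID through the generic syscall() wrapper and not through the usual getpid() library function. It assumes Linux with glibc; raw_getpid is a hypothetical name.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Enter the kernel with the system-call number for getpid and return its result. */
static long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

The user program supplies only a system-call number and arguments. Which kernel code runs in response is decided by the kernel's own dispatch table, not by the caller.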

Memory Protection

The OS enforces isolation between processes through the page table: each process has its own virtual address space. Physical memory is shared, but the mapping is private. Mechanisms include:

Base/bounds registers (historical) — simple start/end limits.
Segmentation — per-segment descriptors with privilege levels.
Paging — modern approach; each 4 KB page has independent read/write/execute/user permissions. The MMU checks these on every access, and a violation raises a fault before the instruction retires (although, as Meltdown shows, dependent instructions may already have executed transiently by then).
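The per-page permission check can be modeled in a few lines. The flag layout and names below are hypothetical, chosen for readability and not taken from any real architecture's page-table format. The model describes only the architectural outcome (allow or fault), not transient execution.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for a toy page-table entry. */
enum {
    PTE_PRESENT = 1u << 0,
    PTE_WRITE   = 1u << 1,
    PTE_USER    = 1u << 2,   /* clear means supervisor-only */
    PTE_NOEXEC  = 1u << 3
};

typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC } access_kind;

/* Returns true if the access is allowed, false if it would raise a fault. */
static bool access_allowed(uint32_t pte, bool user_mode, access_kind kind) {
    if (!(pte & PTE_PRESENT)) return false;                          /* unmapped page */
    if (user_mode && !(pte & PTE_USER)) return false;                /* supervisor-only */
    if (kind == ACCESS_WRITE && !(pte & PTE_WRITE)) return false;    /* read-only page */
    if (kind == ACCESS_EXEC && (pte & PTE_NOEXEC)) return false;     /* non-executable */
    return true;
}
```

A kernel page mapped into a user process corresponds to an entry with PTE_PRESENT set and PTE_USER clear: a user-mode read fails the second check, which is the fault Meltdown races against.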

OS Security Goals and Hardening

The OS has two primary security goals:

Enable multiple users to securely share resources — isolation of processes, memory, files, and devices via virtual memory, containers (namespaces + cgroups), and VMs (hypervisor).
Ensure secure networked operation — authentication, access control, encrypted communication, logging/auditing, intrusion detection, and recovery.

OS hardening is the practice of configuring a deployed OS to minimize attack surface: removing unnecessary services, configuring users/groups correctly, enforcing least-privilege permissions, installing IDS/anti-virus, and keeping the system patched.

Logging and auditing record events (login attempts, permission changes, network connections, system calls) at the application, system, and user levels. Audit trails must themselves be protected — restricted access, off-system backups, write-once storage — so an attacker cannot erase evidence.

Virtualization security: A hypervisor (VMM) runs multiple guest OSes with strong isolation, enabling hypervisor-based rootkit/virus detection, security-sensitive application isolation, and transparent live patching.

Key Takeaways

A side channel leaks secret-dependent information through observable physical or microarchitectural artifacts — not through the algorithm's output.
Cache timing (a hit costs roughly 4–40 cycles, a miss 200 or more) is observable by unprivileged code; Flush+Reload and Prime+Probe exploit this to infer which memory addresses were accessed.
Meltdown exploits the race between out-of-order execution and a page-table permission check, letting user code transiently read kernel memory; KPTI eliminates kernel mappings from user-mode page tables.
Spectre v1 trains the CPU's branch predictor to speculatively bypass a software bounds check, leaking memory into cache; it is fundamentally harder to mitigate because the flaw is in the prediction mechanism itself.
Both attacks are hardware vulnerabilities: Meltdown primarily affects older Intel CPUs, while Spectre affects Intel, AMD, and ARM; full fixes require new silicon.
Constant-time code is essential for crypto: comparisons, key lookups, and table accesses must take the same time regardless of secret values.
The OS enforces isolation through kernel/user CPU modes and page tables; system calls are the only gateway through which user code can deliberately request kernel services.