Software Testing, Fuzzing & Symbolic Execution

The previous modules showed you how to exploit a stack buffer overflow once you have one. Now flip the question: how does a defender (or a security researcher) find the overflow in the first place? Programs ship with millions of lines of C/C++, and manual code review at that scale is impractical. This module covers the automated tools that are actually used to discover memory-safety bugs before attackers do — the same tools that have turned up thousands of CVEs in widely-deployed software.

Why Automated Bug Finding?

Finding a security bug takes two steps:

Locate the bug — identify the dangerous code (e.g., an unchecked memcpy).
Trigger the bug — find an input that reaches that code with the right values to cause a crash.

Manual code review handles step 1 on small programs, but becomes unreliable at scale. Step 2 is often even harder: consider if (magic == 0x31323334) { memcpy(dst, src, len); } — how do you find the exact input that makes magic equal 0x31323334? That is the central problem all three techniques below attack.

Software Testing Basics

Testing runs a program on a set of inputs and checks whether observed behavior matches a specification. Two key dimensions classify testing approaches:

	Black-Box	White-Box
Manual	Hand-crafted inputs, no source needed	Code review + manual test design
Automated	Random/generational fuzzing	Coverage-guided fuzzing, symbolic execution

Black-box testing treats the program as an opaque binary. It works even when source is unavailable (malware analysis, proprietary software) and handles any input format. Its weakness is that it misses deep paths guarded by specific checks.

White-box testing uses knowledge of the source or binary structure to create more targeted inputs. It can be far more thorough but requires access to — and analysis of — the code.

Code Coverage

How do you know your test suite is good? Code coverage measures the fraction of the program actually exercised by the tests. Common granularities:

Function coverage — were all functions called?
Statement coverage — were all statements executed?
Branch coverage — was every true/false branch taken at least once?

Coverage is given as a percentage. 100% coverage is rarely achievable in practice (unreachable code, error handlers that require hardware faults) but is often required for safety-critical systems (avionics, medical devices). For security testing, branch coverage is the most relevant: an unexercised branch might be the one hiding the overflow.

Fuzzing

Fuzzing (fuzz testing) automates the generation of test inputs and feeds them to the target program, watching for crashes, hangs, or sanitizer alerts. The key insight: crashes almost always mean a bug; if you can make the program crash, you likely found something exploitable.

Black-Box vs. Coverage-Guided

Pure random / black-box fuzzing generates entirely random bytes. It is trivially fast and requires nothing about the target, but it rarely penetrates past the first few input-validation checks.

Coverage-guided (grey-box) fuzzing — the state of the art — instruments the binary at compile time and tracks which branches each input exercises. Inputs that hit new branches are kept and mutated further; inputs that cover already-seen paths are discarded. This feedback loop lets the fuzzer learn the program's structure without doing full symbolic analysis.

AFL: American Fuzzy Lop

AFL (by Michal Zalewski) is the most influential coverage-guided fuzzer. The workflow is:

# 1. Compile with AFL instrumentation
./afl-2.52b/afl-gcc prog.c -o prog.afl

# 2. Provide a seed input
echo AAAA > input/test

# 3. Run the fuzzer
./afl-2.52b/afl-fuzz -i input -o output ./prog.afl

AFL's dashboard shows live statistics: total paths discovered, unique crashes, execution speed (commonly hundreds or thousands of executions per second), and the mutation strategies currently being tried (bit flips, byte flips, arithmetic, known-integer substitution, havoc, splice).

How AFL finds "487": Suppose a program crashes only on input starting with '4', '8', '7'. AFL starts from a seed like "AAAA". After a few hundred mutations it happens to flip the first byte to '4' — branch coverage increases, so that input is saved and queued. From that input it eventually hits '8' in position 1, extending the coverage again. Finally it reaches '7' and triggers the crash. Each step is guided by the fact that new coverage = new interesting input.

AFL++ is the modern successor: faster execution, better mutation strategies (including custom mutators), and improved instrumentation (LTO mode, COMPCOV for comparison splitting). It is the version used in practice today.

libFuzzer

libFuzzer is LLVM/Clang's in-process fuzzer. Instead of forking a new process per input, it links a fuzzing harness directly into the target and calls a LLVMFuzzerTestOneInput(data, size) function in a tight loop. This eliminates fork overhead and makes it feasible to fuzz at tens of thousands of executions per second.

libFuzzer is almost always paired with AddressSanitizer (ASan):

clang -fsanitize=address,fuzzer target.c -o target_fuzz

ASan instruments every memory access at compile time and detects:

Heap buffer overflows (reads/writes past malloc'd regions)
Stack buffer overflows
Use-after-free
Use-after-return

Without a sanitizer, many memory bugs cause silent corruption rather than an immediate crash, so the fuzzer would miss them. With ASan, even a one-byte out-of-bounds read is caught and reported with a full stack trace and the exact address. This combination — coverage-guided fuzzing + sanitizers — is how Google's OSS-Fuzz service has found thousands of bugs in open-source C/C++ projects.

Symbolic Execution

Fuzzing is fast but fundamentally probabilistic: it might never generate the exact value 0x31323334 needed to reach a deep bug. Symbolic execution attacks this problem head-on.

Instead of running the program with concrete values (e.g., x = 42), a symbolic executor assigns symbols to inputs (e.g., x = α) and propagates those symbols through the computation. At every branch, both outcomes are explored, and the constraint that would lead down each path is recorded.

Input: x = α (symbolic)

if (x > 10):          path constraint: α > 10
    if (x == 0x31323334):   path constraint: α > 10 AND α == 0x31323334
        memcpy(dst, src, len);   <-- BUG REACHABLE

An SMT solver (Satisfiability Modulo Theories — tools like Z3) is then asked: "Is this set of constraints satisfiable?" If yes, the solver hands back a concrete input that reaches the bug. This approach is complete in theory: given enough time it will find every reachable bug.

Concolic Execution

Pure symbolic execution becomes expensive quickly. Concolic execution (concrete + symbolic) is a practical hybrid:

Run the program on a real (concrete) seed input, recording the path taken.
Collect the path constraints for that run.
Negate one constraint, ask the SMT solver for a new input that takes a different branch.
Run again on the new concrete input, repeat.

The Godefroid et al. "Automated Whitebox Fuzz Testing" (NDSS 2008) paper describes this loop as the foundation of automated white-box fuzz testing.

Path Explosion

The core limitation of symbolic execution is path explosion: the number of paths through a program grows exponentially with the number of branches. Loops and recursion make this worse. Practical tools mitigate this by:

Limiting search depth
Merging similar paths (path merging)
Prioritizing paths likely to reach new code
Combining with coverage-guided fuzzing to focus only on hard-to-reach paths

Fuzzing vs. Symbolic Execution

Property	Coverage-Guided Fuzzing (AFL)	Symbolic / Concolic Execution
Speed	Very fast (millions of inputs/day)	Slow (SMT solving is expensive)
Deep path penetration	Poor for magic-value checks	Excellent — solver derives the value
Handles complex programs	Good at scale	Struggles with large programs (path explosion)
Tool maturity	Excellent (AFL, AFL++, libFuzzer)	Moderate (KLEE, angr, S2E)
Best use	Broad coverage, file parsers, network input	Targeted analysis, protocol parsing, test-case generation

In practice, the strongest pipelines combine both: fuzzing explores the easy paths quickly, while symbolic execution is invoked selectively on branches the fuzzer cannot penetrate.

Key Takeaways

Bug finding requires two steps: locate the vulnerable code and trigger it with a carefully chosen input.
Testing is classified on two axes: black-box vs. white-box (source access) and manual vs. automated (who generates inputs).
Code coverage (function / statement / branch) measures how thoroughly a test suite exercises the program; higher branch coverage means fewer untested paths.
Coverage-guided fuzzing (AFL, AFL++) instruments the binary, mutates inputs that increase branch coverage, and loops at high speed — it has found the majority of real-world memory-safety CVEs.
AddressSanitizer turns silent memory corruption into hard crashes, making fuzzing dramatically more effective.
Symbolic execution replaces concrete values with symbols, uses an SMT solver to derive inputs that reach new paths, and can crack magic-value checks that stump fuzzers — at the cost of path explosion in large programs.
Concolic execution (concrete + symbolic) is the practical hybrid: run concretely, collect constraints, negate one, solve, repeat.
Fuzzing and symbolic execution are complementary: fuzzing is fast and broad; symbolic execution is slow but precise. Modern pipelines use both.