Format-String Vulnerabilities

Buffer overflows get all the press, but format-string bugs are just as dangerous and far sneakier. A single misplaced printf(user_input) — instead of the safe printf("%s", user_input) — hands an attacker a general-purpose read/write primitive over the entire process address space. Real-world exploits have used this class of bug to leak stack canaries and ASLR bases, overwrite GOT entries, and gain shells.

Background: how printf works internally

printf is a variadic function: int printf(const char *format, ...). The ... means it accepts any number of additional arguments. Internally, it walks the format string character by character. Whenever it sees a % followed by a conversion specifier (d, s, x, n, …), it calls va_arg() to fetch the next value from the caller's stack and formats it.

The critical diagram from the slides shows the 32-bit stack layout when printf("ID: %d, Name: %s, Age: %d\n", id, name, age) is called:

Stack slot (higher → lower)    Contents
age                            3rd optional argument
name (pointer)                 2nd optional argument
id                             1st optional argument
address of format string       the format parameter
return address of printf       (printf's own frame below)

va_arg() simply increments the va_list pointer upward by the size of the expected type. It has no way to know when it has run out of real arguments. If the format string contains more specifiers than arguments provided, va_arg() just keeps reading whatever is on the stack above the format string pointer — data belonging to the caller's frame.

The vulnerability: user input as the format string

The bug is passing attacker-controlled data as the first argument:

// VULNERABLE
printf(user_input);           // user_input IS the format string

// SAFE
printf("%s", user_input);    // user_input is just a string to print

The same problem arises when user data is spliced into a format string before printing:

// Also vulnerable: user_input ends up embedded in the format string
sprintf(format, "%s %s", user_input, ": %d");
printf(format, program_data);

When user_input contains format specifiers such as %x or %n, printf treats them as real conversion directives and reads or writes stack memory that the program never passed as an argument.

Attack 1 — Crash the program

Send a string full of %s specifiers: %s%s%s%s%s%s%s%s%s. For each %s, printf fetches the next stack word and treats it as a pointer to a string. Many of those words are not valid addresses; as soon as printf dereferences one, the program crashes with a segmentation fault. This is the simplest denial-of-service attack.

Attack 2 — Read arbitrary stack memory

%x (or %p) prints the next stack word as a hex integer without dereferencing it. Chain them to walk up the stack:

$ ./vul
Please enter a string: %x.%x.%x.%x.%x.%x.%x.%x
63.b7fc5ac0.b7eb8309.bffff33f.11223344.252e7825.78252e78.2e78252e

The fifth value, 11223344 (shown in bold in the lecture output), is the local variable var = 0x11223344 from the caller's frame, leaked even though the program never passed it to printf. The number of %x tokens needed is the target's position, counted in 4-byte stack words from the va_list starting position (five here). Finding that count takes trial and error or inspection in GDB.

%s goes further: it treats the stack word as a pointer and prints the null-terminated string at that address. This leaks arbitrary memory — not just stack values. A carefully chosen pointer can dump a password buffer, a stack canary, or a libc code pointer (defeating ASLR).

Why this matters for bypassing mitigations: leaking a stack canary lets you bypass stack canary protection; leaking a code pointer (e.g., a printf@GOT entry) reveals the libc base and defeats ASLR — both are standard first steps in modern exploit chains.

Attack 3 — Write to arbitrary memory with %n

%n is the dangerous one. It writes an integer — the number of characters printed so far — into the memory address pointed to by the next va_list value:

int i;
printf("hello%n", &i);   // after this, i == 5

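To exploit this, the attacker must get the target address onto the stack where printf will reach it. One way is to embed it at the start of the input buffer, which lives in the frame of the vulnerable function (fmtstr() in the lecture demo) and therefore sits above printf's arguments: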

$ echo $(printf "\x04\xf3\xff\xbf").%x.%x.%x.%x.%x.%n > input

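Breaking this down:

  1. \x04\xf3\xff\xbf is the target address 0xbffff304 in little-endian byte order; it occupies the first four bytes of the input buffer.
  2. Each of the five %x specifiers consumes one stack word. In the Attack 2 output the sixth word, 252e7825, is the first four bytes of the input ("%x.%"), so after five %x the va_list pointer is at the start of the buffer.
  3. %n treats that sixth word as an int * and stores the number of characters printed so far at 0xbffff304.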

The value written equals the total number of characters printed before %n. You can control it precisely using field-width padding: %100x prints the integer using at least 100 characters, so combining padding with %n lets you write any small integer you choose.

Targeting the GOT

The Global Offset Table (GOT) holds run-time addresses of shared-library functions. If you overwrite the GOT entry for exit (or any function called after your payload) with the address of system() or your shellcode, the next call to that function redirects to your target. Because the GOT sits at a known address (fixed in a non-PIE binary, or computable once an address has been leaked), it is the classic %n target for control-flow hijacking via format strings.

Format specifier quick reference

Specifier    Action                                                Attack use
%x / %p      Print next stack word as hex                          Leak stack contents
%s           Dereference stack word as char *                      Leak arbitrary memory
%n           Write character count to pointed address              Arbitrary memory write
%k$x         Direct parameter access: skip to the k-th argument    Target a specific stack slot precisely
%Nx          Print with field width N (pads output)                Control the value written by %n

Direct parameter access (%k$x syntax) lets you skip straight to the k-th argument without burning multiple %x tokens. For example, %5$x reads the 5th argument directly. Combined with %5$n, you can write to the address held in the 5th stack slot in a single, compact payload.

Defenses

Defense                                    How it helps
Always use printf("%s", user_input)        User data is never parsed as a format string
Compiler -Wformat-security / -Wformat=2    Warns (or errors, with -Werror) when a non-literal format string is passed to printf-family functions
_FORTIFY_SOURCE=2 (glibc)                  Routes printf-family calls to checked variants that abort if %n appears in a format string held in writable memory, or if %k$ specifiers skip arguments
Full RELRO                                 Resolves all GOT entries at load time and then makes the GOT read-only, blocking %n-based GOT overwrites
ASLR + PIE                                 Forces attackers to leak addresses before writing; raises the bar but does not eliminate the bug

The most important defense is the simplest: never pass user-controlled data as the format argument. All other defenses are mitigations that reduce exploitability after the bug is introduced.

Practice

  1. Which of the following calls introduces a format-string vulnerability?
  2. When printf processes %x and there are no real arguments left, what happens?
  3. An attacker sends the input %s%s%s%s%s%s%s%s to a vulnerable printf(input) call. What is the most likely outcome?
  4. What does the %n format specifier do when printf processes it?
  5. In the lecture's Attack 2 demo, the format string %x.%x.%x.%x.%x.%x.%x.%x is given as input to a vulnerable program. The output reveals 11223344 at the 5th position. What does this tell the attacker?
  6. To write the value 0x64 (decimal 100) to a target address using %n, how can the attacker control the exact value written?
  7. Why is overwriting a GOT (Global Offset Table) entry a popular goal for format-string exploits?
  8. Which compiler/linker flag causes the compiler to emit a warning (or error) when a non-literal string is passed as the format argument to printf-family functions?
  9. Explain how a format-string read (%x or %s) can be used to bypass both a stack canary and ASLR before attempting a follow-up overflow or %n write. What is read in each case, and why does that information defeat the corresponding mitigation?