Return-Oriented Programming (ROP)

Modern operating systems mark the stack and heap non-executable (the W^X / DEP / NX policy): memory that can be written cannot be executed. That one rule kills classic shellcode injection — your payload lands on the stack but the CPU refuses to run it. ret2libc was the first step past this wall: jump to an existing libc function instead of injected bytes. But ret2libc is limited to whole function calls with fixed argument counts. Return-Oriented Programming generalizes the idea to arbitrary computation by chaining tiny snippets of existing code — no injected instructions required.

Why DEP/NX is not enough

When the OS enforces W^X:

An attacker who controls the return address can no longer jump into a buffer. But the program's own .text segment — and every shared library it loads — is already marked executable and cannot be taken away. Those pages contain every instruction the attacker needs; they just have to be assembled in the right order.

From ret2libc to ROP: the key insight

In ret2libc you overwrite the return address with the address of system(), place "/bin/sh" where the argument goes, and you are done. The problem with chaining three or more calls is argument cleanup: after fun1(t) returns, its argument t is still sitting on the stack. Before fun2(u, v) can be called correctly, the stack pointer must skip past t. The slides call these gadgets — small pieces of existing code that end in ret and perform exactly this cleanup:

popl %ebp; ret       ; skip 1 word, then dispatch to next gadget
popl %ebx; popl %ebp; ret   ; skip 2 words
addl $16, %esp; ret  ; skip 4 words (3 args + saved ebp equivalent)

Each ret pops the next address off the stack and jumps there, handing control to the following gadget. The stack is no longer just data — it has become the program counter for a new virtual machine.

What is a gadget?

A gadget is a short instruction sequence that already exists in the binary (or any loaded library) and ends with a ret instruction. Two to five instructions is typical. Examples:

pop %eax; ret          ; load a value from the stack into %eax
pop %ebx; ret          ; load a value into %ebx
mov [%ebx], %eax; ret  ; write %eax to the memory address in %ebx
add %eax, %ebx; ret    ; arithmetic
xor %eax, %eax; ret    ; zero a register

Crucially, gadgets do not have to align with the original function boundaries. On x86 (a variable-length ISA), you can start decoding mid-instruction in an existing function and discover a different valid instruction sequence ending in 0xc3 (ret). This means a large binary typically contains thousands of usable gadgets.

The gadget chain on the stack

When the overflow fires, %esp points into the attacker-controlled region. The CPU's ret instruction executes %eip = *(%esp); %esp += 4 — it pops the top of the stack into the instruction pointer. If the stack contains a sequence of gadget addresses interleaved with data values, each ret hands control to the next gadget and the gadget's own pop instructions consume the accompanying data:

Stack layout (low address at bottom, %esp points here initially):

 [ &gadget_1         ]  <-- ret pops this into %eip
 [ value_for_gadget_1]  <-- gadget_1's "pop reg" consumes this
 [ &gadget_2         ]  <-- gadget_1's "ret" pops this into %eip
 [ value_for_gadget_2]
 [ &gadget_3         ]
 ...

The ROP comparison table from the slides captures this elegantly:

Concept Normal programming ROP equivalent
Instruction pointer %eip %esp (stack pointer)
No-op nop ret (advances %esp, does nothing useful)
Unconditional jump jmp address Set %esp to address of gadget
Variables Registers and memory Mostly memory (values on stack)

Building a chain: the Mem[v2] = v1 example

The slides walk through a concrete ROP operation: store value v1 at memory address v2. The desired effect Mem[v2] = v1 requires three gadgets found anywhere in the binary:

a1: pop eax; ret        -- loads v1 into %eax
a2: pop ebx; ret        -- loads v2 into %ebx
a3: mov [ebx], eax      -- writes %eax → *(%ebx)

Stack layout to execute this chain:

 [ &a1 ]   <-- initial return address (from overflow)
 [ v1  ]   <-- consumed by a1's "pop eax"
 [ &a2 ]   <-- a1's "ret" jumps here
 [ v2  ]   <-- consumed by a2's "pop ebx"
 [ &a3 ]   <-- a2's "ret" jumps here

After a3 runs, mem[v2] = v1. By assembling enough such triples you can perform arbitrary computation — move data, call functions, invoke syscalls — entirely from existing code.

Turing completeness

Hovav Shacham's 2007 paper "The Geometry of Innocent Flesh on the Bone" proved that ROP is Turing complete: any computation expressible in a normal program can be expressed as a gadget chain, as long as the binary is large enough to contain the necessary gadget types. The paper's key observation: "In any sufficiently large body of x86 executable code there will exist sufficiently many useful code sequences that an attacker who controls the stack will be able to cause the exploited program to undertake arbitrary computation."

Tools: finding gadgets automatically

Manual gadget hunting is tedious. Two widely used tools automate it:

Tool Usage
ROPgadget ROPgadget --binary ./target --ropchain scans the binary for gadgets and can automatically synthesize a chain to call execve. Pass --binary /path/to/libc.so.6 --offset <load_addr> to search a shared library.
pwntools (ROP class) from pwn import *; elf = ELF('./target'); rop = ROP(elf) — programmatically build chains by calling rop.call('system', [next(elf.search(b'/bin/sh'))]).

ROPgadget's --ropchain output is a ready-to-paste Python snippet using struct.pack to assemble the chain byte by byte.

Defenses against ROP

ROP requires the attacker to know gadget addresses and to control the stack. Defenses attack both requirements:

Defense Mechanism
Stack canary Detects stack-buffer overflow before ret executes
ASLR Randomizes load addresses — gadget addresses are unknown at exploit-write time
Control-Flow Integrity (CFI) Hardware/compiler validates every indirect transfer; ret to an unexpected address is blocked
"Return-less" code Compiler replaces ret with equivalent sequences that are not gadget terminators
Compartmentalization Limits blast radius so even a successful chain cannot reach sensitive operations

In practice, ASLR is the most commonly deployed first-line defense against ROP; CFI (e.g., Intel CET shadow stack) is the more robust hardware solution deployed in modern systems.

Key takeaways

Practice

  1. DEP/NX marks the stack and heap as non-executable. Why does this NOT stop a ret2libc or ROP attack?
  2. What defines a ROP gadget?
  3. When a ret instruction executes during a ROP chain, what two things happen to CPU state?
  4. In 32-bit cdecl calling convention, after fun1(t) returns, why can't we immediately call fun2(u, v) in a naive ret2libc chain — and what gadget type solves this?
  5. The slides compare ROP to normal programming. In that analogy, what plays the role of the instruction pointer (%eip) during a ROP chain?
  6. An attacker wants to perform the operation Mem[v2] = v1 using a ROP chain. Which sequence of three gadgets achieves this, given that v1 and v2 are values on the attacker-controlled stack?
  7. Which command-line invocation of ROPgadget scans the binary ./stack0x04 and attempts to automatically synthesize an execve ROP chain?
  8. Why does ASLR make ROP attacks harder, and why does it not make them impossible? What technique can still work against ASLR in some scenarios?
  9. Hovav Shacham's 2007 paper proved ROP is Turing complete. Explain in one paragraph what that means practically for an attacker exploiting a 32-bit Linux binary with DEP/NX enabled but no ASLR or stack canary.