AES, Encryption Modes & Hash Functions

Every time you connect to a website over HTTPS, buy something with a credit card, or unlock your phone, symmetric encryption and hashing are working behind the scenes. This module covers the two most important primitives: AES, the world-standard block cipher, and cryptographic hash functions, the workhorses of integrity and authentication. Getting either one wrong β€” using the wrong mode, reusing an IV, or storing passwords with MD5 β€” is a routine source of real-world vulnerabilities.

Background: block ciphers vs. stream ciphers

A block cipher partitions plaintext into fixed-size chunks and encrypts each chunk with a key. A stream cipher generates a pseudorandom key stream from a short key and XORs it with the plaintext byte-by-byte. AES is a block cipher; some modes of operation (CFB, CTR) turn it into a stream cipher.

AES: the Advanced Encryption Standard

DES (Data Encryption Standard, 1977) had a 56-bit key and 64-bit blocks; brute-force attacks succeeded in the 1990s. In 1997 NIST held an open competition for a replacement. The winner β€” Rijndael, designed by Daemen and Rijmen β€” became AES in 2001.

Key facts:

Property Value
Block size 128 bits (16 bytes) β€” always
Key sizes 128, 192, or 256 bits
Rounds 10 (128-bit key), 12 (192-bit), 14 (256-bit)
Structure Substitution-permutation network (not Feistel)

Each round applies four transformations to a 4Γ—4 byte state matrix:

  1. SubBytes β€” every byte is replaced by its entry in a fixed S-box (non-linear substitution that resists algebraic attacks).
  2. ShiftRows β€” rows of the state matrix are cyclically shifted left by 0, 1, 2, and 3 positions, spreading bytes across columns.
  3. MixColumns β€” each column is treated as a polynomial and multiplied modulo a fixed polynomial, achieving diffusion (one byte change affects the whole column).
  4. AddRoundKey β€” the state is XORed with a 128-bit round key derived from the original key via the key schedule.

The final round omits MixColumns. Decryption runs all steps in reverse with inverse operations. You do not need to remember the math; what matters is that these four steps together ensure that a single-bit change in plaintext or key scrambles every output bit β€” the avalanche effect.

Encryption modes: handling multi-block messages

AES encrypts exactly one 16-byte block at a time. Real messages are longer. Cipher modes define how to chain block-cipher calls over a sequence of blocks.

ECB β€” Electronic Codebook (broken for most uses)

The simplest mode: encrypt each block independently.

C_i = E_K(P_i)
P1 ──► [AES] ──► C1
P2 ──► [AES] ──► C2      (same key every block)
P3 ──► [AES] ──► C3

Fatal flaw: identical plaintext blocks produce identical ciphertext blocks. Encrypt a bitmap image in ECB mode and the outline of the original image remains visible in the ciphertext β€” the famous "ECB penguin." ECB is essentially a lookup table (codebook), and any structure in the plaintext bleeds straight through into the ciphertext. Never use ECB for real data.

CBC β€” Cipher Block Chaining (the classic choice)

Each plaintext block is XORed with the previous ciphertext block before encryption. The first block uses an Initialization Vector (IV) in place of a prior ciphertext block.

C_i = E_K(P_i XOR C_{i-1})    where C_0 = IV
IV ──►(XOR)──► [AES] ──► C1
P1 ──►↑         ↓
               C1 ──►(XOR)──► [AES] ──► C2
        P2 ──►↑        ...

Identical plaintext blocks now produce different ciphertext blocks (because the chained ciphertext differs). Note:

CFB β€” Cipher Feedback

CFB encrypts the previous ciphertext block with AES and XORs the result with the current plaintext β€” effectively turning AES into a stream cipher.

CTR β€” Counter Mode

A counter value (nonce + block number) is encrypted with AES to produce a key-stream block, which is then XORed with the plaintext. This is a stream-cipher construction.

GCM β€” Galois/Counter Mode (authenticated encryption)

CTR mode extended with a Galois-field authentication tag. GCM provides both confidentiality and integrity in one pass. It is the standard choice in TLS 1.3. (Not covered in depth in this lecture, but know it exists and why it matters.)

Mode comparison

Mode Padding needed Encryption parallel Decryption parallel Hides patterns
ECB Yes Yes Yes No
CBC Yes No Yes Yes
CFB No No Yes Yes
CTR No Yes Yes Yes

Padding: PKCS#7

Block modes that require padding (ECB, CBC) must expand the last block to a full 16 bytes. PKCS#7 (also called PKCS#5 in many library docs) pads with N bytes each of value N. If the input is 9 bytes, 7 bytes of 0x07 are appended:

Original:  31 32 33 34 35 36 37 38 39
Padded:    31 32 33 34 35 36 37 38 39  07 07 07 07 07 07 07

The value of each padding byte tells the decryptor how many bytes to strip. If the last block is already full, a whole extra block of 0x10 (16) is added so the rule is unambiguous.

Stream-mode derivatives (CFB, CTR) produce output the same length as the input β€” no padding.

IV requirements and pitfalls

The IV must be:

Generate IVs with a cryptographically secure random number generator; never use a counter or timestamp as an IV in CBC.

Cryptographic hash functions

A hash function maps arbitrary-length input to a fixed-length output (the digest). A cryptographic hash function adds three security properties:

Property Meaning
One-wayness (preimage resistance) Given h, it is infeasible to find any m such that hash(m) = h
Second-preimage resistance Given m, it is infeasible to find m' β‰  m with the same hash
Collision resistance It is infeasible to find any pair m1 β‰  m2 with hash(m1) = hash(m2)

Note: collision resistance implies second-preimage resistance, but not the other way around. A hash that is merely one-way is not necessarily collision-resistant.

Merkle-Damgard construction

MD5, SHA-1, and SHA-2 are built on the Merkle-Damgard construction: the message is split into fixed-size chunks, padded, and fed through a compression function iteratively:

IV ──►[h]──► intermediate ──►[h]──► ... ──►[h]──► Hash(M)
        ↑                      ↑
        M1                     M2 || ... || Mn || padding

This construction has a known weakness β€” length-extension attacks β€” which is why you should not use hash(key || message) as a MAC. Use HMAC instead.

The MD family

Designed by Ron Rivest (the R in RSA):

Algorithm Status
MD2, MD4 Severely broken β€” obsolete
MD5 Collision resistance broken (practical collisions found); one-wayness is technically intact but MD5 is not safe for any security purpose
MD6 Developed in response to NIST's SHA-3 competition

The SHA family

Published by NIST:

Algorithm Status
SHA-0 Withdrawn due to a design flaw shortly after publication
SHA-1 Collision attack demonstrated in 2017 (SHAttered); deprecated β€” do not use
SHA-2 (SHA-256, SHA-512, truncated variants) Currently secure; the standard for most applications
SHA-3 (Keccak) Released 2015; different sponge construction; no significant attack

SHA-256 (32-byte digest) and SHA-512 (64-byte digest) are the workhorses of SHA-2. Use them for new systems.

Applications of hash functions

Integrity verification

A single-bit change in the input completely changes the hash (avalanche effect). Distributors publish SHA-256 checksums of software downloads; users verify the download before trusting it. System-integrity tools like Tripwire use hashes to detect tampered files.

Password storage

Passwords must not be stored in plaintext. The correct pattern:

  1. Generate a per-user random salt (e.g., 16 bytes).
  2. Compute hash = KDF(password || salt) using a slow key-derivation function (bcrypt, scrypt, Argon2) or at minimum many rounds of SHA-512.
  3. Store (algorithm_id, salt, hash) β€” never the password itself.

Linux stores this in /etc/shadow. An entry like $6$wDRrWCQz$... means: algorithm 6 (SHA-512), salt wDRrWCQz, then the hash.

Why salt? Without a salt, two users with the same password produce the same hash. An attacker who obtains the database can crack all identical entries at once with a single dictionary lookup (rainbow table). Unique per-user salts force the attacker to crack each entry independently.

Commitment schemes

Because hash(m) reveals nothing about m (one-wayness), you can publish a hash to commit to a secret without disclosing it. Later, revealing m lets anyone verify the commitment β€” useful in protocols, auctions, and blockchain applications.

MACs vs. bare hashes

A hash alone does not authenticate the sender (anyone can compute a hash). A Message Authentication Code (MAC) such as HMAC combines the hash with a shared secret key. Use HMAC when you need both integrity and authenticity.

Key takeaways

Practice

  1. What is the block size of AES regardless of the key length chosen?
  2. Which of the following describes why ECB (Electronic Codebook) mode is considered insecure for most real-world data?
  3. In CBC mode, a 21-byte plaintext is encrypted with AES-128. How large is the resulting ciphertext?
  4. An application encrypts a 9-byte string 123456789 with AES-128-CBC. With PKCS#7 padding, which byte value is used for padding, and how many padding bytes are added?
  5. Which of the following statements about Initialization Vectors (IVs) in CBC mode is correct?
  6. A developer stores user passwords as SHA-256(password) without any additional data. An attacker steals the database. Which attack does the lack of a salt directly enable?
  7. Which of the following correctly describes the status of MD5 and SHA-1 today?
  8. Explain the difference between CBC mode and CFB mode in terms of: (a) whether padding is required, (b) whether encryption can be parallelized, and (c) what kind of cipher each mode effectively produces.
  9. Linux stores passwords in /etc/shadow in the format $6$<salt>$<hash>. What does each field mean, why is the salt stored alongside the hash, and why does Linux apply multiple rounds of the hash function?