Stream and block ciphers give us a way to guarantee the confidentiality of data, but how do we guarantee its integrity?
Integrity means that an attacker should not be able to modify data without being detected.
The ciphers we've discussed produce malleable ciphertexts: an attacker can modify one of these ciphertexts so that it decrypts to a predictably altered plaintext.
Source: The Imitation Game
Our PRG-based construction for symmetric encryption with a secret key k is to run
Encrypt(m,k)=G(k)⊕m
where G(⋅) is a pseudo-random generator (PRG) and ⊕ = XOR.
What happens if an attacker knows m and intercepts the message?
Answer: they can compute G(k)! The ciphertext is c = G(k)⊕m, so XORing it with the known m gives c⊕m = G(k).
Since we didn't do anything to authenticate the message, the attacker can now encrypt their own message and forward it on to the receiver.
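The attack can be sketched in a few lines of Python. This is only an illustration: the function `G` below is a stand-in PRG built from SHAKE-256 (an assumption, not the construction used in these notes), and the message contents are made up.

```python
import hashlib
import os


def G(key: bytes, n: int) -> bytes:
    """Stand-in PRG (assumption): stretch `key` into n bytes with SHAKE-256."""
    return hashlib.shake_256(key).digest(n)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(m: bytes, k: bytes) -> bytes:
    # Encrypt(m, k) = G(k) XOR m
    return xor(G(k, len(m)), m)


decrypt = encrypt  # XOR with the same keystream undoes the encryption

key = os.urandom(32)            # known only to sender and receiver
m = b"PAY ALICE $0100"          # plaintext the attacker happens to know
c = encrypt(m, key)             # ciphertext the attacker intercepts

# The attacker never sees `key`, but recovers the keystream from c and m.
keystream = xor(c, m)
evil = b"PAY MALLORY $99"       # same length as m
forged = xor(keystream, evil)

# The receiver decrypts the forgery without noticing anything wrong.
print(decrypt(forged, key))
```

Nothing in the scheme lets the receiver tell `forged` apart from a ciphertext produced by the real sender, which is the gap a MAC is meant to close.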
A message authentication code (MAC) is an algorithm that generates a few bytes of data known as a tag. This tag can be used to verify the authenticity of a message.
Poly1305 is a MAC algorithm that takes a 32-byte key and generates a 16-byte tag.
It's called Poly1305 because it evaluates a polynomial in the ring $\mathbb{Z}/(2^{130}-5)\mathbb{Z}$ (but you don't need to know that to use it in practice):
$$p(r) = \left(c_1 r^{q} + c_2 r^{q-1} + \dots + c_q r\right) \bmod \left(2^{130} - 5\right)$$
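To make the polynomial concrete, here is a minimal pure-Python sketch of Poly1305 that follows the description in RFC 8439. The function name `poly1305_tag` is made up for illustration. The code is not constant-time and is not meant for real use; it is only here to show where the polynomial evaluation happens. The 32-byte key is split into r (the point at which the polynomial is evaluated) and s (added at the end), and each key must be used for one message only.

```python
P = (1 << 130) - 5                            # the prime 2^130 - 5
CLAMP = 0x0ffffffc0ffffffc0ffffffc0fffffff    # bits of r that must be cleared


def poly1305_tag(key: bytes, message: bytes) -> bytes:
    """Illustrative Poly1305: returns a 16-byte tag for `message`."""
    if len(key) != 32:
        raise ValueError("Poly1305 needs a 32-byte one-time key")
    r = int.from_bytes(key[:16], "little") & CLAMP
    s = int.from_bytes(key[16:], "little")

    acc = 0
    for i in range(0, len(message), 16):
        # Each coefficient c_i is a 16-byte chunk with a 0x01 byte appended.
        c = int.from_bytes(message[i:i + 16] + b"\x01", "little")
        # Horner's rule: after q chunks, acc = c_1 r^q + ... + c_q r (mod P).
        acc = (acc + c) * r % P

    # Add s and keep the low 128 bits as the tag.
    return ((acc + s) % (1 << 128)).to_bytes(16, "little")


# Test vector from RFC 8439, section 2.5.2.
key = bytes.fromhex(
    "85d6be7857556d337f4452fe42d506a80103808afb0db2fd4abff6af4149f51b"
)
tag = poly1305_tag(key, b"Cryptographic Forum Research Group")
print(tag.hex())
```

In practice you would call a vetted library implementation rather than write this yourself.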
Hash-based MAC (HMAC) is a convenient method of turning a cryptographic hash function like SHA-256 into a MAC.
You can think of HMAC as a keyed hash function; it's a hash function h(x,k) that takes some data x and a key k, and outputs a hash.
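As a small sketch of the idea, Python's standard library exposes HMAC through the `hmac` module. The helper names `make_tag` and `verify`, the key, and the message below are made up for illustration.

```python
import hashlib
import hmac
import os


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over `message`."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = os.urandom(32)                      # shared secret key
message = b"transfer 10 coins to alice"
tag = make_tag(key, message)

print(verify(key, message, tag))                          # genuine message
print(verify(key, b"transfer 99 coins to mallory", tag))  # tampered message
```

Using `hmac.compare_digest` instead of `==` avoids leaking, through timing, how many leading bytes of a guessed tag were correct.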
SipHash does not provide sufficient guarantees to be used as a general-purpose cryptographic hash function, but as a keyed function it is good enough to be used as a MAC on short inputs.
In exchange for its weaker security guarantees, SipHash is extremely efficient. This makes it useful for preventing hash flooding attacks.
Example: in Python, you can use a dictionary to perform key-value lookups.
>>> headers = {}
>>> headers["Content-Type"] = "application/json"
>>> headers["Host"] = "www.example.org"
...
A dict is just a hash table. What are some security properties that you'd want the hash function used by dict to have?
What if an adversary knows what hashes you're going to compute, and can get you to insert keys that all hash to the same value?
Source: Jean-Philippe Aumasson
An attacker can use this to DoS a server by forcing it to perform increasingly expensive key lookups.
As these lookups get more expensive, the server has less time to respond to legitimate requests.
Solution: generate a random secret key, and then use SipHash! An attacker who doesn't know the key can't compute the hash.
Python uses SipHash to hash strings and bytes for its dictionaries.
The Linux kernel uses SipHash internally for its hash tables.
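The effect of hash flooding, and of keying the hash, can be sketched with a toy bucket count. Everything here is an assumption made for illustration: `weak_hash` is a deliberately bad unkeyed hash, and `keyed_hash` uses keyed BLAKE2b from `hashlib` as a stand-in for SipHash, since the standard library does not expose SipHash directly.

```python
import hashlib
import os
from itertools import permutations

NUM_BUCKETS = 64


def weak_hash(key: str) -> int:
    """Unkeyed toy hash: anyone can find colliding keys offline."""
    return sum(key.encode()) % NUM_BUCKETS


SECRET = os.urandom(16)  # random per-process key the attacker never sees


def keyed_hash(key: str) -> int:
    """Keyed hash (BLAKE2b standing in for SipHash)."""
    digest = hashlib.blake2b(key.encode(), key=SECRET, digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_BUCKETS


def largest_bucket(keys, hash_fn) -> int:
    """Size of the fullest bucket, i.e. the worst-case lookup cost."""
    counts = {}
    for k in keys:
        bucket = hash_fn(k)
        counts[bucket] = counts.get(bucket, 0) + 1
    return max(counts.values())


# Attacker-chosen keys: all 720 orderings of the same letters have the same
# byte sum, so they all land in one bucket under weak_hash.
attack_keys = ["".join(p) for p in permutations("abcdef")]

print(largest_bucket(attack_keys, weak_hash))   # every key in one bucket
print(largest_bucket(attack_keys, keyed_hash))  # keys spread across buckets
```

With the unkeyed hash every lookup degrades to a scan over all attacker-supplied keys; with the secret key the attacker can no longer predict which keys collide.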
An authenticated encryption with associated data (AEAD) algorithm is an encryption scheme that takes three inputs: a secret key, a plaintext, and a header (the associated data),
and produces two outputs: a ciphertext (containing the encryption of the plaintext) and a MAC.
The decryption algorithm takes the ciphertext, MAC, and header and produces the plaintext. It also checks whether the header or ciphertext have been modified.
The "associated data" is not encrypted, but we still check its integrity. This is useful when we have message metadata that we want others to be able to see.
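The interface described above can be sketched with a toy encrypt-then-MAC construction built from the standard library. This is an illustration only, under several assumptions: the names `toy_encrypt` and `toy_decrypt` are made up, the keystream comes from SHAKE-256 rather than a real stream cipher, and the tag is HMAC-SHA256 over the header and ciphertext. Real code should use a vetted AEAD scheme instead.

```python
import hashlib
import hmac
import os


def _subkeys(key: bytes):
    # Derive independent encryption and MAC keys from the one secret key.
    return (hashlib.sha256(b"enc" + key).digest(),
            hashlib.sha256(b"mac" + key).digest())


def _keystream(enc_key: bytes, nonce: bytes, n: int) -> bytes:
    return hashlib.shake_256(enc_key + nonce).digest(n)


def _tag(mac_key: bytes, header: bytes, ciphertext: bytes) -> bytes:
    # The length prefix keeps the header/ciphertext boundary unambiguous.
    data = len(header).to_bytes(8, "big") + header + ciphertext
    return hmac.new(mac_key, data, hashlib.sha256).digest()


def toy_encrypt(key: bytes, plaintext: bytes, header: bytes):
    """Return (ciphertext, tag); the header is authenticated, not encrypted."""
    enc_key, mac_key = _subkeys(key)
    nonce = os.urandom(16)  # fresh per message, sent along in the clear
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = nonce + bytes(p ^ s for p, s in zip(plaintext, stream))
    return ciphertext, _tag(mac_key, header, ciphertext)


def toy_decrypt(key: bytes, ciphertext: bytes, tag: bytes, header: bytes) -> bytes:
    """Check the tag first; only decrypt if nothing was modified."""
    enc_key, mac_key = _subkeys(key)
    if not hmac.compare_digest(tag, _tag(mac_key, header, ciphertext)):
        raise ValueError("authentication failed")
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(enc_key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = os.urandom(32)
header = b"to: bob"
ct, tag = toy_encrypt(key, b"a secret message", header)
print(toy_decrypt(key, ct, tag, header))

try:
    toy_decrypt(key, ct, tag, b"to: eve")  # header changed in transit
except ValueError as err:
    print(err)
```

Changing either the header or any byte of the ciphertext makes the tag check fail, so the receiver rejects the message before decrypting anything.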
ChaCha20-Poly1305 is an authenticated encryption scheme that uses the ChaCha20 stream cipher with the Poly1305 MAC.
Some notable uses: the age file encryption tool (Source: age / Filippo Valsorda) and the WireGuard VPN protocol (Source: WireGuard / Jason Donenfeld).
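>>> import os
>>> from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
>>> data = b"a secret message"
>>> aad = b"authenticated but unencrypted data"
>>> key = ChaCha20Poly1305.generate_key()
>>> chacha = ChaCha20Poly1305(key)
>>> # 12-byte nonce; it must never be reused with the same key
>>> nonce = os.urandom(12)
>>> # ct holds the ciphertext with the 16-byte Poly1305 tag appended
>>> ct = chacha.encrypt(nonce, data, aad)
>>> # decrypt() raises an exception if ct or aad has been modified
>>> chacha.decrypt(nonce, ct, aad)
b'a secret message'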
AES-GCM (Galois/Counter Mode) combines CTR-mode AES with the GHASH hash function to provide message authentication on top of confidentiality.
There is also a variant of AES-GCM, AES-GCM-SIV (SIV = "Synthetic Initialization Vector"), which additionally provides nonce misuse resistance.
(I.e., you can accidentally reuse a nonce without it blowing up in your face.)
Never use a cipher like AES or ChaCha by itself! Any messages you send are malleable and cannot be authenticated.
Message authentication codes (MACs) are algorithms that generate a tag that can be used to determine whether a message has been tampered with. This tag is usually computed on the ciphertext and appended to it.
In practice, you should use an authenticated encryption (AEAD) scheme like AES-GCM or ChaCha20-Poly1305, which provides both confidentiality and integrity.