CS 3710: Intro to Cybersecurity (slides): Cryptography: authenticated encryption

# Cryptography: authenticated encryption
## CS 3710: Intro to Cybersecurity

===

## Authenticated encryption

---

## MACs and authenticated encryption

Stream and block ciphers give us a way to guarantee the *confidentiality* of
data, but how do we guarantee its integrity?

_**Integrity**_ means that an attacker should not be able to modify data
without the modification being detected.

---

## Why integrity matters: malleability

The ciphers we've discussed produce _**malleable**_ ciphertexts: an attacker can
modify one of these ciphertexts and have it produce a predictable output.

---

## Malleability

*Source: The Imitation Game*


---

## Malleability

Our PRG-based construction for symmetric encryption with a secret key $k$ is to
run

$$
Encrypt(m, k) = G(k) \oplus m
$$

where $G(\cdot)$ is a pseudo-random generator (PRG) and $\oplus$ = XOR.

What happens if an attacker knows $m$ and intercepts the ciphertext?

---

## Malleability

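_**Answer:**_ they can compute $G(k)$! XORing the intercepted ciphertext $c$
with the known plaintext gives $c \oplus m = G(k)$.

Since we didn't do anything to *authenticate* the message, the attacker can now
encrypt their own message $m'$ as $G(k) \oplus m'$ and forward it on to the
receiver.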
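
---

## Malleability

A minimal sketch of this attack. SHAKE-256 stands in for the PRG $G$ here; that
choice and the example messages are assumptions made for illustration, not part
of the construction above.

```python
import hashlib
import secrets


def G(k: bytes, n: int) -> bytes:
    """Toy PRG: stretch the key k into n pseudo-random bytes (SHAKE-256 stand-in)."""
    return hashlib.shake_256(k).digest(n)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(m: bytes, k: bytes) -> bytes:
    """Encrypt(m, k) = G(k) XOR m."""
    return xor(G(k, len(m)), m)


decrypt = encrypt  # XORing with the same keystream undoes the encryption

# Sender side: the attacker never sees k.
k = secrets.token_bytes(32)
m = b"PAY ALICE $100"
c = encrypt(m, k)

# Attacker side: knows m, intercepts c, and recovers the keystream.
keystream = xor(c, m)
forged_m = b"PAY MALLORY $9"  # same length as m
forged_c = xor(keystream, forged_m)

# Receiver side: the forged ciphertext decrypts without any error.
print(decrypt(forged_c, k))  # b'PAY MALLORY $9'
```

The attacker can only forge messages up to the length of the keystream they
recovered, but within that limit the receiver cannot tell the difference.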


===

## Message authentication codes

---

## What is a MAC?

A _**message authentication code (MAC)**_ is an algorithm that takes a secret
key and a message and generates a few bytes of data known as a *tag*. Anyone who
holds the key can recompute the tag to verify the _**authenticity**_ of the
message.


---

## Poly1305

_**Poly1305**_ is a MAC algorithm that takes a message and a 32-byte one-time
key and generates a 16-byte tag.

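It's called Poly1305 because it evaluates a polynomial in the ring
$\mathbb{Z}/(2^{130} - 5)\mathbb{Z}$ (but you don't need to know that to use it
in practice):

$$
p(r) = (c_1r^q + c_2r^{q-1} + \ldots + c_qr) \bmod (2^{130} - 5)
$$

Here $c_1, \ldots, c_q$ are the 16-byte chunks of the message read as integers
(each with a 1 bit appended), and $r$ is derived from the key.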
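
---

## Poly1305

A sketch of how the polynomial can be evaluated, written in pure Python for
readability. It follows the usual description of the algorithm (split the key
into $r$ and $s$, evaluate with Horner's rule, add $s$ at the end); the function
name `poly1305_tag` is made up for this slide. It is not constant-time, so treat
it as an illustration only and use a vetted library in practice.

```python
import secrets

P = (1 << 130) - 5  # the prime 2^130 - 5 that gives Poly1305 its name


def poly1305_tag(message: bytes, key: bytes) -> bytes:
    """Illustrative Poly1305: returns a 16-byte tag for a 32-byte one-time key."""
    assert len(key) == 32
    # First half of the key is r (with some bits cleared, "clamping");
    # second half is s, which is added at the very end.
    r = int.from_bytes(key[:16], "little") & 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF
    s = int.from_bytes(key[16:], "little")

    acc = 0
    for i in range(0, len(message), 16):
        # c_i: a 16-byte chunk with an extra 1 byte appended, read little-endian.
        c_i = int.from_bytes(message[i:i + 16] + b"\x01", "little")
        # Horner's rule: after q chunks, acc = c_1 r^q + ... + c_q r (mod P).
        acc = (acc + c_i) * r % P

    return ((acc + s) % (1 << 128)).to_bytes(16, "little")


key = secrets.token_bytes(32)
tag = poly1305_tag(b"a message to authenticate", key)
print(len(tag))  # 16
```

Changing a single byte of the message changes a coefficient $c_i$, which changes
the value of the polynomial and therefore the tag.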

---

## Hash-based MAC (HMAC)

_**Hash-based MAC (HMAC)**_ is a convenient method of turning a cryptographic
hash function like SHA-256 into a MAC.

<div class="fragment" data-fragment-index=0>

You can think of HMAC as a _**keyed** hash function_; it's a hash function
$h(x,k)$ that takes some data $x$ and a key $k$, and outputs a hash.

</div>
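
---

## Hash-based MAC (HMAC)

A short sketch of the keyed-hash view of HMAC using Python's standard `hmac`
module with SHA-256. The helper names `make_tag` and `verify_tag` and the example
messages are made up for this slide.

```python
import hashlib
import hmac
import secrets


def make_tag(message: bytes, key: bytes) -> bytes:
    """Compute h(x, k): an HMAC-SHA-256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(message: bytes, tag: bytes, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(message, key), tag)


key = secrets.token_bytes(32)
message = b"transfer 100 to alice"
tag = make_tag(message, key)

print(len(tag))                                        # 32
print(verify_tag(message, tag, key))                   # True
print(verify_tag(b"transfer 900 to alice", tag, key))  # False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does
not depend on how many leading bytes of the tag match.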

---

## Aside: SipHash

_**SipHash**_ is a keyed hash function. It does not provide sufficient
guarantees to be used as a general-purpose cryptographic hash function, but it
is good enough to be used as a MAC.

<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

- Unlike e.g. SHA, it makes no security guarantees (such as collision
  resistance) once the key is known; _but_

</div>
<div class="fragment fade-in">

- it's difficult for an attacker to compute $h(x, k)$ without knowing $k$.

</div>


---

## SipHash: hash flooding

In exchange for its weaker security guarantees, SipHash is extremely efficient.
This makes it useful for preventing _**hash flooding attacks**_.


_**Example:**_ in Python, you can use a dictionary to perform key-value lookups.

```python
>>> headers = {}
>>> headers["Content-Type"] = "application/json"
>>> headers["Host"] = "www.example.org"
...
```

A `dict` is just a hash table. What are some security properties that you'd want
the hash function used by `dict` to have?


---

## SipHash: hash flooding

What if an adversary knows which hash function you use, and can get you to
insert keys that all hash to the same value?

*Source: Jean-Philippe Aumasson*

  <div class="fragment fade-in" data-fragment-index=4>

An attacker can use this to DoS a server by forcing it to perform increasingly
expensive key lookups.

As these lookups get more expensive, the server has less time to respond to
legitimate requests.

</div>

notes:

JPA's slides on defenses against hash flooding attacks:
https://www.aumasson.jp/siphash/siphashdos_appsec12_slides.pdf

---

## SipHash: hash flooding

_**Solution:**_ generate a random secret key, and then use SipHash! An attacker
who doesn't know the key can't compute the hashes, so they can't pick keys that
collide.
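
---

## SipHash: hash flooding

A toy sketch of the attack and the fix. Python's standard library does not
expose SipHash directly, so keyed BLAKE2b from `hashlib` stands in for it here;
the unkeyed "sum of bytes" hash, the bucket count, and the function names are
all assumptions chosen to keep the example small.

```python
import hashlib
import itertools
import secrets
from collections import Counter

NUM_BUCKETS = 64


def weak_bucket(key: str) -> int:
    """Unkeyed toy hash: anyone can predict which keys collide."""
    return sum(key.encode()) % NUM_BUCKETS


def keyed_bucket(key: str, secret: bytes) -> int:
    """Keyed hash (BLAKE2b as a stand-in for SipHash)."""
    digest = hashlib.blake2b(key.encode(), key=secret, digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_BUCKETS


# Attacker: anagrams have the same byte sum, so they all land in one bucket.
attack_keys = ["".join(p) for p in itertools.permutations("abcdef")]

weak_counts = Counter(weak_bucket(k) for k in attack_keys)

# Defender: a random secret key chosen at startup, unknown to the attacker.
secret = secrets.token_bytes(16)
keyed_counts = Counter(keyed_bucket(k, secret) for k in attack_keys)

print(len(attack_keys), "keys inserted")
print("largest bucket (unkeyed):", max(weak_counts.values()))
print("largest bucket (keyed):  ", max(keyed_counts.values()))
```

With the unkeyed hash, every lookup has to walk one bucket containing all of the
attacker's keys. With the keyed hash, the same keys spread across the buckets.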

---

## Applications of SipHash

Python uses SipHash to hash strings and bytes for its dictionaries.

  <div class="fragment" data-fragment-index=0>

The Linux kernel uses SipHash internally for its hash tables.

</div>

notes:

- PEP 456: https://peps.python.org/pep-0456/
- SipHash in the kernel docs: https://docs.kernel.org/next/security/siphash.html
- `siphash.h`: https://elixir.bootlin.com/linux/latest/source/include/linux/siphash.h
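
---

## Applications of SipHash

One way to observe the random hash key in Python: start fresh interpreters and
compare the hash of the same string. The sketch below uses the `PYTHONHASHSEED`
environment variable, where `0` disables randomization and `random` picks a new
seed per process. The helper name `string_hash_in_fresh_process` is made up for
this slide.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_process(seed: str) -> int:
    """Run a new Python process and return hash('Content-Type') from it."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('Content-Type'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Randomization disabled: every process computes the same hash.
fixed = [string_hash_in_fresh_process("0") for _ in range(2)]

# Randomization enabled: each process draws its own secret seed.
randomized = [string_hash_in_fresh_process("random") for _ in range(2)]

print("fixed seed, same hash:  ", fixed[0] == fixed[1])
print("random seed, same hash: ", randomized[0] == randomized[1])
```

An attacker who cannot learn the per-process seed cannot precompute a set of
colliding header names.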

---

## Authenticated encryption

An _**authenticated encryption with associated data (AEAD)**_ algorithm is an
encryption scheme that takes three inputs:

- a *plaintext*;

<div class="fragment fade-in-then-semi-out" data-fragment-index=1>

- a *secret key*; and

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=2>

- optionally, a *header* (the associated data)

</div>

and produces two outputs: a *ciphertext* (containing the encryption of the
plaintext) and a *MAC*. Practical schemes also take a *nonce*, which must never
repeat under the same key.

---

## Authenticated encryption

The decryption algorithm takes the *secret key*, *ciphertext*, *MAC*, and
*header* and produces the plaintext. It first checks whether the header or
ciphertext has been modified, and refuses to decrypt if so.

<div class="fragment" data-fragment-index=0>

The header (the "associated data") is not encrypted, but we still check its
integrity. This is useful when we have message metadata that we want others to
be able to see.

</div>
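
---

## Authenticated encryption

To make the inputs and outputs concrete, here is a toy encrypt-then-MAC sketch
built only from the standard library. It is *not* a real AEAD scheme: SHAKE-256
stands in for the stream cipher, HMAC-SHA-256 for the MAC, and the names `seal`
and `open_sealed` as well as the subkey derivation are assumptions made for this
slide.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 12


def _subkey(key: bytes, label: bytes) -> bytes:
    """Derive separate encryption and MAC keys from the one secret key."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _xor_keystream(enc_key: bytes, nonce: bytes, data: bytes) -> bytes:
    stream = hashlib.shake_256(enc_key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def _tag(key: bytes, nonce: bytes, header: bytes, ciphertext: bytes) -> bytes:
    # The header length is included so the header/ciphertext split is unambiguous.
    mac_input = nonce + len(header).to_bytes(8, "big") + header + ciphertext
    return hmac.new(_subkey(key, b"mac"), mac_input, hashlib.sha256).digest()


def seal(key: bytes, nonce: bytes, plaintext: bytes, header: bytes = b""):
    """Return (ciphertext, MAC). The header is authenticated but not encrypted."""
    ciphertext = _xor_keystream(_subkey(key, b"enc"), nonce, plaintext)
    return ciphertext, _tag(key, nonce, header, ciphertext)


def open_sealed(key, nonce, ciphertext, tag, header=b"") -> bytes:
    """Check the MAC first; only decrypt if nothing was modified."""
    if not hmac.compare_digest(_tag(key, nonce, header, ciphertext), tag):
        raise ValueError("authentication failed")
    return _xor_keystream(_subkey(key, b"enc"), nonce, ciphertext)


key = secrets.token_bytes(32)
nonce = secrets.token_bytes(NONCE_LEN)
ct, tag = seal(key, nonce, b"a secret message", header=b"to: bob")
print(open_sealed(key, nonce, ct, tag, header=b"to: bob"))  # b'a secret message'
```

Calling `open_sealed` with a different header or a modified ciphertext raises
`ValueError` instead of returning a plaintext.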

---

## ChaCha20-Poly1305

*ChaCha20-Poly1305* is an authenticated encryption scheme that uses the ChaCha20
stream cipher with the Poly1305 MAC.

Some notable uses:

- age (*Source: age / Filippo Valsorda*)
- WireGuard (*Source: WireGuard / Jason Donenfeld*)

---

## ChaCha20-Poly1305

<pre>
<code class="python" data-trim data-line-numbers="1-10|3-4|5-6|7|8-10">
>>> import os
>>> from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
>>> data = b"a secret message"
>>> aad = b"authenticated but unencrypted data"
>>> key = ChaCha20Poly1305.generate_key()
>>> chacha = ChaCha20Poly1305(key)
>>> nonce = os.urandom(12)
>>> ct = chacha.encrypt(nonce, data, aad)
>>> chacha.decrypt(nonce, ct, aad)
b'a secret message'
</code>
</pre>
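
---

## ChaCha20-Poly1305

A follow-up sketch showing what happens when the ciphertext or the associated
data is modified. It uses the same `cryptography` package as the previous slide;
the helper `try_decrypt` and the tampered values are made up for illustration.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

key = ChaCha20Poly1305.generate_key()
chacha = ChaCha20Poly1305(key)
nonce = os.urandom(12)
data = b"a secret message"
aad = b"authenticated but unencrypted data"

# The returned bytes are the ciphertext followed by the 16-byte Poly1305 tag.
ct = chacha.encrypt(nonce, data, aad)


def try_decrypt(ciphertext: bytes, header: bytes) -> str:
    """Return 'ok' if decryption succeeds, 'rejected' if the tag check fails."""
    try:
        chacha.decrypt(nonce, ciphertext, header)
        return "ok"
    except InvalidTag:
        return "rejected"


tampered_ct = bytes([ct[0] ^ 0x01]) + ct[1:]  # flip one bit of the ciphertext

results = {
    "untouched": try_decrypt(ct, aad),
    "ciphertext modified": try_decrypt(tampered_ct, aad),
    "header modified": try_decrypt(ct, b"a different header"),
}
print(results)
```

Only the untouched ciphertext with the original header decrypts; both modified
cases fail the tag check and never yield a plaintext.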

---

## AES-GCM

**_AES-GCM_** (*Galois/Counter Mode*) combines CTR-mode AES with the GHASH hash
function to provide message authentication on top of confidentiality.


---

## AES-GCM-SIV

There is also a variant of AES-GCM, _**AES-GCM-SIV**_ (*SIV* = "Synthetic
Initialization Vector"), which additionally provides nonce misuse resistance.

(I.e., you can reuse a nonce without it blowing up in your face.)


notes:

Nonce misuse resistance: nothing gets revealed if a nonce gets reused for two
different messages. If a nonce is reused for the same message, you can find out
that the same message was encrypted, but no other information is revealed.
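
---

## AES-GCM-SIV

To see why nonce reuse normally "blows up", here is a toy sketch with a
keystream cipher (CTR-mode AES and ChaCha20 both work by XORing a keystream into
the plaintext). SHAKE-256 stands in for the real cipher, and the messages are
made up for illustration.

```python
import hashlib
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, message: bytes) -> bytes:
    """Toy stream cipher: the keystream depends only on (key, nonce)."""
    keystream = hashlib.shake_256(key + nonce).digest(len(message))
    return xor(message, keystream)


key = secrets.token_bytes(32)
nonce = secrets.token_bytes(12)

m1 = b"attack at dawn!!"
m2 = b"retreat at noon!"

# Mistake: the same nonce is used for two different messages.
c1 = encrypt(key, nonce, m1)
c2 = encrypt(key, nonce, m2)

# The keystream cancels out, leaking the XOR of the two plaintexts.
leak = xor(c1, c2)
print(leak == xor(m1, m2))  # True

# An eavesdropper who knows (or guesses) m1 recovers m2 without the key.
print(xor(leak, m1))  # b'retreat at noon!'
```

A nonce-misuse-resistant scheme is designed so that this mistake reveals at most
whether the two messages were identical.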

---

## Summary

*Never* use a cipher like AES or ChaCha by itself! Any messages you send are
malleable and cannot be authenticated.

<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

*Message authentication codes (MACs)* are algorithms that generate a tag that
can be used to determine whether a message has been tampered with. This tag is
usually computed on the ciphertext and appended.

</div>
<div class="fragment fade-in" data-fragment-index=1>

In practice, you should use an *authenticated encryption* / *AEAD* scheme like
AES-GCM or ChaCha20-Poly1305, which provides both confidentiality *and*
integrity.

</div>