# Cryptography ## CS 3710: Intro to Cybersecurity === ## What is cryptography? --- ## Cryptography <div class="container"> <div class="col"> _**Cryptography**_ is the study of ways in which to protect the *confidentiality* and *integrity* of data. </div> <div class="col"> <div class="text-center image-background"> <img alt="The CIA triad, with confidentiality and integrity circled and the word 'encryption' pointing to both" style="max-height: 50vh;" src="../../img/encryption/cia_encryption.drawio.webp"> </div> </div> === ## Hashing --- ## Terminology _**Hash function:**_ a hash function is a function that takes in some arbitrary, variable-length data and returns a fixed-length string of bytes. <div class="fragment"> _**Hash collision:**_ when two different inputs $x_1$ and $x_2$ generate the same hash, i.e. $H(x_1) = H(x_2)$. </div> --- ## Terminology _**One-way function:**_ a _**one-way**_ (or _**trapdoor**_) _**function**_ is a function that is very difficult to reverse: - Given $x$, it's relatively easy to compute $f(x)$, but - Given $f(x)$, it's very difficult to compute $x$ --- ## Motivation \#1: password storage Storing users' passwords directly in a database can lead to a massive data breach if the database is compromised -- the attacker will have all of the users' passwords! --- ## Motivation \#2: file integrity Suppose that you've just downloaded `super_safe_program.exe` off the internet. How do you know that you got the correct file? --- ## Cryptographic hash functions A _**cryptographic hash function**_ is a one-way, collision-resistant hash function. <div class="image-background"> <img style="max-height: 12em;" src="../../img/encryption/Cryptographic_Hash_Function.svg"> </div> --- ## Cryptographic hash functions Properties: - _**One-way:**_ given a hash $h$ it should be difficult to find $x$ for which $H(x) = h$. - _**Collision-resistant:**_ it should be difficult to find $x_1$ and $x_2$ such that $H(x_1) = H(x_2)$. --- ## Motivation \#1: password storage <div class="fragment semi-fade-out" data-fragment-index="0"> Storing users' passwords directly in a database can lead to a massive data breach if the database is compromised -- the attacker will have all of the users' passwords! </div> <div class="fragment fade-in" data-fragment-index="0"> **Solution:** store the hash of the password instead! Since the hash function acts as a one-way function, it's very difficult to reverse to figure out what the original password was. </div> --- ## Motivation \#2: file integrity <div class="fragment semi-fade-out" data-fragment-index="0"> Suppose that you've just downloaded `super_safe_program.exe` off the internet. How do you know that you got the correct file? </div> <div class="fragment fade-in" data-fragment-index="0"> **Solution:** the publisher of the program publishes the hash of `super_safe_program.exe`. You check whether the hash of the file you downloaded matches the one provided by the publisher. </div> --- ## Hashing passwords for storage --- ## Rainbow tables _**Idea:**_ we calculate a large database of every password and its corresponding hash. <table> <thead> <tr> <td>Password</td> <td>Hash</td> </tr> </thead> <tbody> <tr> <td>123456</td> <td>e10adc3949ba59abbe56e057f20f883e</td> </tr> <tr> <td>password123</td> <td>482c811da5d5b4bc6d497ffa98491e38</td> </tr> <tr> <td>hunter2</td> <td>2ab96390c7dbe3439de74d0c9b0b1767</td> </tr> <tr> <td>trustno1</td> <td>5fcfd41e547a12215b173ff47fdd3739</td> </tr> <tr> <td>...</td> <td>...</td> </tr> </tbody> </table> notes: Hashes computed with MD5 in this table. --- ## Rainbow tables Now, whenever we get a new dump of breached passwords, we just check the dump against the hashes we've already calculated! --- ## Password salting <div class="code-inline-bg"> <div class="fragment semi-fade-out" data-fragment-index=0> To make it difficult for an attacker to use a rainbow table against you, you should also _**salt**_ your passwords. </div> <div class="fragment fade-in" data-fragment-index=0> A _**salt**_ is a random string of bytes that you append to a password. Instead of computing the password hash as `hash(password)`, you compute it as `hash(password + salt)`. You then store the salt in the database next to the password: </div> </div> --- ## Password salting <table> <thead> <tr> <td>Salt</td> <td>Hash</td> </tr> </thead> <tbody> <tr> <td>aed8b2efe9e88f43</td> <td>449bf67180c10de927c13bfca7417303</td> </tr> <tr> <td>b1ad70a09de9d8b0</td> <td>9c13a4f304163c024de5f32ca76d4a50</td> </tr> <tr> <td>95e58696b811f239</td> <td>449bf67180c10de927c13bfca7417303</td> </tr> <tr> <td>...</td> <td>...</td> </tr> </tbody> </table> --- ## Why salt? Password salting makes it extremely difficult to build a rainbow table -- instead of computing one hash per possible password, an attacker has to store $2^{\text{salt bit length}}$ hashes per password. - Using a 16-bit salt, an attacker has to compute $65,536$ hashes for every possible password. --- ## Example: the shadow file Here are some example entries in the `/etc/shadow` file: ``` alice:$1$jWxKcYgS$JABulTl8l1hxztU02MKDm.:19275:0:99999:7::: ``` <div class="r-stack"> <div class="fragment fade-in-then-out" data-fragment-index=0> <figure> <img src="../../img/encryption/shadow_example_algo.png"> <figcaption> </figcaption> </figure> </div> <div class="fragment fade-in-then-out" data-fragment-index=1> <figure> <img src="../../img/encryption/shadow_example_salt.png"> <figcaption> </figcaption> </figure> </div> <div class="fragment fade-in-then-out" data-fragment-index=2> <figure> <img src="../../img/encryption/shadow_example_hash.png"> <figcaption> </figcaption> </figure> </div> </div> --- ## MD5 <div class="container"> <div class="col"> _**Short summary:**_ bad, do not use MD5 is an older hashing algorithm that is still used today in some applications (especially in legacy code), but it is fairly broken and insecure for most purposes. Nowadays it's possible to compute an MD5 collision in less than one second. </div> <div class="col"> <div class="image-background text-center"> <img alt="Logo of US Cyber Command" src="../../img/encryption/US_CyberComm.webp"> </div> </div> notes: The US Cyber Command logo has the MD5 hash of its mission statement in the inner ring of its official emblem. --- ## SHA-2, SHA-3 <div class="container"> <div class="col"> _**Short summary:**_ more secure than MD5, but still not great for password storage. The SHA-2 and SHA-3 families are secure (it's difficult to find collisions) but they are fairly cheap to compute, which doesn't make them good for password hashing. </div> <div class="col"> <div class="image-background"> <img src="../../img/encryption/NIST_approved.webp"> </div> </div> </div> notes: SHA-2 and SHA-3 are approved as federal cryptographic standards in the US by NIST (the National Institute for Standards and Technology). --- ## SHA-2 and SHA-3 If you *are* going to use SHA-2 or SHA-3, the "correct" way is to do many iterations of the hash function. <div class="container"> <div class="fragment col text-center" style="padding: 0px;"> ```python from hashlib import sha256 hash = sha256(password + salt).digest() ``` bad </div> <div class="fragment col text-center" style="padding: 0px;"> ```python from hashlib import sha256 hash = sha256(password + salt).digest() for _ in range(1_000_000): hash = sha256(hash).digest() ``` better </div> </div> <div class="fragment"> _**PBKDF2**_ is a hashing algorithm that applies a hash function like SHA2-256 or SHA2-512 many times to generate a password hash. </div> --- ## scrypt <div class="container"> <div class="col"> _**Short summary:**_ a much better password hashing algorithm. Originally created as a _**key derivation algorithm**_ for the Tarsnap backup service, it is a strong and widely-implemented method for password hashing today. </div> <div class="col text-center" style="display: flex;"> <div style="margin: auto;"> <div class="image-background"> <img src="../../img/encryption/tarsnap.png"> </div> <div class="text-small"> *Source: Tarsnap / Colin Percival* </div> </div> </div> </div> notes: - The block size parameter is tuned according to memory performance; 8 is a normal default for this parameter. - Parallelism parameter affects how many CPU cores should be used to compute scrypt. --- ## scrypt scrypt takes multiple parameters that impact how difficult it is to compute: - A cost factor - A blocksize parameter - A parallelization factor --- ## Some password hashing math Let's say you have a hash of a random, 20-character password using a mixture of numbers and lowercase letters. How long would it take to crack the password? --- ## Some password hashing math Suppose you had a million computers trying different passwords around-the-clock to figure out the password. <div class="fragment semi-fade-out" data-fragment-index=0> - There are $36^{10} \approx 3.66 \times 10^{15}$ possible passwords </div> <div class="fragment fade-in-then-semi-out" data-fragment-index=0> - On average, you will have to guess half that amount ($1.83 \times 10^{15}$) before you get the right password. </div> <div class="fragment fade-in-then-semi-out" data-fragment-index=1> - If the hash takes 1ms to compute it will take you $\approx 42$ days to crack the password </div> <div class="fragment fade-in-then-semi-out" data-fragment-index=2> - If the hash takes 1s to compute, it will take $\approx 115$ years to crack </div> --- ## File integrity Another common use for hash functions is to check that a file was downloaded correctly. <figure> <img src="../../img/encryption/debian_iso_verification.webp"> <figcaption> *Source: Debian* </figcaption> </figure> === ## Symmetric encryption --- ## Some terminology _**Plaintext** vs **ciphertext**:_ a _**plaintext**_ is a message that is unencrypted; a _**ciphertext**_ is an encrypted message. <div class="fragment"> _**Stream cipher** vs **block cipher**:_ a _**stream cipher**_ is a symmetric cipher that encrypts one byte of data at a time; a _**block cipher**_ encrypts the data in blocks (e.g. blocks of 16 bytes). </div> --- ## What is symmetric encryption? <figure> <img src="../../img/encryption/symmetric_encryption_1.png"style="max-height: 30vh;"> <figcaption> </figcaption> </figure> --- ## What is symmetric encryption? <div class="fragment"> If Alice and Bob can share a *small* secret over a secure channel, they can use _**symmetric encryption**_ to secure a large amount of data. </div> <figure> <img src="../../img/encryption/symmetric_encryption_2.webp"style="max-height: 30vh;"> <figcaption> </figcaption> </figure> --- ## Why is cryptography difficult? Case study: *the Caesar cipher* The Caesar cipher is an old and simple *substitution cipher*. To encrypt a message, you replace each letter with a different letter a fixed number of places down the alphabet. <figure> <img src="../../img/encryption/caesar_cipher.svg"class="image-background"style="max-height: 40vh;"> <figcaption> </figcaption> </figure> notes: In the given image, each letter is shifted down three places. --- ## Why is cryptography difficult? The Caesar cipher is bad and Julius Caesar should feel bad - It's incredibly easy to break even by hand -- you just need to try each of the 26 possible shifts. - What if we used a randomized substitution cipher? Then you would need to try $ 26! = 26 \cdot 25 \cdot \ldots \cdot 2 \cdot 1 $ different shifts, right? $$ \text{\`hello, world'} \mapsto \text{\`fdoog, jgpov'} $$ <div class="image-background"> <img src="../../img/encryption/random_substitution_cipher.drawio.svg"> </div> --- ## Why is cryptography difficult? That doesn't work either. Different letters occur with different frequencies in English - `e`: appears 13% of the time - `q`: appears 0.1% of the time With a large enough ciphertext, you can figure out what the letters were in the original plaintext by observing the frequencies of characters in ciphertexts. --- ## Why is cryptography difficult? So what? - Julius Caesar was a bad cryptographer, but also - **Cryptography is difficult.** History is littered with cryptographic schemes that have been broken in all kinds of ways. In general, *don't roll your own cryptography*, or at least make sure to consult your local cryptographer first. --- ## One-time pads A *one-time pad* is an encryption scheme that can't be cracked (although it isn't very convenient either). - Alice and Bob meet up and share a codebook with an identical sequence of random bytes, which acts as their encryption key. --- ## One-time pads When Alice wants to encrypt a message $m$ for Bob, she reads off the first $|m|$ (= length of $m$) bytes from the codebook, which we will call $K$. She then encrypts the message by XOR'ing it with $K$ ($\oplus = $ XOR) <div class="fragment fade-in"> $$ Encrypt(m, K) = K \oplus m $$ </div> <div class="text-center"> <table> <thead> <tr> <th>$a$</th> <th>$b$</th> <th>$a \oplus b$</th> </tr> </thead> <tbody> <tr> <td>0</td> <td>0</td> <td>0</td> </tr> <tr> <td>0</td> <td>1</td> <td>1</td> </tr> <tr> <td>1</td> <td>0</td> <td>1</td> </tr> <tr> <td>1</td> <td>1</td> <td>0</td> </tr> </tbody> </table> </div> --- ## One-time pads When Bob goes to decrypt the ciphertext $C$, he reads the same bytes off of the one-time pad and XORs them with the ciphertext: $$ Decrypt(C, K) = C \oplus K = (m \oplus K) \oplus K = m $$ --- ## Problems with one-time pads <div class="container"> <div class="col"> - You can only use a one-time pad once. - The pad has to be at least as long as the message you're encrypting! - This makes one-time pads useless in most situations. - *Exception:* one-time pad usage in espionage. </div> <div class="col"> <div class="text-center"> <img src="../../img/encryption/nsa_one_time_pad.webp" style="max-height: 50vh;"> </div> </div> --- ## Problems with one-time pads What happens if you re-use a one-time pad? If messages $m_1$ and $m_2$ are encrypted to ciphertexts $C_1$ and $C_2$ using the same OTP $K$, then $$ C_1 \oplus C_2 = (m_1 \oplus K) \oplus (m_2 \oplus K) = m_1 \oplus m_2 $$ i.e., an attacker who intercepts $C_1$ and $C_2$ can learn $m_1 \oplus m_2$. --- ## One-time pad reuse <table class="table-noborder"> <tbody> <tr> <td> Use an OTP: </td> <td class="vertical-center"> <img alt="Initial image" src="../../img/encryption/otp-1.png"> $\oplus$ <img alt="Random one-time pad" src="../../img/encryption/otp.png"> $=$ <img alt="XOR is a random-looking image" style="border: 2px solid blue;" src="../../img/encryption/otp-1e.png"> </td> </tr> <tr class="fragment visible" data-fragment-index="0"> <td>Re-use the<br>same OTP:</td> <td class="vertical-center"> <img alt="Second image" src="../../img/encryption/otp-2.png"> $\oplus$ <img alt="Same one-time pad that was used previously" src="../../img/encryption/otp.png"> $=$ <img alt="XOR is another random-looking image" style="border: 2px solid orange;" src="../../img/encryption/otp-2e.png"> </td> </tr> <tr class="fragment visible current-fragment" data-fragment-index="1"> <td> Extract<br>the images: </td> <td class="vertical-center"> <img alt="The encryption of the first image" style="border: 2px solid blue;" src="../../img/encryption/otp-1e.png"> $\oplus$ <img alt="The encryption of the second image" style="border: 2px solid orange;" src="../../img/encryption/otp-2e.png"> $=$ <img alt="The XOR of the encrypted images. You can clearly see both of the images combined in the final result." src="../../img/encryption/otp-ans.png"> </td> </tr> </tbody> </table> <div class="text-small text-center"> *Source: [Aaron Bloomfield](https://aaronbloomfield.github.io/ics/)* </div> --- ## Pseudo-random generators How can we build on the idea of the one-time pad? - What if we could randomly generate a one-time pad using a short key? - What if we could re-use the same key many times? To achieve these goals, we will use a _**pseudo-random generator (PRG)**_. notes: Ch. 3.1 of Boneh and Shoup's "A Graduate Course in Applied Cryptography" serves as a good intro to PRGs. --- ## PRGs You can think of a PRG as a *cryptographically secure* random number generator. A PRG is just a fancy way of turning a few random bits into a big stream of random-looking bits. <div class="text-center image-background"> <img src="../../img/encryption/keystream.webp"> </div> --- ## Using PRGs to build ciphers <div class="fragment semi-fade-out" data-fragment-index=0> Using a PRG, we can build a stream cipher that acts like a one-time pad: <div class="text-center"> $$ Encrypt(m) = G(k) \oplus m $$ </div> The PRG $G(\cdot)$ generates a one-time pad which we can then XOR with the message. </div> <div class="fragment" data-fragment-index=0> Unlike a one-time pad, however, _**Alice and Bob only need to share a very short secret value (128-256 bits) to encrypt a massive amount of data!**_ E.g., ChaCha20 has a 256-bit key and can produce (in theory) $\approx 10^{40}$ random bytes before its internal PRG returns to the beginning. </div> --- ## Modern ciphers: ChaCha20 <div class="container"> <div class="col"> _**ChaCha20**_ is a stream cipher developed in 2008, modified from the Salsa20 stream cipher. It uses a 256-bit key, and is based on repeated *add-rotate-XOR (ARX)* operations. </div> <div class="col text-center"> <div class="image-background"> <img src="../../img/encryption/ChaCha Cipher Quarter Round Function.svg" style="max-height: 40vh;"> </div> <div class="text-small"> *ChaCha's quarter-round function* </div> </div> </div> --- ## Modern ciphers: AES <div class="container"> <div class="col"> The _**Advanced Encryption Standard (AES)**_ is a popular block cipher that has been the standard for symmetric encryption for decades. It can be run in various *"modes of operation"* that change the way you encrypt data. </div> <div class="col text-center"> <div class="image-background"> <img src="../../img/encryption/AES_Round_Function.png" alt="Visualization of the AES round function." style="max-height: 30vh; padding: 0px;"> </div> <div class="text-small"> *The AES round function* </div> </div> </div> --- ## Modern ciphers: AES When you run AES in *counter mode* (AES-CTR) or *output feedback mode* (AES-OFB), it acts like a PRG\*! <div class="text-small"> \*Okay, technically AES is what's known as a *pseudo-random permutation (PRP)*, but for our purposes they're effectively the same thing. </div> <div class="image-background"> <img src="../../img/encryption/aes_ctr_encryption.svg" style="padding: 0px;"> </div> notes: [Modes of operation on Wikipedia](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation) --- ## Cryptographic nonces <div class="r-stack"> <div class="fragment fade-out" data-fragment-index="1"> AES-CTR and ChaCha20 (and most modern symmetric ciphers) take both a *key* and a _**nonce**_; the nonce functions like a message ID. </div> <div class="fragment fade-in" data-fragment-index="1"> Each nonce generates a different keystream and should be used to encrypt no more than one message. *Reusing a key-nonce pair twice is equivalent to reusing a one-time pad!* </div> </div> <div class="text-center image-background"> <img src="../../img/encryption/keystream_with_nonce.webp" style="max-height: 30vh;"> </div> --- ## Attacks on nonce reuse <div class="container"> <div class="col"> One real-world case where nonce reuse was exploited was to extract the encryption keys required to run software on the Sony Playstation 3. This made it possible to run arbitrary software on the PS3. </div> <div class="col text-center"> <img src="../../img/encryption/c3_sony.webp" style="max-height: 50vh;"> <div class="text-small"> *Source: Fail0verflow / 27th Chaos Communication Congress* </div> </div> </div> notes: 27 C3 talk: https://archive.org/details/console-hacking-2010 --- ## Summary <div class="fragment semi-fade-out" data-fragment-index="1"> - We saw how _**one-time pads**_ could be used for encryption, and some of their disadvantages. - OTPs are too long, and can't be reused </div> <div class="fragment fade-in-then-semi-out" data-fragment-index="1"> - We started using _**pseudo-random generators (PRGs)**_ for encryption, by taking a short key and generating a long stream of random bytes that can be used as an OTP. </div> <div class="fragment" data-fragment-index="2"> - Finally, we discussed _**ChaCha20**_ and _**AES-CTR**_ as encryption schemes that use a PRG under the hood. - These schemes take a key and a _**nonce**_, and generate a unique OTP for each key-nonce pair. - You can't reuse the same (key, nonce) pair twice! </div>