CS 3710: Intro to Cybersecurity (slides): Memory Safety

# Memory Safety
## CS 3710

</div>
  <div class="col">

*Source: [Fish in a Barrel](https://fishinabarrel.github.io/)*

</figcaption>
</figure>

</div>
</div>

notes:

Additional references:
- Recent NSA guidance on memory safety (2022-11-10): https://media.defense.gov/2022/Nov/10/2003112742/-1/-1/0/CSI_SOFTWARE_MEMORY_SAFETY.PDF
- Shellphish How2Heap: https://github.com/shellphish/how2heap
- "Nightmare" binex tutorial: https://guyinatuxedo.github.io/index.html

===

## Final and course logistics

---

## Final -- logistics

The class final will be on _**Thursday, December 15th from 9AM - 12PM**_.

You can take it in-person (*location TBD*) or remotely, but keep in mind that
you _**will**_ need a stable internet connection for the exam.

---

## Final -- structure

The final will consist of four questions; you will select two of them to
complete in the three-hour period (7 points each).

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

Each question will have an interactive component and a written component.

</div>
<div class="fragment fade-in" data-fragment-index=1>

The final is *non-comprehensive*; it will only *directly* cover topics from the
second (defense-focused) half of the course.

That said, you should review topics from the first half of the course, as you
may find them helpful to understand (especially for the written component).

</div>

---

## Final -- topics

Broadly speaking, the final will cover the following topics:

- _**Cryptography:**_ algorithms, AEADs, best practices

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

- _**Networking / firewalls:**_ constructing firewalls for various protocols
  running over TCP / UDP and analyzing network traffic.

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=1>

- _**Access control:**_ Linux permissions, DAC/MAC, and sandboxing

</div>
<div class="fragment fade-in" data-fragment-index=2>

- _**Detection:**_ incident response / breach recovery, YARA

</div>

---

## Course evaluations

Course evaluations open tomorrow (Tuesday 11/29).

_**Please**_ take a few minutes to fill them out! 🙂

</div>

===

## Common memory vulnerabilities

---

## The stack and the heap

The _**stack**_ stores local variables whose sizes are known at compile time
(often loosely called *statically allocated*).

Whenever you call a function, it pushes a new "stack frame" reserving memory for
variables used by the function, as well as the return address.

</div>
  <div class="col">

</figcaption>
</figure>

</div>
</div>

---

## The stack and the heap

The _**heap**_ is used to store variables that are *dynamically allocated*.

*Source: glibc documentation (["malloc
internals"](https://sourceware.org/glibc/wiki/MallocInternals))*

</figcaption>
</figure>

---

## Memory bugs

Memory bugs typically show up in languages that encourage manual memory
management and explicitly working with memory addresses.

</div>
<div class="fragment fade-in" data-fragment-index=0>

While these bugs can (in theory) show up in almost any program, they most
frequently occur in C / C++ programs.

</div>

---

## Memory bugs: `strcpy` example

_**Example:**_ this is an implementation of C's `strcpy` function, which copies
the contents of one string (`char*`) into another.

</div>

```c
/* WARNING: this is *very* dangerous! You should use the strncpy function (or
 * similar) instead. */

char* strcpy(char* dest, const char* src) {
  char* start = dest;
  /* The fancy C programmer's way of implementing strcpy. This loop just copies
   * bytes from src into dest until we hit a null (zero) byte. */
  for (; (*dest) = (*src); dest++, src++);
  return start;  /* strcpy returns the original dest pointer */
}
```

C code typically represents strings using the `char*` type. The end of the
string occurs when we encounter a null byte (i.e. `0x00`).

</div>
  <div class="fragment fade-in-then-out" data-fragment-index=1>

`char*` is actually a *pointer* to a value of type `char`. It contains a
location in the program's address space.

</div>
</div>
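To make the null-terminator convention concrete, here is a minimal sketch of
walking a `char*` until the terminating zero byte. The name `count_bytes` is
hypothetical; the standard library's equivalent is `strlen`.

```c
#include <stddef.h>

/* Hypothetical helper: counts the bytes that come before the terminating
 * null byte, the same way the standard strlen function does. */
size_t count_bytes(const char* str) {
  size_t length = 0;
  while (str[length] != '\0') {
    length++;
  }
  return length;
}
```

Note that nothing in the pointer itself records how long the string is; the
only way to find the end is to keep reading until a null byte shows up.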

---

## Memory bugs: `strcpy` example

What happens if the `dest` buffer is smaller than the `src` buffer?

</div>
<div class="fragment fade-in" data-fragment-index=0>

_**Answer:**_ C will happily run straight off the end of the `dest` buffer and
keep writing to adjacent locations in memory!

</div>

</figcaption>
</figure>

</div>
  <div class="fragment fade-in" data-fragment-index=1>

</figcaption>
</figure>

</div>
</div>

---

## Buffer overflow

A _**buffer overflow**_ occurs when a program writes past the end of a buffer
and into adjacent memory (as in the `strcpy` example).

</div>
  <div class="col">

*Source: [Phrack](http://phrack.org/issues/49/14.html)*

</figcaption>
</figure>

</div>
</div>

notes:

http://phrack.org/issues/49/14.html

Buffer overflows:
- Wikipedia: https://en.wikipedia.org/wiki/Buffer_overflow

---

## Buffer overread

A _**buffer overread**_ happens when we *read* past the end of a buffer.

This bug can leak the contents of variables in neighboring memory.

</div>
  <div class="col">

*Source: [Phrack](http://phrack.org/issues/49/14.html)*

</figcaption>
</figure>

</div>
</div>
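One way to rule out an overread is to validate every offset and length against
the buffer's size before reading. The following is a minimal sketch; the helper
name `checked_read` and its return convention are assumptions made for
illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `count` bytes starting at `offset` out of a
 * buffer that holds `buffer_len` bytes. Returns 0 on success, or -1 if the
 * request would read past the end of the buffer. */
int checked_read(const unsigned char* buffer, size_t buffer_len,
                 size_t offset, size_t count, unsigned char* out) {
  /* Written so that the comparison itself cannot overflow. */
  if (offset > buffer_len || count > buffer_len - offset) {
    return -1;
  }
  memcpy(out, buffer + offset, count);
  return 0;
}
```

The important design choice is that the length comes from the buffer's real
size, never from a value that the other side of the connection supplied.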

---

## Heap-based bugs

When a C program needs to dynamically allocate a non-fixed amount of memory, it
typically uses the `malloc` function.

```c
void* buffer;

if (NULL == (buffer = malloc(length)))
  perror("Error: failed to allocate buffer");
```

</div>

When the buffer is no longer needed, it is `free`'d. Otherwise the program has
a *memory leak* and may eventually run out of memory:

```c
free(buffer);
```

</div>
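Putting the two halves together, here is a minimal sketch of a function that
hands a heap allocation to its caller. The name `duplicate_bytes` is
hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, null-terminated copy of the
 * first `length` bytes of `src`, or NULL if the allocation fails. The caller
 * owns the result and must pass it to free() exactly once. */
char* duplicate_bytes(const char* src, size_t length) {
  char* copy;

  if (length == SIZE_MAX) {
    return NULL;  /* length + 1 would wrap around to zero */
  }
  copy = malloc(length + 1);
  if (NULL == copy) {
    return NULL;
  }
  memcpy(copy, src, length);
  copy[length] = '\0';
  return copy;
}
```

A caller would use the result and then release it with `free(copy)` once it is
done. Forgetting that call is the memory leak described above.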

---

## Use-after-free

A _**use-after-free**_ bug occurs when we try to dereference a pointer to a
dynamically-allocated memory block after `free`'ing the pointer.

</figcaption>
</figure>
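A common defensive habit is to clear a pointer as soon as its memory is
released, so that a stale use can be detected instead of silently touching
freed memory. The sketch below assumes a hypothetical `struct message` wrapper;
it only helps when every access goes through the wrapper, since any other copy
of the raw pointer is still dangling.

```c
#include <stdlib.h>

/* Hypothetical wrapper that tracks whether its allocation is still live. */
struct message {
  char* text;     /* NULL once the message has been released */
  size_t length;
};

/* Allocates a zero-filled buffer. Returns 0 on success, -1 on failure. */
int message_init(struct message* msg, size_t length) {
  msg->text = calloc(length + 1, 1);
  msg->length = (NULL == msg->text) ? 0 : length;
  return (NULL == msg->text) ? -1 : 0;
}

void message_release(struct message* msg) {
  free(msg->text);
  msg->text = NULL;  /* a stale use now sees NULL instead of freed memory */
  msg->length = 0;
}

/* Returns the first byte, or -1 if the message is empty or already released. */
int message_first_byte(const struct message* msg) {
  if (NULL == msg->text || 0 == msg->length) {
    return -1;
  }
  return (unsigned char)msg->text[0];
}
```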

---

## Double-free

A _**double free**_ bug occurs when the same chunk is `free`'d twice (or more).

</figcaption>
</figure>
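Because `free(NULL)` is defined to do nothing, one simple guard against double
frees is to reset the pointer in the same step that releases it. This is a
minimal sketch with a hypothetical helper name, `release_buffer`.

```c
#include <stdlib.h>

/* Hypothetical helper: frees *slot and resets it to NULL. Calling this twice
 * on the same slot is harmless, because the second call frees NULL. */
void release_buffer(char** slot) {
  free(*slot);
  *slot = NULL;
}
```

As with any pointer-clearing trick, this protects only the one variable that
is passed in; other copies of the same address can still be freed twice.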

notes:

OWASP: https://owasp.org/www-community/vulnerabilities/Doubly_freeing_memory

===

## Exploiting memory vulnerabilities

---

## Stack smashing

Whenever a function gets called in C / C++, the program pushes a new *stack
frame* onto the stack.

The stack frame contains a *return address* specifying where the program should
jump to after it returns.

</div>
  <div class="col">

</figcaption>
</figure>

</div>
</div>

notes:

The standard text on this type of exploit: http://phrack.org/issues/49/14.html

---

## Stack smashing

The traditional exploit overwrites the return address to jump to some custom
assembly code (*shellcode*) that you've crafted.

</div>
  <div class="col">

</figcaption>
</figure>

</div>
</div>

---

## Stack smashing

When the function returns, instead of jumping back to the caller, it jumps to
the attacker's assembly code and executes it.

</figcaption>
</figure>

---

## Stack smashing - Demo (?)

```c
#include <stdio.h>

void read_input(void)
{
    char buffer[512] = {0};
    char* ptr = buffer;

    for ( int c = 0; EOF != (c = getchar()); *(ptr++) = c );
    printf("Received input: %s\n", buffer);
}

int main(void)
{
    read_input();
    return 0;
}
```

```bash
$ clang -Wall -Werror -Wextra -O0 -g -z execstack -fno-stack-protector \
    stacksmash.c -o stacksmash.bin

$ sudo sysctl -w kernel.randomize_va_space=0
```

---

## Overwriting the return address

There are a few problems that make this kind of exploit much more difficult
nowadays:

- Memory on the stack is typically marked as non-executable (meaning you can't
  execute shellcode that's written to the stack)

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

- ASLR (*address space layout randomization*) makes it difficult to predict what
  memory address you need to jump to.

</div>
<div class="fragment fade-in" data-fragment-index=1>

- Compilers insert *stack canaries*: known values placed between a function's
  local buffers and its return address. The program aborts if a canary has been
  overwritten by the time the function returns.

</div>
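The canary idea can be imitated by hand. The sketch below is a simulation, not
what the compiler actually emits: `struct fake_frame` is a hypothetical
stand-in for a stack frame, and the canary is a fixed constant, whereas real
canaries are chosen randomly when the program starts.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_VALUE 0xC0FFEE42UL  /* arbitrary; real canaries are random */

/* Hypothetical stand-in for a stack frame: a buffer followed by a canary. */
struct fake_frame {
  char buffer[16];
  unsigned long canary;
};

/* Copies `count` bytes into the start of the frame (clamped to the frame's
 * total size so that the sketch itself stays in bounds), then reports whether
 * the canary survived: 1 if it is intact, 0 if it was overwritten. */
int copy_and_check(struct fake_frame* frame, const char* input, size_t count) {
  frame->canary = CANARY_VALUE;
  if (count > sizeof(*frame)) {
    count = sizeof(*frame);
  }
  memcpy(frame, input, count);  /* no check against sizeof(frame->buffer)! */
  return frame->canary == CANARY_VALUE;
}
```

Any copy that runs past the 16-byte buffer has to trample the canary before it
can reach whatever lies beyond it, which is what makes the overwrite
detectable.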

---

## Just how common are memory vulnerabilities?

_**Myth:**_ memory safety bugs are a ploy by Big Cyber to sell more cyber. The
only engineers who create memory bugs are bad C programmers.

</div>
<div class="fragment fade-in" data-fragment-index=0>

_**Fact:**_ memory bugs are prevalent even in the most well-funded and
heavily-audited C/C++ projects.

</div>

---

## Just how common are memory vulnerabilities?

*Source: [Google](https://chromereleases.googleblog.com/2022/11/stable-channel-update-for-desktop_24.html?m=1)*

</figcaption>
</figure>

notes:

FishInABarrel: https://fishinabarrel.github.io/
FishInABarrel (Twitter): https://twitter.com/lazyfishbarrel

---

## Just how common are memory vulnerabilities?

*Source: [Google](https://github.com/google/security-research/security/advisories/GHSA-pf87-6c9q-jvm4)*

</figcaption>
</figure>

===

## Protecting against memory vulnerabilities

---

## Using memory-safe languages

When possible, the easiest way to protect against memory vulnerabilities is to
use a memory-safe language, e.g. Python, Go, or Rust.

</div>

</figcaption>
</figure>

Each of these languages has protections in place to keep you from managing
allocation/deallocation manually, accessing invalid memory locations, etc.

</div>

---

## Compiler protections

C/C++ compilers provide various options to help protect against memory
vulnerabilities.

- On many Linux distributions, gcc / clang are configured to inject stack
  canaries by default to help detect buffer overflows. Elsewhere, you can
  request them with `-fstack-protector-strong`.

</div>
<div class="fragment" data-fragment-index=0>

- `-D_FORTIFY_SOURCE=2`: provides additional protections to check for buffer
  overflows at runtime.

</div>

notes:

- `-D_FORTIFY_SOURCE` (blogpost from RedHat): https://www.redhat.com/en/blog/enhance-application-security-fortifysource
  - See also: [`man 7 feature_test_macros`](https://man7.org/linux/man-pages/man7/feature_test_macros.7.html)

---

## Best practices

You should _**always**_ bound the maximum amount of memory that you read from or
write to a buffer.

</div>
<div class="fragment fade-in" data-fragment-index=0>

_**Example:**_ `strcpy` vs `strncpy`. Both of these functions copy the bytes
from a `src` pointer to the buffer that `dest` points to.

```c
/* From `man 3 strcpy`: */
#include <string.h>

char *strcpy(char *dest, const char *src);

char *strncpy(char *dest, const char *src, size_t n);
```

`strcpy` (insecure): overwrite `dest` with the bytes in `src` until you hit a
null byte

</div>
  <div class="fragment fade-in" data-fragment-index=2>

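`strncpy` (more secure): overwrite `dest` with *at most* `n` bytes from `src`,
stopping early if you hit a null byte. Beware: if `src` holds `n` or more bytes
before its null byte, `dest` is *not* null-terminated.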

</div>
</div>
</div>
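Because `strncpy` does not always terminate the destination, it is usually
wrapped in a helper that does. The following is a minimal sketch; the name
`bounded_copy` and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `src` into `dest` (which holds `dest_size`
 * bytes), truncating if necessary and always null-terminating. Returns 0 if
 * the whole string fit, or -1 if it was truncated or dest_size is 0. */
int bounded_copy(char* dest, size_t dest_size, const char* src) {
  if (0 == dest_size) {
    return -1;
  }
  strncpy(dest, src, dest_size - 1);
  dest[dest_size - 1] = '\0';  /* strncpy does not guarantee this */
  return (strlen(src) < dest_size) ? 0 : -1;
}
```

Calling it as `bounded_copy(buf, sizeof(buf), input)` ties the bound to the
real size of the destination, and the return value lets the caller decide
whether a truncated string is acceptable.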

---

## Additional protection mechanisms

... and of course, all of the other defense mechanisms we've discussed up to
this point in the semester still apply. 🙂

</div>
<div class="fragment fade-in" data-fragment-index=0>

In particular, the sandboxing mechanisms we've discussed, such as

- Linux security modules
- seccomp
- namespaces

and so on come in handy here.

</div>

===

## Fuzzing

---

## What's a fuzzer, anyways?

We've seen *web fuzzers* in the first half of the semester -- `ffuf` is a
fuzzer, and you had to write `xfuzz` for PA1.

</div>
  <div class="fragment fade-in" data-fragment-index=0>

More generally, a _**fuzzer**_ is any program that feeds many different inputs
to another program, in the hopes of finding an "interesting" result.

</div>
</div>

*Source: [ffuf](https://github.com/ffuf/ffuf)*

</figcaption>
</figure>

---

## What's a fuzzer?

*"What qualifies as interesting?"*

That depends on the domain and what your goals are!

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

For a web fuzzer, it might be a page that returns a particular status code, a
URL parameter that generates a different output, etc.

</div>
<div class="fragment fade-in" data-fragment-index=1>

When we talk about *memory vulnerabilities*, we typically look for program
crashes, or particular memory accesses / patterns (e.g. using taint flow
analysis).

</div>

---

## How to fuzz

Many different types of fuzzing strategies exist.

**Brute force:** simply generate as many different inputs as possible

</div>
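As a toy illustration of the brute-force strategy, the sketch below tries
every possible 2-byte input in order. The `target` function is a hypothetical
stand-in for the program under test; a real fuzzer would launch a program and
watch for crashes instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical target: reports an "interesting" result (returns 1) when the
 * input starts with the two bytes 0xCA 0xFE. */
int target(const uint8_t* data, size_t length) {
  return length >= 2 && data[0] == 0xCA && data[1] == 0xFE;
}

/* Tries every possible 2-byte input in order. Returns the index of the first
 * interesting input, or -1 if there is none. */
long brute_force_fuzz(int (*run)(const uint8_t*, size_t)) {
  for (long candidate = 0; candidate <= 0xFFFF; candidate++) {
    uint8_t input[2] = {
      (uint8_t)(candidate >> 8),
      (uint8_t)(candidate & 0xFF),
    };
    if (run(input, sizeof(input))) {
      return candidate;
    }
  }
  return -1;
}
```

Two bytes means only 65,536 candidates, but every extra input byte multiplies
that by 256, which is why pure brute force stops scaling very quickly.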
  <div class="fragment fade-in-then-out" data-fragment-index=0>

**Coverage-guided:** we try to generate inputs that cause the program to explore
as many different execution paths as possible.

</figcaption>
</figure>

</div>
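The coverage-guided idea can be sketched in a few lines. This is a heavily
simplified illustration, not how any particular fuzzer is implemented: the
hypothetical `target_coverage` function reports how deep into its nested
branches an input got, and the fuzzer keeps any single-byte mutation that
reaches deeper than before. Real tools track coverage through compiler
instrumentation and use far richer mutations.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INPUT_LEN 4

/* Hypothetical instrumented target: returns how many nested branches the
 * input reached (0 to 4). Reaching all four stands in for a crash. */
int target_coverage(const uint8_t* data) {
  int depth = 0;
  if (data[0] == 'F') {
    depth = 1;
    if (data[1] == 'U') {
      depth = 2;
      if (data[2] == 'Z') {
        depth = 3;
        if (data[3] == 'Z') {
          depth = 4;
        }
      }
    }
  }
  return depth;
}

/* Mutates one byte at a time and keeps any mutation that reaches deeper into
 * the target. Writes the best input found to `best` and returns the number of
 * target executions used. */
unsigned long coverage_guided_fuzz(uint8_t best[INPUT_LEN]) {
  unsigned long executions = 0;
  int best_depth;
  int progress = 1;

  memset(best, 0, INPUT_LEN);
  best_depth = target_coverage(best);
  executions++;

  while (progress && best_depth < INPUT_LEN) {
    progress = 0;
    for (size_t pos = 0; pos < INPUT_LEN && !progress; pos++) {
      for (int value = 0; value < 256 && !progress; value++) {
        uint8_t candidate[INPUT_LEN];
        int depth;

        memcpy(candidate, best, INPUT_LEN);
        candidate[pos] = (uint8_t)value;
        depth = target_coverage(candidate);
        executions++;
        if (depth > best_depth) {
          /* New coverage: keep this input and mutate from it next. */
          memcpy(best, candidate, INPUT_LEN);
          best_depth = depth;
          progress = 1;
        }
      }
    }
  }
  return executions;
}
```

The feedback is what makes the difference: this loop needs at most a few
thousand executions to reach the innermost branch, whereas blindly enumerating
4-byte inputs could take up to 2^32 of them.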
  <div class="fragment fade-in-then-out" data-fragment-index=1>

**Concolic methods:** use a mixture of *concrete* and *symbolic* techniques.

The fuzzer builds a symbolic model of the program and uses that to inform what
inputs it generates.

</div>
  <div class="col">

*Source: [Angr](https://angr.io/)*

</figcaption>
</figure>

</div>
</div>

---

## AFL

AFL ("american fuzzy lop") and its successor AFL++ are well-known fuzzers.

It tries to explore as many paths of the program as possible by mutating its
initial inputs until it can trigger a crash.

</div>
  <div class="col">

*Source: [AFL++](https://github.com/AFLplusplus/AFLplusplus)*

</figcaption>
</figure>

</div>
</div>

---

## afl demo

_**Background:**_ CVE-2014-9471 was a bug in `gnulib` (a library used for many
common Linux tools) in parsing date-time strings, e.g.

Test program:

```c
// TZ="America/Los_Angeles" "00:00 + 1 hour" should trigger a crash
#include <stdio.h>
#include "parse-datetime.h"

int main(void) {
    struct timespec result;
    char buf[1024] = { 0 };

    if (NULL == fgets(buf, sizeof(buf), stdin)) {
        perror("Error reading from stdin\n");
        return 1;
    }

    if (parse_datetime(&result, buf, NULL)) {
        printf("Parsing successful\n");
    } else {
        printf("Parsing failed\n");
    }
}
```

notes:

Fuzzing example code is in examples/fuzz/
- make
- mkdir -p output
- afl-fuzz -i ./input -o ./output -- ./vuln

---

## Real-world bugs caught with fuzzers

OSS-Fuzz provides _continuous fuzzing_ to hundreds of open-source projects,
running AFL++, libFuzzer, and Honggfuzz.

*Source: [OSS-Fuzz](https://github.com/google/oss-fuzz)*

</figcaption>
</figure>

[Bugs found by
OSS-Fuzz](https://bugs.chromium.org/p/oss-fuzz/issues/list?q=-status%3AWontFix%2CDuplicate%20-component%3AInfra&can=1)

</div>

notes:

So far, OSS-Fuzz has found ~45,000 bugs in hundreds of projects