<div class="container">
<div class="col">

# Memory Safety

## CS 3710

</div>
<div class="col">

<figure>
  <img src="../../img/misc/fishinabarrel_memory_safety_poster.webp">
  <figcaption>

*Source: [Fish in a Barrel](https://fishinabarrel.github.io/)*

  </figcaption>
</figure>

</div>
</div>

notes:

Additional references:

- Recent NSA guidance on memory safety (2022-11-10): https://media.defense.gov/2022/Nov/10/2003112742/-1/-1/0/CSI_SOFTWARE_MEMORY_SAFETY.PDF
- Shellphish How2Heap: https://github.com/shellphish/how2heap
- "Nightmare" binex tutorial: https://guyinatuxedo.github.io/index.html

===

## Final and course logistics

---

## Final -- logistics

The class final will be on _**Thursday, December 15th from 9AM - 12PM**_. You can take it in-person (*location TBD*) or remotely, but keep in mind that you _**will**_ need a stable internet connection for the exam.

---

## Final -- structure

<div class="fragment semi-fade-out" data-fragment-index=0>

The final will consist of four questions; you will select two of them to complete in the three-hour period (7 points each).

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

Each question will have an interactive component and a written component.

</div>
<div class="fragment fade-in" data-fragment-index=1>

The final is *non-comprehensive*; it will only *directly* cover topics from the second (defense-focused) half of the course. That said, you should review topics from the first half of the course, as you may find them helpful to understand (especially for the written component).

</div>

---

## Final -- topics

Broadly speaking, the final will cover the following topics:

<div class="fragment semi-fade-out" data-fragment-index=0>

- _**Cryptography:**_ algorithms, AEADs, best practices

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

- _**Networking / firewalls:**_ constructing firewalls for various protocols running over TCP / UDP and analyzing network traffic.
</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=1>

- _**Access control:**_ Linux permissions, DAC/MAC, and sandboxing

</div>
<div class="fragment fade-in" data-fragment-index=2>

- _**Detection:**_ incident response / breach recovery, YARA

</div>

---

## Course evaluations

<div class="text-center">

Course evaluations open tomorrow (Tuesday 11/29). _**Please**_ take a few minutes to fill them out! 🙂

</div>

===

## Common memory vulnerabilities

---

## The stack and the heap

<div class="container container-center">
<div class="col">

The _**stack**_ stores local variables, whose sizes are typically known at compile time. Whenever you call a function, the program pushes a new "stack frame" that reserves memory for the function's variables, as well as the return address.

</div>
<div class="col">

<figure>
  <img src="../../img/misc/ProgramCallStack2_en.svg" class="image-background" style="max-height: 60vh;">
  <figcaption> </figcaption>
</figure>

</div>
</div>

---

## The stack and the heap

The _**heap**_ is used to store variables that are *dynamically allocated*.

<figure>
  <img src="../../img/misc/glibc_malloc.png" style="max-height: 40vh;">
  <figcaption>

*Source: glibc documentation (["malloc internals"](https://sourceware.org/glibc/wiki/MallocInternals))*

  </figcaption>
</figure>

---

## Memory bugs

<div class="fragment semi-fade-out" data-fragment-index=0>

Memory bugs typically show up in languages that encourage manual memory management and working explicitly with memory addresses.

</div>
<div class="fragment fade-in" data-fragment-index=0>

While these bugs can (in theory) show up in almost any program, they most frequently occur in C / C++ programs.

</div>

---

## Memory bugs: `strcpy` example

<div class="fragment semi-fade-out" data-fragment-index=0>

_**Example:**_ this is an implementation of C's `strcpy` function, which copies the contents of one string (`char*`) into another.

</div>

```c
/* WARNING: this is *very* dangerous! You should use the strncpy function (or
 * similar) instead.
 */
char* strcpy(char* dest, const char* src) {
    /* Remember where dest starts: strcpy returns the original pointer. */
    char* start = dest;

    /* The fancy C programmer's way of implementing strcpy. This loop just copies
     * bytes from src into dest until we hit (and copy) a null (zero) byte. */
    for (; (*dest) = (*src); dest++, src++);

    return start;
}
```

<div class="text-center r-stack">
<div class="fragment fade-in-then-out" data-fragment-index=0>

C code typically represents strings using the `char*` type. The end of the string occurs when we encounter a null byte (i.e. `0x00`).

</div>
<div class="fragment fade-in-then-out" data-fragment-index=1>

`char*` is actually a *pointer* to a value of type `char`. It contains a location in the program's address space.

</div>
</div>

---

## Memory bugs: `strcpy` example

<div class="fragment semi-fade-out" data-fragment-index=0>

What happens if the `dest` buffer is smaller than the `src` buffer?

</div>
<div class="fragment fade-in" data-fragment-index=0>

_**Answer:**_ C will happily run straight off the end of the `dest` buffer and keep writing to adjacent locations in memory!

</div>

<div class="r-stack text-center">
<div class="fragment fade-in-then-out" data-fragment-index=0>

<figure>
  <img src="../../img/misc/strcpy_overflow_1.png" style="max-height: 40vh;">
  <figcaption> </figcaption>
</figure>

</div>
<div class="fragment fade-in" data-fragment-index=1>

<figure>
  <img src="../../img/misc/strcpy_overflow_2.png" style="max-height: 40vh;">
  <figcaption> </figcaption>
</figure>

</div>
</div>

---

## Buffer overflow

<div class="container container-center">
<div class="col">

A _**buffer overflow**_ occurs when a program writes past the end of a buffer and into adjacent memory (like in the `strcpy` example).
</div>
<div class="col">

<figure>
  <img src="../../img/misc/phrack-logo.jpg">
  <figcaption>

*Source: [Phrack](http://phrack.org/issues/49/14.html)*

  </figcaption>
</figure>

</div>
</div>

notes:

http://phrack.org/issues/49/14.html

Buffer overflows:

- Wikipedia: https://en.wikipedia.org/wiki/Buffer_overflow

---

## Buffer overread

<div class="container container-center">
<div class="col">

A _**buffer overread**_ happens when we *read* past the end of a buffer. This bug can leak the contents of variables in neighboring memory.

</div>
<div class="col">

<figure>
  <img src="../../img/misc/Heartbleed.svg" style="max-height: 40vh;">
  <figcaption>

*Source: [Phrack](http://phrack.org/issues/49/14.html)*

  </figcaption>
</figure>

</div>
</div>

---

## Heap-based bugs

<div class="fragment semi-fade-out" data-fragment-index=0>

When a C program needs to dynamically allocate a non-fixed amount of memory, it typically uses the `malloc` function.

```c
void* buffer;
if (NULL == (buffer = malloc(length)))
    perror("Error: failed to allocate buffer");
```

</div>
<div class="fragment fade-in" data-fragment-index=0>

When the buffer is no longer needed, it should be `free`'d. Otherwise the program *leaks* that memory and may eventually run out:

```c
free(buffer);
```

</div>

---

## Use-after-free

A _**use-after-free**_ bug occurs when we dereference a pointer to a dynamically-allocated memory block after that block has been `free`'d.

<figure>
  <img src="../../img/misc/use_after_free.drawio.svg" class="image-background" style="max-height: 40vh; padding: 20px;">
  <figcaption> </figcaption>
</figure>

---

## Double-free

A _**double free**_ bug occurs when the same chunk is `free`'d twice (or more).
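
As a rough illustration (not taken from the course materials), the sketch below shows one common way to make an accidental second `free` harmless: clear the pointer right after freeing it, since `free(NULL)` is defined to do nothing. The helper name `free_and_clear` is hypothetical, and the idiom only protects callers that go through the same pointer variable; any other copies of the pointer still dangle.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the C library): frees the block that *ptr
 * points to, then sets *ptr to NULL. A later call with the same variable
 * becomes free(NULL), which the C standard defines as a no-op. */
void free_and_clear(char **ptr) {
    assert(ptr != NULL);   /* the caller must pass the address of a pointer */
    free(*ptr);
    *ptr = NULL;
}
```

For example, after `char *buf = malloc(64);`, calling `free_and_clear(&buf);` twice in a row is safe, whereas calling `free(buf);` twice is undefined behavior.
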
<figure>
  <img src="../../img/misc/double_free.drawio.svg" class="image-background" style="max-height: 40vh; padding: 20px;">
  <figcaption> </figcaption>
</figure>

notes:

OWASP: https://owasp.org/www-community/vulnerabilities/Doubly_freeing_memory

===

## Exploiting memory vulnerabilities

---

## Stack smashing

<div class="container container-center">
<div class="col">

Whenever a function gets called in C / C++, the program pushes a new *stack frame* onto the stack. The stack frame contains a *return address* specifying where the program should jump to after the function returns.

</div>
<div class="col">

<figure>
  <img src="../../img/misc/stack_frame.drawio.svg" class="image-background" style="max-height: 50vh; padding: 20px;">
  <figcaption> </figcaption>
</figure>

</div>
</div>

notes:

The standard text on this type of exploit: http://phrack.org/issues/49/14.html

---

## Stack smashing

<div class="container container-center">
<div class="col">

The traditional exploit overwrites the return address so that the function jumps to custom assembly code (*shellcode*) that the attacker has crafted.

</div>
<div class="col">

<figure>
  <img src="../../img/misc/stack_smashing_1.drawio.svg" class="image-background" style="max-height: 50vh; padding: 20px;">
  <figcaption> </figcaption>
</figure>

</div>
</div>

---

## Stack smashing

When the function returns, it jumps to the attacker's assembly code and executes it, rather than jumping back to the caller.

<figure>
  <img src="../../img/misc/stack_smashing_2.drawio.svg" class="image-background" style="max-height: 50vh; padding: 20px;">
  <figcaption> </figcaption>
</figure>

---

## Stack smashing - Demo (?)
```c
#include <stdio.h>

void read_input(void) {
    char buffer[512] = {0};
    char* ptr = buffer;

    /* Deliberately vulnerable: there is no bounds check, so any input longer
     * than the buffer overflows it. */
    for (
        int c = 0;
        EOF != (c = getchar());
        *(ptr++) = c
    );

    printf("Received input: %s\n", buffer);
}

int main(void) {
    read_input();
    return 0;
}
```

```bash
$ clang -Wall -Werror -Wextra -O0 -g -z execstack -fno-stack-protector \
    stacksmash.c -o stacksmash.bin
$ sudo sysctl -w kernel.randomize_va_space=0
```

---

## Overwriting the return address

There are a few problems that make this kind of exploit much more difficult nowadays:

<div class="fragment semi-fade-out" data-fragment-index=0>

- Memory on the stack is typically marked as non-executable (meaning you can't execute shellcode that's written to the stack)

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

- ASLR (*address space layout randomization*) makes it difficult to predict what memory address you need to jump to.

</div>
<div class="fragment fade-in" data-fragment-index=1>

- Compilers insert *stack canaries* into the stack frame, and the program aborts if a canary gets overwritten.

</div>

---

## Just how common are memory vulnerabilities?

<div class="fragment semi-fade-out" data-fragment-index=0>

_**Myth:**_ memory safety bugs are a ploy by Big Cyber to sell more cyber. The only engineers who create memory bugs are bad C programmers.

</div>
<div class="fragment fade-in" data-fragment-index=0>

_**Fact:**_ memory bugs are prevalent even in the most well-funded and heavily-audited C/C++ projects.

</div>

---

## Just how common are memory vulnerabilities?

<figure>
  <img src="../../img/misc/chrome_cve_2022_4135.png">
  <figcaption>

*Source: [Google](https://chromereleases.googleblog.com/2022/11/stable-channel-update-for-desktop_24.html?m=1)*

  </figcaption>
</figure>

notes:

- FishInABarrel: https://fishinabarrel.github.io/
- FishInABarrel (Twitter): https://twitter.com/lazyfishbarrel

---

## Just how common are memory vulnerabilities?
<figure>
  <img src="../../img/misc/linux_bluetooth_uaf.webp" style="max-height: 40vh;">
  <figcaption>

*Source: [Google](https://github.com/google/security-research/security/advisories/GHSA-pf87-6c9q-jvm4)*

  </figcaption>
</figure>

===

## Protecting against memory vulnerabilities

---

## Using memory-safe languages

<div class="fragment semi-fade-out" data-fragment-index=0>

When possible, the easiest way to protect against memory vulnerabilities is to use a memory-safe language, e.g. Python, Go, or Rust.

</div>

<figure>
  <img src="../../img/misc/memory_safe_langs.png" style="max-height: 30vh;">
  <figcaption> </figcaption>
</figure>

<div class="fragment" data-fragment-index=0>

Each of these languages has protections in place to keep you from managing allocation/deallocation manually, accessing invalid memory locations, etc.

</div>

---

## Compiler protections

C/C++ compilers provide various options to help protect against memory vulnerabilities.

<div class="fragment semi-fade-out" data-fragment-index=0>

- gcc / clang can inject stack canaries into the program to help detect buffer overflows (`-fstack-protector`; many Linux distributions enable a variant of this flag by default)

</div>
<div class="fragment" data-fragment-index=0>

- `-D_FORTIFY_SOURCE=2`: adds compile-time and runtime checks for buffer overflows in common libc functions (requires compiling with optimization, e.g. `-O2`).

</div>

notes:

- `-D_FORTIFY_SOURCE` (blogpost from RedHat): https://www.redhat.com/en/blog/enhance-application-security-fortifysource
- See also: [`man 7 feature_test_macros`](https://man7.org/linux/man-pages/man7/feature_test_macros.7.html)

---

## Best practices

<div class="fragment semi-fade-out" data-fragment-index=0>

You should _**always**_ bound the maximum amount of memory that you read from or write to a buffer.

</div>
<div class="fragment fade-in" data-fragment-index=0>

_**Example:**_ `strcpy` vs `strncpy`. Both of these functions copy the bytes from a `src` pointer to the buffer that `dest` points to.
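
A minimal sketch of a bounded copy, assuming the caller knows the size of the destination buffer. The wrapper name `bounded_copy` and its return convention are hypothetical (they are not part of the C library); the sketch terminates the result explicitly because `strncpy` does not add a null byte when `src` is `n` bytes or longer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies src into dest, which has room for dest_size
 * bytes, and always leaves dest null-terminated.
 * Returns 0 if the whole string fit, or -1 if it had to be truncated. */
int bounded_copy(char *dest, size_t dest_size, const char *src) {
    assert(dest != NULL && src != NULL && dest_size > 0);

    /* Copy at most dest_size - 1 bytes, leaving room for the terminator. */
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';

    return strlen(src) < dest_size ? 0 : -1;
}
```

With a four-byte destination, `bounded_copy(dest, sizeof dest, "abcdefgh")` stores `"abc"` and reports truncation instead of writing past the end of `dest`.
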
<pre class="code-wrapper">
<code class="c" data-trim data-line-numbers="|4|6" data-fragment-index=1>
/* From `man 3 strcpy`: */
#include &lt;string.h&gt;

char *strcpy(char *dest, const char *src);

char *strncpy(char *dest, const char *src, size_t n);
</code>
</pre>

<div class="r-stack text-center">
<div class="fragment fade-in-then-out" data-fragment-index=1>

`strcpy` (insecure): overwrite `dest` with the bytes in `src` until you hit a null byte

</div>
<div class="fragment fade-in" data-fragment-index=2>

`strncpy` (more secure): overwrite `dest` with *at most* `n` bytes from `src`, stopping early at a null byte (note that `dest` is not null-terminated if `src` is `n` bytes or longer)

</div>
</div>

</div>

---

## Additional protection mechanisms

<div class="fragment semi-fade-out" data-fragment-index=0>

... and of course, all of the other defense mechanisms we've discussed up to this point in the semester still apply. 🙂

</div>
<div class="fragment fade-in" data-fragment-index=0>

In particular, the sandboxing mechanisms we've discussed, such as

- Linux security modules
- seccomp
- namespaces

and so on come in handy here.

</div>

===

## Fuzzing

---

## What's a fuzzer, anyways?

<div class="r-stack">
<div class="fragment fade-out" data-fragment-index=0>

We've seen *web fuzzers* in the first half of the semester -- `ffuf` is a fuzzer, and you had to write `xfuzz` for PA1.

</div>
<div class="fragment fade-in" data-fragment-index=0>

More generally, a _**fuzzer**_ is any program that feeds many different inputs to another program, in the hopes of finding an "interesting" result.

</div>
</div>

<figure>
  <img src="../../img/web/ffuf_run_logo_600.webp">
  <figcaption>

*Source: [ffuf](https://github.com/ffuf/ffuf)*

  </figcaption>
</figure>

---

## What's a fuzzer?

<div class="fragment semi-fade-out" data-fragment-index=0>

*"What qualifies as interesting?"* That depends on the domain and what your goals are!
</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

For a web fuzzer, it might be a page that returns a particular status code, a URL parameter that generates a different output, etc.

</div>
<div class="fragment fade-in" data-fragment-index=1>

When we talk about *memory vulnerabilities*, we typically look for program crashes, or particular memory accesses / patterns (e.g. using taint flow analysis).

</div>

---

## How to fuzz

Many different types of fuzzing strategies exist.

<div class="r-stack">
<div class="fragment fade-out" data-fragment-index=0>

**Brute force:** simply generate as many different inputs as possible

</div>
<div class="fragment fade-in-then-out" data-fragment-index=0>

**Coverage-guided:** we try to generate inputs that cause the program to explore as many different execution paths as possible.

<figure>
  <img src="../../img/misc/coverage_fuzzing.drawio.svg" class="image-background" style="padding: 20px;">
  <figcaption> </figcaption>
</figure>

</div>
<div class="fragment fade-in-then-out" data-fragment-index=1>
<div class="container container-center">
<div class="col">

**Concolic methods:** use a mixture of *concrete* and *symbolic* techniques. The fuzzer builds a symbolic model of the program and uses that to inform what inputs it generates.

</div>
<div class="col">

<figure>
  <img src="../../img/misc/angry_face.png">
  <figcaption>

*Source: [Angr](https://angr.io/)*

  </figcaption>
</figure>

</div>
</div>
</div>
</div>

---

## AFL

<div class="container container-center">
<div class="col">

AFL ("american fuzzy lop") and its successor, AFL++, are well-known fuzzers. They try to explore as many paths through the program as possible by mutating an initial set of inputs, looking for inputs that trigger a crash.
</div>
<div class="col">

<figure>
  <img src="../../img/misc/aflpp_bg.svg">
  <figcaption>

*Source: [AFL++](https://github.com/AFLplusplus/AFLplusplus)*

  </figcaption>
</figure>

</div>
</div>

---

## afl demo

_**Background:**_ CVE-2014-9471 was a bug in `gnulib` (a library used for many common Linux tools) in how it parsed date-time strings.

Test program:

<pre class="code-wrapper">
<code class="c" data-trim>
// TZ="America/Los_Angeles" "00:00 + 1 hour" should trigger a crash
#include &lt;stdio.h&gt;
#include "parse-datetime.h"

int main(void) {
    struct timespec result;
    char buf[1024] = { 0 };

    /* Read one line of input; the fuzzer supplies it on stdin. */
    if (NULL == fgets(buf, sizeof(buf), stdin)) {
        perror("Error reading from stdin");
        return 1;
    }

    if (parse_datetime(&amp;result, buf, NULL)) {
        printf("Parsing successful\n");
    } else {
        printf("Parsing failed\n");
    }

    return 0;
}
</code>
</pre>

notes:

Fuzzing example code is in examples/fuzz/

- make
- mkdir -p output
- afl-fuzz -i ./input -o ./output -- ./vuln

---

## Real-world bugs caught with fuzzers

OSS-Fuzz provides _continuous fuzzing_ to hundreds of open-source projects, running AFL++, libFuzzer, and Honggfuzz.

<figure>
  <img src="../../img/misc/oss-fuzz.png" style="max-height: 30vh;">
  <figcaption>

*Source: [OSS-Fuzz](https://github.com/google/oss-fuzz)*

  </figcaption>
</figure>

<div class="text-center">

[Bugs found by OSS-Fuzz](https://bugs.chromium.org/p/oss-fuzz/issues/list?q=-status%3AWontFix%2CDuplicate%20-component%3AInfra&can=1)

</div>

notes:

So far, OSS-Fuzz has found ~45,000 bugs in hundreds of projects