# Malware techniques

## CS 3710: Intro to Cybersecurity

===

## Persistence

---

## What happens after an attacker gets initial access?

<div class="container">
<div class="col" style="padding-right: 1em;">
<div class="overlap">
<div class="fragment fade-out" data-fragment-index="1">

An offensive operation doesn't end after an attacker gets initial access. They have to _**maintain**_ access and pivot into more valuable systems.

</div>
<div class="fragment fade-in" data-fragment-index="1">

This is made more difficult by the fact that real-world networks are _**segmented**_. The most exposed systems are separated from more valuable hosts, and are more closely monitored by defenders.

</div>
</div>
</div>
<div class="col">
<div class="text-center image-background">
<img style="max-height: 40vh;" src="../../img/web/dmz_example.svg">
</div>
</div>

notes:
The attacker's job is made more difficult by the fact that real-world networks are usually segmented. The most exposed systems, like web and email servers, are the ones an attacker is most likely to get initial access through. These systems are usually firewalled off from more valuable parts of the network, and are more heavily monitored by defenders.

---

## Persistence

In order to maintain access to a target's systems, the attacker has to achieve _**persistence:**_ a way to install a reliable backdoor on the target's systems. This entails finding a mechanism to ensure that the backdoor remains _available_ and _undetected_ by the victim.

===

## Linux: userspace persistence mechanisms

notes:
References:
- [Understanding Linux Malware](https://ieeexplore.ieee.org/document/8418602)

---

## systemd services and cron jobs

<div class="container">
<div class="col">

systemd is a common init system and service manager. On most Linux distributions, it's the first process that runs after boot.

<div class="fragment">

cron is a job scheduler that allows you to periodically re-run a command (e.g. every five minutes).
</div>
</div>
<div class="col">
<figure>
<img src="../../img/misc/systemd-light.svg">
<figcaption> </figcaption>
</figure>
</div>
</div>

notes:
[MITRE ATT\&CK T1053.003](https://attack.mitre.org/techniques/T1053/003/)

[MITRE ATT\&CK T1543.002](https://attack.mitre.org/techniques/T1543/002/)

---

## systemd services and cron jobs

Example cron job:

```text
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * command to be executed
23 4 * * * /bin/script.sh
```

<div class="code-inline-bg">

This cron job says "run the script `/bin/script.sh` every day at 4:23 AM".

</div>

---

## systemd services and cron jobs

systemd and cron allow you to run programs in the background, and their configuration persists after reboot. They are also both very common on Linux systems.

<div class="fragment">

One popular (albeit simple) persistence technique is to define a systemd service or cron job to keep malware running in the background.

*Examples:* [VPNFilter](https://en.wikipedia.org/wiki/VPNFilter), [Rocke cryptominer](https://www.anomali.com/blog/rocke-evolves-its-arsenal-with-a-new-malware-family-written-in-golang)

</div>

notes:
Understanding Linux Malware (Cozzi et al., 2018) found 70 malware samples that modified `/etc/crontab` to gain persistence on Linux-based IoT devices. It only found two that installed systemd services, but hundreds of samples created SysVinit scripts (with which systemd maintains backwards compatibility).

---

## systemd services and cron jobs

<div class="fragment semi-fade-out" data-fragment-index="0">

Defending against these kinds of mechanisms is generally pretty simple; a defender just needs to monitor the installation of new systemd services/cron jobs and compare them against the baseline.
</div>
<div class="fragment" data-fragment-index="0">

Systemd:
- `/etc/systemd/system/`
- `/usr/lib/systemd/system`
- `$HOME/.config/systemd/user`

Cron:
- `/etc/crontab`, `/etc/cron.*`
- `/var/spool/cron/crontabs/`

</div>

---

## Dynamic linker hijacking with `LD_PRELOAD`

Many Linux programs use _**dynamic linking**_ to load code from shared libraries at runtime.

Linux uses the **E**xecutable and **L**inkable **F**ormat (ELF) as the file format for executables. ELF binaries include a segment listing the libraries they should be dynamically linked against at runtime.

<figure>
<img src="../../img/malware/gnome_child.webp" style="max-height: 40vh;">
<figcaption> </figcaption>
</figure>

notes:
ELF = "Executable and Linkable Format"

[Anatomy of Linux dynamic libraries](https://developer.ibm.com/tutorials/l-dynamic-libraries/)

---

## Dynamic linker hijacking with `LD_PRELOAD`

For example, consider the following "hello, world" program in C:

```c
/* test.c */
#include <stdio.h>

int main(void) {
    puts("hello, world!");
    return 0;
}
```

<div class="fragment">
<div class="code-inline-bg">

When this program is compiled, it is dynamically linked to the C standard library so that it knows where to find the function `puts` at runtime:

</div>

<pre class="code-wrapper">
<code class="text" data-trim data-noescape data-line-numbers="1-5|2-3|4-5">
$ gcc test.c -o test
$ readelf -d test | grep "Shared library"
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
$ readelf --dyn-syms test | grep puts
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (2)
</code>
</pre>
</div>

notes:
[MITRE ATT\&CK T1574.006](https://attack.mitre.org/techniques/T1574/006/)

---

## Dynamic linker hijacking with `LD_PRELOAD`

When a binary is executed, the dynamic linker loads the shared libraries it depends on into memory (if needed), and performs the relocations needed to, e.g., resolve `puts` to the function defined in `libc.so.6`.
<pre class="code-wrapper">
<code class="text" data-trim data-noescape data-line-numbers="1-9|1-2|3,7">
$ ./test
hello, world!
$ strace -e trace=%file ./test
execve("./test", ["./test"], 0x7fff10af9db0 /* 26 vars */) = 0
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
hello, world!
+++ exited with 0 +++
</code>
</pre>

<div class="code-inline-bg">

`libc.so.6` is loaded from `/lib/x86_64-linux-gnu/libc.so.6` on my system.

</div>

**Q:** can we get the dynamic linker to load malicious code?

---

## Dynamic linker hijacking with `LD_PRELOAD`

`LD_PRELOAD` is an environment variable that lists shared libraries for the dynamic linker to load before any others, so the symbols they define take precedence over those in the libraries found on the default paths.

```text
$ LD_PRELOAD=/path/to/my/library.so ./test
```

notes:
The default directories searched by the dynamic linker are usually specified in `/etc/ld.so.conf`.
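
A defender can turn this around: a process started with `LD_PRELOAD` carries the variable in `/proc/$pid/environ`, stored as NUL-separated `KEY=value` entries. The sketch below is a minimal Python illustration of pulling the preloaded library paths out of such a blob. `preloaded_libraries` and the sample data are hypothetical names made up for this example, not part of any existing tool.

```python
def preloaded_libraries(environ_blob: bytes) -> list[str]:
    """Return the libraries named by LD_PRELOAD in a /proc/<pid>/environ-style blob."""
    for entry in environ_blob.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep and key == b"LD_PRELOAD":
            # The dynamic linker accepts both spaces and colons as separators.
            text = value.decode(errors="replace").replace(":", " ")
            return text.split()
    return []


if __name__ == "__main__":
    # Hypothetical sample data; on a live system the blob would come from /proc.
    sample = b"PATH=/usr/bin\0LD_PRELOAD=/path/to/my/library.so\0HOME=/root\0"
    print(preloaded_libraries(sample))
```

On a live system the blob could be read with `open(f"/proc/{pid}/environ", "rb").read()`, permissions allowing. The file reflects the environment the process was started with, so this is only one of several checks worth combining.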
---

## Dynamic linker hijacking with `LD_PRELOAD`

*Example:*

<pre class="code-wrapper">
<code class="c" data-trim data-line-numbers="1-17|5,9-12|5,14" data-fragment-index="0">
/* evil.c */

#include <dlfcn.h>

int puts(const char* s) {
    void* handle;
    int (*real_puts)(const char* s);

    // `dlopen` and `dlsym` load the real `puts` function from
    // the C standard library
    handle = dlopen("libc.so.6", RTLD_LAZY);
    real_puts = dlsym(handle, "puts");

    return real_puts("hello from evil.so >:)");
}
</code>
</pre>

<div class="r-stack text-center code-inline-bg">
<div class="fragment fade-in-then-out" data-fragment-index="0">

`dlsym` looks up the memory address of the original version of `puts` and stores it in the `real_puts` variable

</div>
<div class="fragment fade-in-then-out" data-fragment-index="1">

The "evil" version of `puts` calls the original `puts` with the message `"hello from evil.so >:)"`

</div>
</div>

notes:
The code snippet above doesn't include proper error handling for `dlopen` and `dlsym`, and should not be directly copied and pasted.

---

## Dynamic linker hijacking with `LD_PRELOAD`

Program output with `LD_PRELOAD`:

<pre class="code-wrapper">
<code class="text" data-trim data-noescape data-fragment-index="0" data-line-numbers="1-5|1|2-3|4-5">
$ gcc -fPIC -Wl,--no-as-needed -ldl -shared -o evil.so evil.c
$ ./test
hello, world!
$ LD_PRELOAD=./evil.so ./test
hello from evil.so >:)
</code>
</pre>

<div class="r-stack code-inline-bg text-center">
<div class="fragment fade-in-then-out" data-fragment-index="0">

This command compiles `evil.c` as a shared library. The flags `-Wl,--no-as-needed -ldl` allow it to use `dlopen` and `dlsym`.

</div>
<div class="fragment fade-in-then-out" data-fragment-index="1">

The original `test` program runs as expected...

</div>
<div class="fragment fade-in-then-out" data-fragment-index="2">

...
but with `LD_PRELOAD`, we inject the malicious version of `puts` defined in `evil.so`

</div>
</div>

---

## Dynamic linker hijacking with `LD_PRELOAD`

<div class="code-inline-bg">

You can do more sophisticated `LD_PRELOAD` tricks to hook common C functions and inject malicious code into them.

</div>

*Examples:* [Symbiote](https://www.intezer.com/blog/research/new-linux-threat-symbiote/) is a recently-discovered malware strain that uses `LD_PRELOAD` to inject itself into processes and evade detection.

<figure>
<img src="../../img/malware/symbiote-evasion-techniques.webp" style="max-height: 30vh;">
<figcaption>

*Source: Intezer / Blackberry*

</figcaption>
</figure>

notes:
- Used to target Latin America's financial industry (it's suggested that it may have been designed to be used against some major Brazilian banks)
- Symbiote is compiled as a shared library
- Some of its functionalities include hiding the ports that it's using for communications, hiding which processes are running, hiding files, etc.

[Blackberry's analysis](https://blogs.blackberry.com/en/2022/06/symbiote-a-new-nearly-impossible-to-detect-linux-threat)

It's worth noting that some of the claims about Symbiote's stealthiness are probably a little overblown. Grugq had a good criticism of Symbiote's methodology here: https://grugq.substack.com/p/userland-rootkits-are-lame

---

## Defenses

<div class="fragment semi-fade-out" data-fragment-index="0">

_**Userspace**_ rootkits (those that don't try to hook into the Linux kernel itself) are relatively easy to detect and defend against.

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="0">

`LD_PRELOAD`: there are a ton of different ways to determine whether a library has been injected into a process, and it's unlikely that an attacker will be able to adequately cover all of their bases.
</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="1">

- Use tools that are statically compiled (where `LD_PRELOAD` would have no effect)

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="2">

- Check the `LD_PRELOAD` variable in `/proc/$pid/environ`, or with `/usr/bin/env`

</div>
<div class="fragment" data-fragment-index="3">

- Inspect process memory and memory mappings
- etc.

</div>
</div>

===

## Linux: kernelspace persistence mechanisms

---

## Kernelspace vs userspace

Code runs in either _**kernelspace**_ or _**userspace**_.

Code that runs in kernelspace has higher privileges (and hence, more opportunities to hide itself from detection).

<div class="image-background">
<figure>
<img src="../../img/malware/Priv_rings.svg" style="max-height: 30vh;">
<figcaption> </figcaption>
</figure>
</div>

---

## Kernel modules

_**Loadable Kernel Modules**_ (**LKMs**) are a means of adding functionality to the Linux kernel. Modules can be loaded or unloaded from the kernel any time after boot.

LKMs are often used to implement device drivers for new hardware, but they have many other applications.

---

## Syscall table hooking

<div class="container">
<div class="col">

_**System calls**_ (or _**syscalls**_) are an important part of the interface that the kernel exposes to userspace. They're used to access the filesystem, manipulate processes, and more.

_**Examples:**_ `open`, `read`, `execve`, `fork`, etc.
</div>
<div class="col">
<figure>
<img src="../../img/malware/syscall_x86_64.png" style="max-height: 40vh;">
<figcaption>

*Source: [LWN](https://lwn.net/Articles/604287/)*

</figcaption>
</figure>
</div>
</div>

notes:
- Syscall hooking rootkit blogpost from Trail of Bits: [blog](https://blog.trailofbits.com/2019/01/17/how-to-write-a-rootkit-without-really-trying/)

---

## Syscall table hooking

<div class="code-inline-bg">

When a program wants to make a syscall, it executes the `syscall` instruction and passes in the number of the syscall that it wants to perform (in the `rax` register on x86-64).

</div>

```asm
section .data
    msg: db "hello, world!", 10
    msgLen: equ $-msg

section .text
    global _start

_start:
    ; Write message to stdout
    mov rdi, 1          ; file descriptor 1 = stdout
    nop
    mov rsi, msg        ; pointer to the message
    mov rdx, msgLen     ; number of bytes to write
    mov rax, 1          ; syscall number 1 = write
    syscall

    ; Exit with status code 0
    mov rdi, 0
    mov rax, 60         ; syscall number 60 = exit
    syscall
```

---

## Syscall table hooking

<div class="fragment semi-fade-out code-inline-bg" data-fragment-index="0">

The kernel keeps a lookup table in memory that maps each syscall number to the function that implements it. When the `syscall` instruction is run, the kernel looks up the entry in the table corresponding to the given syscall number and calls it.
</div>
<div class="fragment" data-fragment-index="0">

**Idea:** overwrite the entry in the table for a given syscall so that we can intercept it

</div>

---

## Syscall table hooking

<div class="image-background fragment semi-fade-out" data-fragment-index="0">
<figure>
<img src="../../img/malware/syscall_table.drawio.png">
<figcaption> </figcaption>
</figure>
</div>
<div class="image-background fragment" data-fragment-index="0">
<figure>
<img src="../../img/malware/syscall_table_hooking.drawio.png">
<figcaption> </figcaption>
</figure>
</div>

notes:
Further info:
- [RITSEC computer club](https://ritcsec.wordpress.com/2020/11/22/linux-syscall-hooking/)

---

## Syscall table hooking

Attacker workflow:

<div class="fragment semi-fade-out" data-fragment-index="0">

- Load a new kernel module using `insmod`

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="0">

- Kernel module identifies the memory address of the syscall table
  - *Older kernel versions:* use the `kallsyms_lookup_name` function or read `/proc/kallsyms`
  - *Newer kernel versions:* use kprobes

</div>
<div class="fragment" data-fragment-index="1">

- Overwrite syscall addresses so that they point to the attacker's malicious code

</div>

===

## Command and control

---

## Command and control

After an attacker has gotten into a network and installed malware, that malware usually needs a way to communicate back to its operator.

<figure>
<img src="../../img/malware/virus_hacker_c2.webp" class="image-background" style="padding: 20px; max-height: 40vh;">
<figcaption> </figcaption>
</figure>

---

## Command and control

<div class="fragment semi-fade-out" data-fragment-index=2>

Those communications are usually used for two (interconnected) purposes:

<div class="fragment" data-fragment-index=0>

- Exfiltrating data collected from the network back to the attacker, and

</div>
<div class="fragment" data-fragment-index=1>

- Obtaining new instructions and payloads to execute.
</div>
</div>
<div class="fragment" data-fragment-index=2>

We refer to these communications as _**command and control**_ (often shortened to _**CNC**_ or _**C2**_).

</div>

---

## Case study: Drovorub

_**Drovorub**_ is a malware toolkit for Linux. It was publicly identified in August 2020 by the NSA and FBI and attributed to Russia's GRU.

notes:
References:
- [NSA / FBI report](https://media.defense.gov/2020/Aug/13/2002476465/-1/-1/0/CSA_DROVORUB_RUSSIAN_GRU_MALWARE_AUG_2020.PDF)

---

## Components of Drovorub

<div class="r-stack">
<div class="fragment fade-out" data-fragment-index=0>

Drovorub isn't just a single piece of malware, but a collection of tools that can be used to infect and control Linux machines.

</div>
<div class="fragment fade-in-then-out" data-fragment-index=0>

**Drovorub-client:** the implant that gets installed on the infected machine. It receives and executes commands from the attacker

</div>
<div class="fragment fade-in-then-out" data-fragment-index=1>

**Drovorub-kernel:** a kernel module rootkit that hooks various Linux kernel functions to hide itself

</div>
<div class="fragment fade-in-then-out" data-fragment-index=2>

**Drovorub-server:** a server that enables sending + receiving commands and stores Drovorub client/agent data in a MySQL database

</div>
<div class="fragment" data-fragment-index=3>

**Drovorub-agent:** relays traffic from Drovorub-server to Drovorub-client installations

</div>
</div>

<figure>
<img src="../../img/malware/drovorub_components.webp" style="max-height: 35vh;">
<figcaption>

*Source: FBI / NSA*

</figcaption>
</figure>

---

## Drovorub C2 modules

Drovorub uses a modular design that allows it to support various capabilities:

<div class="fragment semi-fade-out" data-fragment-index="1">

- Authentication to the C2 server

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="1">

- Transferring files to Drovorub clients

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="2">

- Hiding files
and network artifacts (e.g. open ports) from the view of Linux's userspace

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="3">

- Executing shell scripts on the client

</div>

---

## Hiding C2 traffic

<div class="fragment semi-fade-out" data-fragment-index=0>

A critical issue for malware operators is figuring out how to hide C2 traffic from detection by defenders. To avoid detection, they have to make C2 traffic look like "legitimate" traffic.

</div>
<div class="fragment" data-fragment-index=0>

For example, if their malware is communicating with

<div class="text-center">

`http://www.mymalwaredomain.evil`

</div>

a defender will (hopefully!) see that traffic going through their network and realize that it's coming from malware.

</div>

---

## Case study: Emotet

_**Emotet**_ is a malware strain and botnet that has been used variously for banking trojans, ransomware operations, crimeware, and more.

<figure>
<img src="../../img/malware/malwarebytes_emotet.webp" style="max-height: 40vh;">
<figcaption>

*Source: Malwarebytes*

</figcaption>
</figure>

notes:
References:
- [Emotet](https://en.wikipedia.org/wiki/Emotet)
- [MalwareTech blogpost on Emotet C2](https://www.malwaretech.com/2017/11/investigating-command-and-control-infrastructure-emotet.html)

---

## Case study: Emotet

The Emotet binary contains a hardcoded list of IP addresses it can use for C2. Normally these IP addresses would belong to servers that the attacker controls...

<figure>
<img src="../../img/malware/emotet_c2_list-1.png">
<figcaption>

*Source: [MalwareTech](https://www.malwaretech.com/2017/11/investigating-command-and-control-infrastructure-emotet.html)*

</figcaption>
</figure>

---

## Case study: Emotet

... but at first glance it looks like these IP addresses belong to legitimate web hosting providers (who wouldn't tolerate a ransomware gang hosting a C2 server on their infrastructure).
<figure>
<img src="../../img/malware/emotet_c2_1.webp" style="max-height: 30vh;">
<figcaption>

*Source: [MalwareTech](https://www.malwaretech.com/2017/11/investigating-command-and-control-infrastructure-emotet.html)*

</figcaption>
</figure>

---

## Case study: Emotet

The simplest way to do C2 is to rent a server to run the C2 infrastructure and have the malware communicate with it directly.

<figure>
<img src="../../img/malware/c2_direct.drawio.webp" class="image-background" style="max-height: 30vh; padding: 20px;">
<figcaption> </figcaption>
</figure>

---

## Case study: Emotet

<div class="fragment semi-fade-out" data-fragment-index=0>

However, this kind of C2 can lead to difficulties in practice. If the web hosting provider isn't abuse-friendly, they can take down the server and tie it back to the attackers.

</div>
<div class="fragment" data-fragment-index=0>

If they are abuse-friendly, their IP address range will often get blocked by defenders, and the hosting provider will probably be a target for law enforcement.

</div>

---

## Case study: Emotet

To get around these problems, Emotet's operators proxied C2 traffic through servers they had hacked:

<figure>
<img src="../../img/malware/c2_proxied.drawio.webp" class="image-background" style="max-height: 30vh; padding: 20px;">
<figcaption> </figcaption>
</figure>

---

## Case study: Emotet

Attacker workflow:

<div class="fragment semi-fade-out" data-fragment-index=0>

- Find some vulnerable servers on the internet and gain access to them (usually by just performing mass scans and searching for known vulnerabilities)

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

- Set up a proxy on the compromised machines that will forward the traffic they receive to the attackers' servers.

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=1>

- Before deploying malware to targets, embed the IP addresses of the compromised hosts in the malware.
</div>
<div class="fragment" data-fragment-index=2>

- Malware communicates back to the attacker through the legitimate-looking (but compromised) hosts.

</div>

---

## Alternative communication channels

There are many other channels that attackers can try to use to cover up command-and-control communications.

As long as it looks like "normal" network traffic, it's a potential C2 vector!

---

## C2 channels: DNS exfiltration

<div class="container">
<div class="col">
<div class="fragment semi-fade-out" data-fragment-index=0>

**DNS exfil:** malware encodes the message that it wants to send back to the attacker and then sends it back in DNS queries

</div>
<div class="fragment" data-fragment-index=0>

These queries look like ordinary lookups, but the domain's name server is run by the attacker, and the queried subdomain contains the encoded message.

</div>
</div>
<div class="col">
<figure>
<img src="../../img/malware/dns_exfil.drawio.webp" class="image-background" style="padding: 10px;">
<figcaption> </figcaption>
</figure>
</div>
</div>

notes:
Palo Alto report on APT 18 (Wekby) using DNS as an exfil channel: https://unit42.paloaltonetworks.com/unit42-new-wekby-attacks-use-dns-requests-as-command-and-control-mechanism/

MITRE ATT\&CK T1071.004: https://attack.mitre.org/techniques/T1071/004/

---

## C2 channels: DNS exfiltration

<figure>
<img src="../../img/malware/wekby_dns_exfil.webp" style="max-height: 50vh;">
<figcaption>

*Source: Unit 42 / Palo Alto Networks*

</figcaption>
</figure>

---

## C2 channels: APT 41

<div class="container">
<div class="col">
<div class="fragment semi-fade-out" data-fragment-index=0>

**APT 41:** created malware (called POISONPLUG) that used two different methods to communicate back with the malware operators:

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

- A Google document, which it would read and parse to obtain instructions.
</div>
<div class="fragment" data-fragment-index=1>

- A Steam (game distribution platform) community page that it used as a fallback mechanism

</div>
</div>
<div class="col">
<figure>
<img src="../../img/malware/apt_41.webp">
<figcaption>

*Source: FireEye*

</figcaption>
</figure>
</div>
</div>

notes:
FireEye report: https://content.fireeye.com/apt-41/rpt-apt41
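
---

## C2 channels: DNS exfiltration

Because DNS exfiltration packs an encoded message into the queried subdomain, those names tend to be longer and more random-looking than ordinary hostnames. The Python sketch below illustrates one rough heuristic a defender might try, assuming they already know which base domain to examine. The function names and both thresholds are assumptions chosen for illustration, not tuned values.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_exfil(qname: str, base_domain: str,
                     max_len: int = 40, max_entropy: float = 3.5) -> bool:
    """Flag a query whose subdomain part is unusually long or random-looking.

    `max_len` and `max_entropy` are illustrative assumptions, not tuned values.
    """
    qname = qname.rstrip(".").lower()
    base = base_domain.rstrip(".").lower()
    if not qname.endswith("." + base):
        return False  # not a subdomain of the domain under examination
    # Join the subdomain labels so a message split across labels is scored as one.
    subdomain = qname[: -(len(base) + 1)].replace(".", "")
    return len(subdomain) > max_len or shannon_entropy(subdomain) > max_entropy


if __name__ == "__main__":
    # Hypothetical query names against a placeholder domain.
    for name in ("www.example.com", "abcdefghijklmnop.example.com"):
        print(name, looks_like_exfil(name, "example.com"))
```

A heuristic like this produces false positives (CDN hostnames are often long and random) and misses short encoded messages, so it would be one signal among several rather than a detector on its own.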