CS 3710: Intro to Cybersecurity (slides): Malware techniques

# Malware techniques
## CS 3710: Intro to Cybersecurity

===

## Persistence

---

## What happens after an attacker gets initial access?

An offensive operation doesn't end after an attacker gets initial access.

They have to _**maintain**_ access and pivot into more valuable systems.

</div>
<div class="fragment fade-in" data-fragment-index="1">

This is made more difficult by the fact that real-world networks are
_**segmented**_.

The most exposed systems are separated from more valuable hosts, and are more
closely monitored by defenders.

</div>
</div>
</div>
<div class="col">
<div class="text-center image-background">
<img style="max-height: 40vh;" src="../../img/web/dmz_example.svg">
</div>
</div>

notes:

The attacker's job is made more difficult by the fact that real-world networks
are usually segmented.

The most exposed systems, like web and email servers, are the ones an attacker
is most likely to get initial access through. These systems are usually
firewalled off from more valuable parts of the network, and are more heavily
monitored by defenders.

---

## Persistence

In order to maintain access to a target's systems, the attacker has to achieve
_**persistence:**_ a way to install a reliable backdoor on the target's systems.

This entails finding a mechanism to ensure that the backdoor remains _available_
and _undetected_ by the victim.

===

## Linux: userspace persistence mechanisms

notes:

References:
- [Understanding Linux Malware](https://ieeexplore.ieee.org/document/8418602)

---

## systemd services and cron jobs

systemd is a common init system and service manager. On most Linux
distributions, it's the first userspace process that the kernel starts at boot
(PID 1).

cron is a job scheduler that allows you to periodically re-run a command (e.g.
every five minutes).

</div>

</div>
  <div class="col">


</div>
</div>

notes:

[MITRE ATT\&CK T1053.003](https://attack.mitre.org/techniques/T1053/003/)

[MITRE ATT\&CK T1543.002](https://attack.mitre.org/techniques/T1543/002/)

---

## systemd services and cron jobs

Example cron job:

```text
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name command to be executed
23 4 * * * root /bin/script.sh
```

This cron job says "run the script `/bin/script.sh` every day at 4:23AM".

</div>

---

## systemd services and cron jobs

systemd and cron both let you run programs in the background, and their
configuration persists across reboots. Both are also very common on Linux
systems.

One popular (albeit simple) persistence technique is to define a systemd service
or cron job to keep malware running in the background.

*Examples:* [VPNFilter](https://en.wikipedia.org/wiki/VPNFilter), [Rocke cryptominer](https://www.anomali.com/blog/rocke-evolves-its-arsenal-with-a-new-malware-family-written-in-golang)

</div>

notes:

Understanding Linux Malware (Cozzi et al., 2018) found 70 malware samples that
modified `/etc/crontab` to gain persistence on Linux-based IoT devices. It found
only two that installed systemd services, but hundreds of samples created
SysVinit scripts (which systemd still supports for backwards compatibility).

---

## systemd services and cron jobs

Defending against these mechanisms is generally straightforward: a defender just
needs to monitor for newly installed systemd services and cron jobs and compare
them against a known-good baseline.

</div>
<div class="fragment" data-fragment-index="0">

systemd:
- `/etc/systemd/system/`
- `/usr/lib/systemd/system/`
- `$HOME/.config/systemd/user`

Cron:
- `/etc/crontab`, `/etc/cron.*`
- `/var/spool/cron/crontabs/`

</div>

---

## Dynamic linker hijacking with `LD_PRELOAD`

Many Linux programs use _**dynamic linking**_ to load code from shared libraries
at runtime.

Linux uses the **E**xecutable and **L**inkable **F**ormat (ELF) as the file
format for executables. ELF binaries include a segment with the libraries they
should be dynamically linked to during runtime.


notes:

ELF = "Executable and Linkable Format"

[Anatomy of Linux dynamic
libraries](https://developer.ibm.com/tutorials/l-dynamic-libraries/)

---

## Dynamic linker hijacking with `LD_PRELOAD`

For example, consider the following "hello, world" program in C:

```c
/* test.c */

#include <stdio.h>

int main(void) {
    puts("hello, world!");
    return 0;
}
```

When this program is compiled, the binary records that it depends on the C
standard library (`libc.so.6`) and leaves `puts` as an undefined symbol to be
resolved at runtime:

</div>

<pre class="code-wrapper">
<code class="text" data-trim data-noescape data-line-numbers="1-5|2-3|4-5">
$ gcc test.c -o test
$ readelf -d test | grep "Shared library"
 0x0000000000000001 (NEEDED)         Shared library: [libc.so.6]
$ readelf --dyn-syms test | grep puts
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5 (2)
</code>
</pre>

</div>

notes:

[MITRE ATT\&CK T1574.006](https://attack.mitre.org/techniques/T1574/006/)

---

## Dynamic linker hijacking with `LD_PRELOAD`

When a dynamically linked binary is executed, the kernel maps the program and
its dynamic linker (`ld.so`) into memory. The dynamic linker then loads the
required shared libraries and performs the relocations needed to e.g. resolve
`puts` to the function defined in `libc.so.6`.

<pre class="code-wrapper">
<code class="text" data-trim data-noescape data-line-numbers="1-9|1-2|3,7">
$ ./test
hello, world!
$ strace -e trace=%file ./test
execve("./test", ["./test"], 0x7fff10af9db0 /* 26 vars */) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
hello, world!
+++ exited with 0 +++
</code>
</pre>

`libc.so.6` is loaded from `/lib/x86_64-linux-gnu/libc.so.6` on my system.

</div>

**Q:** can we get the dynamic linker to load malicious code?

---

## Dynamic linker hijacking with `LD_PRELOAD`

`LD_PRELOAD` is an environment variable that tells the dynamic linker to load
the listed shared libraries before any others. Symbols they define take
precedence over symbols with the same name in later libraries, such as
`libc.so.6`.

```text
$ LD_PRELOAD=/path/to/my/library.so ./test
```

notes:

The directories the dynamic linker searches by default are usually configured
in `/etc/ld.so.conf` (and cached in `/etc/ld.so.cache` by `ldconfig`).

---

## Dynamic linker hijacking with `LD_PRELOAD`

*Example:*

```c
/* evil.c */

#include <dlfcn.h>

int puts(const char* s) {
    void* handle;
    int (*real_puts)(const char* s);

    // `dlopen` and `dlsym` load the real `puts` function from
    // the C standard library
    handle = dlopen("libc.so.6", RTLD_LAZY);
    real_puts = dlsym(handle, "puts");

    return real_puts("hello from evil.so >:)");
}
```

`dlsym` looks up the address of the original `puts` in `libc.so.6` and stores
it in the `real_puts` variable.

</div>
  <div class="fragment fade-in-then-out" data-fragment-index="1">

The "evil" version of `puts` calls the original `puts` with the message `"hello
from evil.so >:)"`

</div>
</div>
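A common alternative to `dlopen`ing `libc.so.6` by name is to ask for the *next* definition of `puts` in the symbol lookup order with the `RTLD_NEXT` pseudo-handle (which requires `_GNU_SOURCE`). This is a swapped-in technique, not what `evil.c` does, and the file name `evil_next.c` is hypothetical; the sketch also adds the `NULL` check that the slide version omits.

```c
/* evil_next.c: hypothetical variant of evil.c using RTLD_NEXT */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int puts(const char *s) {
    (void)s; /* the caller's message is ignored, as in evil.c */
    /* Find the next "puts" after this object in lookup order,
       which is normally the one in libc. */
    int (*real_puts)(const char *) = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
    if (real_puts == NULL)
        return EOF; /* lookup failed: report an error like puts would */
    return real_puts("hello from evil_next.so >:)");
}
```

It can be built and injected the same way: `gcc -fPIC -shared -o evil_next.so evil_next.c -ldl`, then `LD_PRELOAD=./evil_next.so ./test`.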

notes:

The code snippet above doesn't include proper error handling for `dlopen` and
`dlsym`, and should not be copied and pasted directly.

---

## Dynamic linker hijacking with `LD_PRELOAD`

Program output with `LD_PRELOAD`:

<pre class="code-wrapper">
<code class="text" data-trim data-noescape
  data-fragment-index="0"
  data-line-numbers="1-5|1|2-3|4-5">
$ gcc -fPIC -Wl,--no-as-needed -ldl -shared -o evil.so evil.c
$ ./test
hello, world!
$ LD_PRELOAD=./evil.so ./test
hello from evil.so >:)
</code>
</pre>

This command compiles `evil.c` as a shared library.

`-Wl,--no-as-needed -ldl` links it against `libdl`, which provides `dlopen` and
`dlsym`.

</div>
  <div class="fragment fade-in-then-out" data-fragment-index="1">

The original `test` program runs as expected...

</div>
  <div class="fragment fade-in-then-out" data-fragment-index="2">

... but with `LD_PRELOAD`, we inject the malicious version of `puts` defined in
`evil.so`

</div>
</div>

---

## Dynamic linker hijacking with `LD_PRELOAD`

You can do more sophisticated `LD_PRELOAD` tricks to hook common C functions and
inject malicious code into them.
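As a sketch of a slightly more sophisticated hook, the library below overrides `readdir` so that directory listings skip any entry whose name starts with a hypothetical prefix (`evil_`), the kind of file hiding a userspace rootkit performs. It calls through to the real `readdir` via `RTLD_NEXT`. One caveat, stated as an assumption about typical builds: programs compiled with 64-bit file offsets may call `readdir64` instead, which this sketch does not hook.

```c
/* hide.c: hypothetical LD_PRELOAD hook that hides directory entries */
#define _GNU_SOURCE
#include <dirent.h>
#include <dlfcn.h>
#include <string.h>

/* Hypothetical prefix: entries whose names start with it are hidden. */
#define HIDDEN_PREFIX "evil_"

int should_hide(const char *name) {
    return strncmp(name, HIDDEN_PREFIX, strlen(HIDDEN_PREFIX)) == 0;
}

struct dirent *readdir(DIR *dirp) {
    static struct dirent *(*real_readdir)(DIR *) = NULL;
    if (real_readdir == NULL)
        real_readdir = (struct dirent *(*)(DIR *))dlsym(RTLD_NEXT, "readdir");
    if (real_readdir == NULL)
        return NULL;

    struct dirent *entry;
    /* Skip entries we want hidden; return the first visible one (or NULL). */
    while ((entry = real_readdir(dirp)) != NULL && should_hide(entry->d_name))
        ;
    return entry;
}
```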

</div>

*Examples:*
[Symbiote](https://www.intezer.com/blog/research/new-linux-threat-symbiote/) is
a recently discovered malware strain that uses `LD_PRELOAD` to inject itself
into processes and evade detection.

<figure>
  <img src="../../img/malware/symbiote-evasion-techniques.webp" style="max-height: 30vh;">
  <figcaption>
  
*Source: Intezer / Blackberry*

</figcaption>
</figure>

notes:

- Used to target Latin America's financial industry (it's suggested that it may
  have been designed to be used against some major Brazilian banks)
- Symbiote is compiled as a shared library
- Some of its functionalities include hiding the ports that it's using for
  communications, hiding which processes are running, hiding files, etc.

[Blackberry's
analysis](https://blogs.blackberry.com/en/2022/06/symbiote-a-new-nearly-impossible-to-detect-linux-threat)

It's worth noting that some of the claims about Symbiote's stealthiness are
probably a little overblown. Grugq had a good criticism of Symbiote's
methodology here: https://grugq.substack.com/p/userland-rootkits-are-lame

---

## Defenses

_**Userspace**_ rootkits (those that don't try to hook into the Linux kernel
itself) are relatively easy to detect and defend against.

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="0">

`LD_PRELOAD`: there are a ton of different ways to determine whether a library
has been injected into a process, and it's unlikely that an attacker will be
able to adequately cover all of their bases.

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="1">

- Use tools that are statically compiled (where `LD_PRELOAD` would have no
  effect)

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="2">

- Check the `LD_PRELOAD` variable in `/proc/$pid/environ`, or with
  `/usr/bin/env`

</div>
<div class="fragment" data-fragment-index="3">

- Inspect process memory and memory mappings
- etc.

</div>
</div>

===

## Linux: kernelspace persistence mechanisms

---

## Kernelspace vs userspace

Code on Linux runs in either _**kernelspace**_ or _**userspace**_. Code that
runs in kernelspace has higher privileges (and hence, more opportunities to hide
itself from detection).


</div>

---

## Kernel modules

_**Loadable Kernel Modules**_ (**LKMs**) are a means of adding functionality to
the Linux kernel. Modules can be loaded into or unloaded from the kernel at any
time after boot.

LKMs are often used to implement device drivers for new hardware, but they have
many other applications.

---

## Syscall table hooking

_**System calls**_ (or _**syscalls**_) are an important part of the interface
that the kernel exposes to userspace. They're used to access the filesystem,
manipulate processes, and more.

_**Examples:**_ `open`, `read`, `execve`, `fork`, etc.

</div>

*Source: [LWN](https://lwn.net/Articles/604287/)*


</div>
</div>

notes:

- Syscall hooking rootkit blogpost from TrailOfBits:
  [blog](https://blog.trailofbits.com/2019/01/17/how-to-write-a-rootkit-without-really-trying/)

---

## Syscall table hooking

When a program on x86-64 wants to make a syscall, it puts the syscall number in
the `rax` register (and the arguments in `rdi`, `rsi`, `rdx`, ...) and then
executes the `syscall` instruction.

</div>

```asm
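section .data
    msg: db "hello, world!", 10     ; 10 = newline
    msgLen: equ $-msg

section .text
    global _start

_start:
    ; Write message to stdout: write(1, msg, msgLen)
    mov rdi, 1                      ; fd 1 = stdout
    mov rsi, msg                    ; buffer
    mov rdx, msgLen                 ; length
    mov rax, 1                      ; syscall number 1 = write
    syscall
    ; Exit with error code 0: exit(0)
    mov rdi, 0
    mov rax, 60                     ; syscall number 60 = exit
    syscall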
```

---

## Syscall table hooking

The kernel keeps a lookup table in memory that maps each syscall number to the
kernel function that implements it.

When the `syscall` instruction is run, the kernel looks up the entry in the
table corresponding to the given syscall.

</div>
<div class="fragment" data-fragment-index="0">

**Idea:** overwrite the entry in the table for a given syscall so that we can
intercept it

</div>

---

## Syscall table hooking


</div>
<div class="image-background fragment" data-fragment-index="0">


</div>

notes:

Further info:
- [RITSEC computer
  club](https://ritcsec.wordpress.com/2020/11/22/linux-syscall-hooking/)

---

## Syscall table hooking

Attacker workflow:

- Load a new kernel module using `insmod`

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="0">

- Kernel module identifies the memory address of the syscall table
  - *Older kernel versions:* use the `kallsyms_lookup_name` function or read
    `/proc/kallsyms`
  - *Newer kernel versions:* use kprobes

</div>
<div class="fragment" data-fragment-index="1">

- Overwrite syscall addresses so that they point to the attacker's malicious
  code

</div>

===

## Command and control

---

## Command and control

After an attacker has gotten into a network and installed malware, that malware
usually needs a way to communicate back to its operator.


---

## Command and control

Those communications are usually used for two (interconnected) purposes:

- Exfiltrating data collected from the network back to the attacker, and

</div>
<div class="fragment" data-fragment-index=1>

- Obtaining new instructions and payloads to execute.

</div>
</div>

We refer to these communications as _**command and control**_ (often shortened
to _**CNC**_ or _**C2**_).

</div>

---

## Case study: Drovorub

_**Drovorub**_ is a malware toolkit for Linux.

It was publicly identified in August 2020 by the NSA and FBI and attributed to
Russia's GRU.

notes:

References:

- [NSA / FBI
report](https://media.defense.gov/2020/Aug/13/2002476465/-1/-1/0/CSA_DROVORUB_RUSSIAN_GRU_MALWARE_AUG_2020.PDF)

---

## Components of Drovorub

Drovorub isn't just a single piece of malware, but a collection of tools that
can be used to infect and control Linux machines.

</div>
<div class="fragment fade-in-then-out" data-fragment-index=0>

**Drovorub-client:** the implant that gets installed on the infected machine. It
receives and executes commands from the attacker

</div>
<div class="fragment fade-in-then-out" data-fragment-index=1>

**Drovorub-kernel:** a kernel module rootkit that hooks various Linux kernel
functions to hide itself

</div>
<div class="fragment fade-in-then-out" data-fragment-index=2>

**Drovorub-server:** a server that enables sending and receiving commands and
stores Drovorub client/agent data in a MySQL database

</div>
<div class="fragment" data-fragment-index=3>

**Drovorub-agent:** relays traffic from Drovorub-server to Drovorub-client
installations

</div>
</div>

*Source: FBI / NSA*


---

## Drovorub C2 modules

Drovorub uses a modular design that allows it to support various capabilities:

- Authentication to the C2 server

</div>

- Transferring files to Drovorub clients

</div>

- Hiding files and network artifacts (e.g. open ports) from the view of Linux's
  userspace

</div>

- Executing shell scripts on the client

</div>

---

## Hiding C2 traffic

A critical issue for malware operators is figuring out how to hide C2 traffic
from detection by defenders.

To avoid detection, they have to make C2 traffic look identical to "legitimate"
traffic.

</div>
<div class="fragment" data-fragment-index=0>

For example, if their malware is communicating with

`http://www.mymalwaredomain.evil`

</div>

a defender will (hopefully!) see that traffic going through their network and
realize that it's coming from malware.

</div>

---

## Case study: Emotet

_**Emotet**_ is a malware strain and botnet that has been used variously as a
banking trojan and for ransomware operations, crimeware, and more.

*Source: Malwarebytes*


notes:

References:

- [Emotet](https://en.wikipedia.org/wiki/Emotet)
- [MalwareTech blogpost on Emotet
C2](https://www.malwaretech.com/2017/11/investigating-command-and-control-infrastructure-emotet.html)

---

## Case study: Emotet

The Emotet binary contains a hardcoded list of IP addresses it can use for C2.
Normally an attacker would point these IP addresses towards servers that they
control...

*Source:
[MalwareTech](https://www.malwaretech.com/2017/11/investigating-command-and-control-infrastructure-emotet.html)*


---

## Case study: Emotet

... but at first glance it looks like these IP addresses belong to legitimate
web hosting providers (who wouldn't tolerate a ransomware gang hosting a C2
server on their infrastructure).

*Source:
[MalwareTech](https://www.malwaretech.com/2017/11/investigating-command-and-control-infrastructure-emotet.html)*


---

## Case study: Emotet

The simplest way to do C2 is to rent a server that runs the C2 infrastructure
and have the malware communicate with it directly.

---

## Case study: Emotet

However, this kind of C2 can lead to difficulties in practice. If the web
hosting provider isn't abuse-friendly, they can take down the server and tie it
back to the attackers.

</div>
<div class="fragment" data-fragment-index=0>

If they are abuse-friendly, their IP address range will often get blocked by
defenders, and the hosting provider will probably be a target for law
enforcement.

</div>

---

## Case study: Emotet

To get around these problems, Emotet's operators proxied C2 traffic through
servers they had hacked:

---

## Case study: Emotet

Attacker workflow:

- Find some vulnerable servers on the internet and gain access to them (usually
  by just performing mass scans and searching for known vulnerabilities)

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

- Set up a proxy on the compromised machines that will forward traffic from the
  compromised machine to the attackers' servers.

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=1>

- Before deploying malware to targets, embed the IP addresses of the compromised
  hosts in the malware.

</div>
<div class="fragment" data-fragment-index=2>

- Malware communicates back to the attacker through the legitimate-looking (but
  compromised) hosts.

</div>

---

## Alternative communication channels

There are many other channels that attackers can try to use to cover up
command-and-control communications.

As long as it looks like "normal" network traffic, it's a potential C2 vector!

---

## C2 channels: DNS exfiltration

**DNS exfil:** malware encodes the message that it wants to send back to the
attacker and then sends it back in DNS queries

</div>
<div class="fragment" data-fragment-index=0>

These queries go to legitimate-looking domains whose authoritative name server
the attacker controls, but the queried subdomain contains the attacker's message.

</div>
</div>
<div class="col">


</div>
</div>

notes:

Palo Alto report on APT 18 (Wekby) using DNS as an exfil channel:
https://unit42.paloaltonetworks.com/unit42-new-wekby-attacks-use-dns-requests-as-command-and-control-mechanism/

[MITRE ATT\&CK T1071.004](https://attack.mitre.org/techniques/T1071/004/)

---

## C2 channels: DNS exfiltration

*Source: Unit 42 / Palo Alto Networks*


---

## C2 channels: APT 41

**APT 41:** created malware (called POISONPLUG) that used two different methods
to communicate back with the malware operators:

</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index=0>

- A Google document, which it would read and parse to obtain instructions.

</div>
<div class="fragment" data-fragment-index=1>

- A Steam (game distribution platform) community page that it used as a
  fallback mechanism

</div>
  </div>
  <div class="col">

*Source: FireEye*


</div>
</div>

notes:

FireEye report: https://content.fireeye.com/apt-41/rpt-apt41