NOTE: This is an optional extra credit assignment! It is worth 5 points of extra credit towards your grade.
YARA will appear as one of the topics you can pick from on the final, so you may be interested in looking over this assignment nonetheless.
Lab structure
In this lab you will be using YARA to write some rules to detect malware. In particular, you will write some rules to detect the linux/x64/meterpreter_reverse_http payload for Metasploit that you’ve used in some of the earlier labs.
Before you get started, you should check out the appendix on YARA rules for a crash course on using YARA. Throughout this lab, you can use the following page as your primary reference for how to write YARA rules:
https://yara.readthedocs.io/en/stable/writingrules.html
You may also find some of the other references in the appendices useful for figuring out what your rules should look like.
What to submit
At the end of this assignment, you should submit a PDF document with your YARA rules for each problem.
Grading
This assignment is worth 5 total points of extra credit. Points will be awarded based on completion – as long as you make a good-faith effort to complete each problem, you should get full points.
Setting up
To get started, we will generate some toy “malware samples” using Metasploit’s linux/x64/meterpreter_reverse_http payload. We will spend the rest of the lab writing YARA rules for this payload.
Generating samples with msfvenom
We’ve been using malware in some form or another since Lab 3. Remember msfvenom [1]? Every time we’ve used Metasploit to perform remote code execution, it has generated malicious code (in some form or another) and executed it on the target. Do you remember seeing Metasploit print a log like this in Lab 3 or Lab 6?
msf > run
...
[!] This exploit may require manual cleanup of '/tmp/EvkPk' on the target
meterpreter>
That shows up because Metasploit uploaded its malicious payload to (in this case) /tmp/EvkPk. In the real world, we would collect these samples from the exploited host and analyze them. For convenience’s sake, though, in this lab we’ll artificially generate our own samples using msfvenom (since that’s what Metasploit is using under the hood anyway). Run the following command:
# Create a directory to store samples in your home directory
mkdir -p ~/samples
Now run the following command 2-3 times:
msfvenom --arch x64 \
--platform linux \
--format elf \
--payload linux/x64/meterpreter_reverse_http \
LHOST=$(hostname) LPORT=4444 \
-o $(mktemp -up ~/samples)
Every time you run this command, it will create a new, randomly-named copy of the msfvenom payload in your ~/samples/ directory.
We’re going to write a few different rules to detect these samples.
Problem 1: strings-based rule
Ready? If you haven’t already, I would recommend reading the intro to YARA rules in the appendix before starting.
The strings command searches for all of the text strings that it can find in a file. Run
strings -n 30 ~/samples/my_sample | sort | uniq
(Replace my_sample with an actual sample from your directory.) strings -n 30 ... extracts all strings that are at least 30 characters long from the malicious binary, while ... | sort | uniq filters those strings to remove duplicates.
Choose at least four of the strings that stand out as unique. Then create a file, e.g. myrules.yar, with a YARA rule to detect them. Your YARA rule should look something like the following (for example):
rule Msf_Linux_MeterpreterReverseHttp_strings {
meta:
description = "linux/x64/meterpreter_reverse_http - strings"
author = "Your Name Here"
strings:
// Put the strings that you extracted here!
$s1 = "string 1"
$s2 = "string 2"
$s3 = "string 3"
/* ... */
condition:
// Add a condition using these strings to ensure that malware
// samples get correctly identified.
false
}
See the section “Last steps: check your rules” to see how to check that your rules work correctly.
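Typing the strings you picked into the rule by hand is a little error-prone, mostly because quotes and backslashes have to be escaped inside a YARA text string. Here is a minimal sketch of a helper that does the formatting for you. The name to_yara_strings is something I made up for this example (it is not part of YARA), and it assumes you feed it one chosen string per line:

```shell
# Hypothetical helper (not part of YARA): read one string per line on stdin
# and print numbered YARA text-string definitions, escaping \ and ".
to_yara_strings() {
    sed 's/\\/\\\\/g; s/"/\\"/g' \
        | awk '{ printf "$s%d = \"%s\"\n", NR, $0 }'
}

# Demo with two made-up strings (replace these with the ones you picked):
printf '%s\n' 'first chosen string' 'second "quoted" string' | to_yara_strings
# $s1 = "first chosen string"
# $s2 = "second \"quoted\" string"
```

If you save the strings you chose to a file (say chosen.txt, one per line), you could run to_yara_strings < chosen.txt and paste the result into the strings: section of your rule.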
Problem 2: function-based rules
By default, the payloads generated by msfvenom have debug symbols in them [2]. This makes it fairly straightforward to identify functions from the original source code of the malware. Try running the following command (with my_sample replaced by the name of an actual msfvenom payload sample):
objdump -j .text -d ~/samples/my_sample | less
less will run a terminal pager that makes it easier to look through the output of objdump. You can use your arrow keys or the “page up” / “page down” keys on your keyboard (if you have them) to scroll through the output. You can also press the / key to search for a term, and q to exit.
You should see some output like this:
0000000000007610 <get_protocol_family>:
7610: 81 ff 00 20 00 00 cmp $0x2000,%edi
7616: 89 f8 mov %edi,%eax
7618: 0f 84 39 01 00 00 je 7757 <get_protocol_family+0x147>
...
0000000000007784 <strip_trailing_dot>:
7784: 48 8b 57 10 mov 0x10(%rdi),%rdx
7788: 48 85 d2 test %rdx,%rdx
778b: 74 24 je 77b1 <strip_trailing_dot+0x2d>
...
This output is telling us that the function get_protocol_family starts at byte 0x7610 in the binary, and its definition runs until byte 0x7784 (when the strip_trailing_dot function starts). It also tells us the assembly code instructions in those functions along with their hex representation.
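It can help to know how many bytes a function occupies before you decide to put it in a rule. As a quick sketch, shell arithmetic understands hex offsets directly, so you can subtract the start address from the address where the next function begins. The helper name func_size is just something I made up for illustration:

```shell
# Hypothetical helper: size in bytes of a function that starts at START
# and ends where the next function begins (END). Accepts hex offsets.
func_size() {
    echo $(( $2 - $1 ))
}

# get_protocol_family runs from 0x7610 up to (not including) 0x7784
func_size 0x7610 0x7784
# 372
```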
Pick out at least two functions, and write a YARA rule that identifies binaries with those functions. You should use the hex representations of their assembly code in your rule.
NOTE: you don’t want to have to copy the entire hex representation of the functions by hand! I’ve added a little trick in the appendix (“extracting the bytes of a function”) that you can use to make this process much faster.
Your YARA rule should look something like this:
rule Msf_Linux_MeterpreterReverseHttp_funcs {
meta:
description = "linux/x64/meterpreter_reverse_http - functions"
author = "Your Name Here"
strings:
// Put the functions that you extracted here!
$f1 = { 81 FF 00 20 ... }
$f2 = { 48 8B 57 10 ... }
/* ... */
condition:
// Add a condition using these strings to ensure that malware
// samples get correctly identified.
false
}
Problem 3: syscall-based rules
NOTE: you don’t have to write your own rule for this one, but you should still follow along with the steps. It’s mostly meant to give you an opportunity to look at some other ways you can analyze malware. There will be a YARA rule at the end that you should add alongside your other rules.
We’re going to try one final type of rule, which will look at the system calls that a program makes. Sometimes malware authors like to be sneaky and try all kinds of tricks to make their malware harder to analyze. One of those tricks is to compress and encrypt the malware. When the malware runs, the malware decrypts and decompresses itself in memory before executing the bulk of the malicious payload.
From an attacker’s point of view, they make analysis a little bit more difficult by obfuscating the malicious payload. And by performing everything in-memory, they ensure that the non-obfuscated payload never gets stored on disk (where a forensic investigator would be able to recover the malware). There are other benefits, too: since the majority of the payload is encrypted, it’s impossible to develop a good YARA rule for that portion of the payload. And the compression reduces the payload size and (potentially) raises fewer alarms.
For our final rule, we will try to detect a malware sample that uses some of these tricks. For this we’ll use a little program called tardis that I’ve thrown together [3]:
# tardis will compress and encrypt the contents of my_sample
tardis ~/samples/my_sample ~/samples/packed
# Give execute permissions for the newly-created sample
chmod a+x ~/samples/packed
If you run objdump on ~/samples/packed, you’ll notice that you can’t see the functions that are defined for the binary anymore.
Still, there are other ways we can inspect what it does. strace will run this sample and tell us what syscalls it performs:
$ strace -b execve ~/samples/packed
execve("./out", ["./out"], 0x7ffe57fc7180 /* 36 vars */) = 0
brk(NULL) = 0x5555564e3000
...
mmap(NULL, 1069056, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4677cc4000
munmap(0x7f4677dc9000, 720896) = 0
memfd_create("a", MFD_CLOEXEC) = 3
write(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0u\216\0\0\0\0\0\0"..., 1068640) = 1068640
execveat(3, "", ["./out"], 0x5555564e5160 /* 0 vars */, AT_EMPTY_PATH
strace: Process 161899 detached
The salient lines are the lines where the malware calls memfd_create, write, and execveat. In particular, memfd_create and execveat work in tandem to allow the malware to create a region of memory where it can store its decrypted, decompressed payload, and then execute that payload.
Our last YARA rule will try to find programs that perform these two syscalls. Try running the following command:
gdb -x /usr/local/share/cs3710/script.gdb ~/samples/packed
This command uses the GNU Debugger to execute the sample and trace its execution. script.gdb automates the process for you so that you can see the relevant parts of the output, which should look similar to the following:
[Figure: GDB output for ~/samples/packed, with the memfd_create and execveat syscalls highlighted]
I’ve highlighted the two areas where the memfd_create and execveat syscalls are executed. We can write YARA rules for the assembly used to perform the syscalls, similar to what we did in Problem 2.
At a minimum, our YARA rules should contain the part where we move the syscall number (0x13f for memfd_create, 0x142 for execveat) into the %eax register, and then perform the syscall instruction. We can use the assembly printed by GDB above to create the following YARA rule:
rule Memfdcreate_Execveat_Syscalls {
meta:
description = "program uses the memfd_create and execveat syscalls"
author = "Your Name Here"
strings:
// memfd_create syscall:
// mov $0x13f,%eax
// syscall
$s1 = { b8 3f 01 00 00 0f 05 }
// execveat syscall:
// mov $0x142,%eax
// mov $0x1000,%r8d
// syscall
$s2 = { b8 42 01 00 00 41 b8 00 10 00 00 0f 05 }
condition:
/*
* The uint32(0) == ... condition is another way of checking
* whether a file is an ELF file. It's equivalent to the
* "$elf at 0" rule in the appendix.
*/
uint32(0) == 0x464C457F
and all of ($s*)
}
You should add this rule to your rules file, alongside the rules from Problems 1 and 2.
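If you are wondering where the bytes in $s1 come from: b8 is the opcode for moving a 32-bit immediate into %eax, the next four bytes are the syscall number in little-endian order, and 0f 05 is the syscall instruction. The sketch below rebuilds that pattern from a syscall number. syscall_pattern is a name I made up, and it only covers the simple case where the mov is immediately followed by syscall (as in $s1, but not $s2, which has an extra mov in between):

```shell
# Hypothetical helper: print the byte pattern for
#   mov $NR,%eax   (opcode b8 followed by a 32-bit little-endian immediate)
#   syscall        (0f 05)
syscall_pattern() {
    nr=$(($1))    # accepts decimal or hex, e.g. 319 or 0x13f
    printf 'b8 %02x %02x %02x %02x 0f 05\n' \
        $(( nr & 0xff )) $(( (nr >> 8) & 0xff )) \
        $(( (nr >> 16) & 0xff )) $(( (nr >> 24) & 0xff ))
}

syscall_pattern 0x13f   # memfd_create
# b8 3f 01 00 00 0f 05
```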
Last steps: check your rules
Ensure that you have at least 2 or 3 different malware samples in your ~/samples directory. To verify that your rules for Problems 1-3 worked correctly, run them against the generated samples as follows:
yara myrules.yar ~/samples
If your rules work correctly, you should see some output like the following:
Msf_Linux_MeterpreterReverseHttp_strings ./tmp.rZ2v9Lg3GH
Msf_Linux_MeterpreterReverseHttp_funcs ./tmp.rZ2v9Lg3GH
This indicates that both of your rules matched against the samples that you generated.
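With several samples the output gets long, so it can be handy to summarize it. Since YARA prints one "rule name, file path" pair per line, a rough sketch like the following counts how many files each rule matched. count_matches is a made-up name, and the demo input is fabricated purely to show the shape of the output:

```shell
# Hypothetical helper: read yara output ("RuleName path" per line) on stdin
# and print how many files each rule matched, sorted by rule name.
count_matches() {
    awk '{ count[$1]++ } END { for (r in count) print count[r], r }' | sort -k2
}

# Demo with made-up output for two rules and two sample files:
printf '%s\n' 'RuleA ./sample1' 'RuleB ./sample1' 'RuleA ./sample2' | count_matches
# 2 RuleA
# 1 RuleB
```

In practice you would pipe the real output into it (yara myrules.yar ~/samples | count_matches) and compare each count against the number of samples you generated.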
As a sanity check, you should also try running your rules against some other files on your machine. If your rules are correct, they shouldn’t flag any files that aren’t in your ~/samples directory. I would recommend trying the following:
yara -r myrules.yar /usr/bin
# NOTE: this one will take a while. If it's gone for 15+ minutes
# and hasn't printed any matches, you can call it good and
# Ctrl + C out of the command.
yara -r myrules.yar /usr/lib
What to submit
Once you’ve verified that your rules work correctly, submit a document with your rules for each problem. If your rules flagged any additional files beyond the malicious files in ~/samples/, you should also indicate that in your submission.
Hints
To make your rules a little faster while performing your checks, you might want to add a couple of conditions that filter out files that aren’t ELF files. You could add the following condition to each of your rules:
condition:
uint32(0) == 0x464C457F
and /* ... additional conditions here ... */
or equivalently,
strings:
$elf = "\x7fELF"
/* ... */
condition:
$elf at 0
and /* ... additional conditions here ... */
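The two checks are equivalent because uint32(0) reads the first four bytes of the file as a little-endian integer, so the bytes 7F 45 4C 46 come out in reverse order as 0x464C457F. Here is a small sketch that shows the same byte swap from the shell. magic_uint32 is a hypothetical helper name, and it assumes the file is at least four bytes long:

```shell
# Hypothetical helper: print the first four bytes of FILE as the
# little-endian 32-bit value that uint32(0) would see.
magic_uint32() {
    set -- $(od -An -v -tx1 -N 4 "$1")    # $1..$4 = the four bytes, in file order
    printf '0x%s%s%s%s\n' "$4" "$3" "$2" "$1" | tr 'a-f' 'A-F'
}

# Demo on a throwaway file that contains only the ELF magic number
demo=$(mktemp)
printf '\177ELF' > "$demo"
magic_uint32 "$demo"
# 0x464C457F
rm -f "$demo"
```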
In addition, you might want to skip files that are above a certain size. You can use the filesize condition for that – e.g., you could filter out all files that are larger than 100MB.
Appendix
Intro to YARA rules
YARA bills itself as “the pattern matching Swiss knife for malware researchers”. It is indeed a flexible tool. It allows you to define a list of patterns (“rules”) matching the behavior of various known malware families, run those rules over some data, and report any data that matched those rules.
What makes YARA special is that it’s designed to be usable in many different contexts. You can write YARA rules to match malware files, but you can also write YARA rules that match patterns in network traffic or RAM. And it’s really fast – you can match against thousands of YARA rules with fairly low overhead.
Here’s an example of a YARA rule:
rule ElfRule {
meta:
description = "File Magic - ELF file"
strings:
// Equivalent: $magic = "\x7fELF"
$magic = { 7F 45 4C 46 }
condition:
$magic at 0
}
All this rule does is check whether a file begins with the four bytes [0x7F, 0x45, 0x4C, 0x46]. This is a “magic number” used to identify a file as an ELF file, which is the standard format for binary executables used by Linux [4].
You should try running this rule [5] on your machine:
# Create a file, `rules.yar`, and copy-and-paste the rule
# shown above into the file.
nano rules.yar
# The first command runs the rules on all files in /usr/bin; the
# second command runs the rules on all files in /etc (-r makes
# YARA recursively search subdirectories).
#
# Note: the second command needs sudo because not all files in
# /etc are readable by everyone
yara -r rules.yar /usr/bin
sudo yara -r rules.yar /etc
You will find that the first command prints a ton of results, while the second prints none (or almost none). That’s because the /usr/bin directory contains many of the programs you use on a Linux system (the majority of which are formatted as ELF files). Meanwhile, /etc contains files related to system configuration, which are overwhelmingly not ELF files.
Every time YARA finds a file that matches a rule, it prints the name of the rule that matched followed by the path to the file.
Let’s look at one more example:
rule IsShellScript {
meta:
description = "Is a shell script"
strings:
$s1 = /#!\/bin\/(sh|dash|bash|fish|zsh)/
$s2 = /#!\/usr\/bin\/(sh|dash|bash|fish|zsh)/
condition:
filesize < 1MB and (($s1 at 0) or ($s2 at 0))
}
If you’ve never seen a regular expression before this can be a little disorienting. What this rule does is check whether a file is smaller than a megabyte. If it is, it then checks whether that file starts with one of the following strings:
#!/bin/sh, #!/bin/dash, #!/bin/bash, #!/bin/fish, #!/bin/zsh
#!/usr/bin/sh, #!/usr/bin/dash, #!/usr/bin/bash, #!/usr/bin/fish, #!/usr/bin/zsh
Files that begin with these sequences are shell scripts that run a sequence of Linux commands, e.g.
#!/bin/bash
# A very short shell script that just deletes all of the files
# in the /tmp ("temporary") directory
#
# (Don't actually run this, this is just an example)
echo "[$(date -R)] Starting cleanup..."
rm -vrf /tmp/*
echo "[$(date -R)] Cleanup finished"
If you add this rule to rules.yar and run YARA over /usr/bin again, you’ll see that the IsShellScript rule now picks up a lot of files that weren’t flagged before:
yara rules.yar /usr/bin | grep IsShellScript
# Might print out something like:
#
# IsShellScript /usr/bin/fgrep
# IsShellScript /usr/bin/msf-exe2vbs
# IsShellScript /usr/bin/xfce4-popup-windowmenu
# IsShellScript /usr/bin/msf-msf_irb_shell
# ...
Extracting the bytes of a function
Here’s an example where I extract the assembly code of the function eio_init from a payload generated by msfvenom. objdump shows me the following information about this function:
0000000000007b9c <eio_init>:
7b9c: 48 83 ec 08 sub $0x8,%rsp
7ba0: 48 89 3d 91 0f 2d 00 mov %rdi,0x2d0f91(%rip) # 2d8b38 <eio_want_poll_cb>
7ba7: 48 8d 3d 22 c7 2d 00 lea 0x2dc722(%rip),%rdi # 2e42d0 <eio_pool+0x170>
7bae: 48 89 35 7b 0f 2d 00 mov %rsi,0x2d0f7b(%rip) # 2d8b30 <eio_done_poll_cb>
7bb5: 31 f6 xor %esi,%esi
# skipping a bunch of lines...
7c5c: 48 c7 05 5d c6 2d 00 movq $0x0,0x2dc65d(%rip) # 2e42c4 <eio_pool+0x164>
7c63: 00 00 00 00
7c67: c7 05 5b c6 2d 00 00 movl $0x0,0x2dc65b(%rip) # 2e42cc <eio_pool+0x16c>
7c6e: 00 00 00
7c71: 5a pop %rdx
7c72: c3 ret
0000000000007c73 <timers_reschedule>:
7c73: 8b 8f bc 01 00 00 mov 0x1bc(%rdi),%ecx
7c79: 31 c0 xor %eax,%eax
...
This function starts at byte 0x7b9c and ends at byte 0x7c73. Therefore I start by setting the following two variables in my terminal:
START=$((0x7b9c))
END=$((0x7c73))
Now I use the program xxd to produce a hex dump of the file. Then I pipe this output to the fold and tr programs so that it’s formatted in a way that will make it easy to use in my YARA rule:
# fold -w 2 groups characters into groups of 2
# tr '\n' ' ' replaces all newlines with spaces
xxd -u -p -s $START -l $(($END-$START)) my_sample \
| fold -w 2 | tr '\n' ' '
(You can check out man xxd to see what each of the flags to xxd means, if you’re curious.) In my case, when I ran this command, I got the following:
$ START=$((0x7b9c))
$ END=$((0x7c73))
$ xxd -u -p -s $START -l $(($END-$START)) tmp.V35epF7Hkd \
| fold -w 2 | tr '\n' ' '
48 83 EC 08 ... (many bytes later) ... 00 00 5A C3
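If xxd isn’t installed on your machine, od can do the same job. The sketch below is one way to wrap the whole thing up so that it prints a ready-to-paste YARA hex string. yara_hex is a helper name I made up for this example (it isn’t part of YARA or the lab tooling), and the demo runs on a six-byte throwaway file rather than a real sample:

```shell
# Hypothetical helper (not part of the lab tooling): print the bytes in
# [START, END) of FILE as a YARA hex string named NAME.
# Usage: yara_hex FILE START END [NAME]
yara_hex() {
    file=$1
    start=$(($2))            # accepts hex offsets such as 0x7b9c
    end=$(($3))
    name=${4:-f1}
    bytes=$(od -An -v -tx1 -j "$start" -N "$((end - start))" "$file" \
        | tr -s ' \n' ' ' | tr 'a-f' 'A-F' | sed 's/^ //; s/ $//')
    printf '$%s = { %s }\n' "$name" "$bytes"
}

# Demo on a throwaway six-byte file: 48 83 EC 08 5A C3
demo=$(mktemp)
printf '\110\203\354\010\132\303' > "$demo"
yara_hex "$demo" 0x0 0x6 f1
# $f1 = { 48 83 EC 08 5A C3 }
rm -f "$demo"
```

For the eio_init example, the equivalent call would be yara_hex my_sample 0x7b9c 0x7c73 f1.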
Here’s an example of a YARA rule that checks for the existence of this function in a binary. Note that for your actual YARA rule for Problem 2, you should check for the existence of at least two different functions.
rule Is_Msf_Payload_Function {
meta:
description = "linux/x64/meterpreter_reverse_http - functions"
strings:
// ELF header
$elf = "\x7fELF"
// eio_init
$f1 = {
48 83 EC 08 48 89 3D 91 0F 2D 00 48 8D 3D 22
C7 2D 00 48 89 35 7B 0F 2D 00 31 F6 E8 48 54
/* a few lines later... */
5D C6 2D 00 00 00 00 00 C7 05 5B C6 2D 00 00
00 00 00 5A C3
}
condition:
// You don't strictly need to check whether the start of the
// file begins with the ELF file header, but in practice it
// can be a good idea for efficiency's sake
$elf at 0 and $f1
}
Additional references
Your main reference for how to write YARA rules should probably be the YARA documentation:
https://yara.readthedocs.io/en/stable/writingrules.html
If you want to see some real-world examples of YARA rules, here are some GitHub repositories you can check out:
- awesome-yara: this repository has a list of many different companies and projects using YARA, including links to YARA rulesets.
- Yara-Rules/rules: this is a massive repository with rules for many different malware families.
- Neo23x0/signature-base: another repository of many different YARA rules.
[1] Some people might be pedantic and claim that msfvenom is a payload generation tool rather than a malware generation tool. My personal definition of malware is broad enough that I would define an msfvenom payload as malware, but to each their own. In any case, it’s the kind of tool that defenders want to create detections for.
[2] In the real world, malware samples don’t typically have debugging symbols embedded in them. However, you can still use a reverse-engineering tool like Ghidra or Binary Ninja to identify individual functions and use the approach we’re taking in this problem.
[3] I threw this together for a competition a little while back. tardis is what’s commonly referred to as an executable packer, although in this case it also self-encrypts. Incidentally, the memfd_create + execveat method it uses is fun (and “easy” enough that you can quickly write your own packer based on it if you’re under time constraints), but it isn’t that sophisticated, at least as far as these techniques go. You can find out more about similar methods by looking up reflective code loading (MITRE ATT&CK T1620; DS0009).
[4] The Windows equivalent to ELF is the Portable Executable (PE) format. If you’re interested in obscure Linux features, the Linux kernel supports something called binfmt_misc which allows Linux to run other kinds of executable files (like PE). If you’ve ever used Wine or Proton to run a Windows program on a Linux machine, you may have used binfmt_misc.
[5] It’s actually possible to compile these rules into a special binary format, so that YARA doesn’t have to preprocess them every time you run it. In the real world this can be a lot more efficient. To compile the rules, you would run yarac rules.yar rules.yrc, and then run YARA as yara -C rules.yrc /path/to/directory.