PA #1: Fuzzing: xfuzz

Due: Sunday, Oct 2 11:59PM

xfuzz GitHub repository: https://github.com/kernelmethod/xfuzz

For your first programming assignment, you will design a web fuzzer similar to ffuf, which featured heavily in Labs 2 and 3. This program will be written in Python, and will primarily focus on using the aiohttp library to perform fuzzing as quickly as possible.

Getting started

For this assignment, you will be developing a tool called xfuzz. The behavior of xfuzz is quite similar to ffuf and wfuzz, in that it takes a wordlist, a URL, and one or more parameters and replaces all occurrences of the word FUZZ with terms from the wordlist.
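As an illustration, the core substitution step amounts to simple string replacement. The helper below is hypothetical (it is not part of the skeleton code), but it shows the idea:

```python
def expand_fuzz(template: str, word: str) -> str:
  # Replace every occurrence of the FUZZ keyword with a wordlist entry
  return template.replace("FUZZ", word)

# Each wordlist entry yields one candidate to request:
urls = [expand_fuzz("http://example.org/FUZZ", w) for w in ["admin", "login"]]
print(urls)  # ['http://example.org/admin', 'http://example.org/login']
```

The same substitution applies wherever FUZZ appears: in the URL, in header values, or in request data.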

I have already created an xfuzz GitHub repository with some skeleton code you can use to start creating your tool. You should start by following the installation instructions to download the xfuzz code and install all of the Python packages you will need to run it and to run tests for it.

Once you’ve done that, you can take a look at the assignment instructions for details on what you’ll need to do for this assignment. In brief: you will need to build out xfuzz (starting from the fuzz() function in xfuzz/fuzz.py) to include all of the features that you see when you run

python3 -m xfuzz --help

In addition, to get 100% on the assignment, you will need to make xfuzz as fast as possible.

Testing

To ensure that your implementation is correct, you should test xfuzz using the interactive server as well as pytest, as described in the README. In addition, I have stood up a separate version of the test server on http://cs3710.kerneltrick.org. Once you’ve started working on xfuzz, you should try running it against the server, e.g.

python3 -m xfuzz -u http://cs3710.kerneltrick.org/enum/FUZZ \
  -w test/wordlists/common.txt

Grading

This assignment is scored out of 9 points, broken down as follows:

Correct implementation (7 points)

You will receive 7 points for having a correct implementation of xfuzz. We will determine this using a test suite based on pytest. Note that this test suite primarily checks that xfuzz fuzzes the correct URLs. If your fuzzer does not generate correct output (see the expected program output), we may deduct additional points.

If you wish to try running the test suite yourself, see the instructions for using pytest in the repository README.

To get all seven points, xfuzz does not need to be especially fast. However, to ensure that we can grade your assignments in a timely manner, the test suite will automatically stop running after 15 minutes. If any of the tests are still running after this time, they will automatically fail.

Fast implementation (2 points)

You will get two additional points for having a reasonably fast implementation of xfuzz. Our primary criterion will be to run

python3 -m xfuzz -w test/wordlists/common.txt \
  -H 'Content-Type: application/json' \
  -X POST -mc 200 -d '{"username": "admin", "password": "FUZZ"}' \
  -u http://cs3710.kerneltrick.org/auth/login

against the live server running on http://cs3710.kerneltrick.org. You will get one point if you can do a full scan of the server with this command in < 2 minutes, and two points if you can do it in < 1 minute.

Note: for grading purposes we will run these tests locally with 200ms simulated latency to ensure consistency. The server on http://cs3710.kerneltrick.org has been configured so that no matter what your internet connection is like, xfuzz will run strictly faster during our grading than it does during your tests against this machine.

Hints

Windows users

Some of the commands provided in the assignment description don’t quite work the same way in the Windows command prompt. If you run into difficulties, my first suggestion would be to run the commands in PowerShell (which should be pre-installed on your machine), or to install WSL (Windows Subsystem for Linux) and run them there.

Otherwise, here are some fixes you can make to the provided commands to get them to work correctly on your machine:

  • Replace the / character in paths to files with a backslash \.
  • Replace calls to python and python3 with python.exe, e.g.: python.exe -m xfuzz --help
  • Replace single quotation marks ' with double quotation marks ". In addition, escape any double quotation marks that appear inside of commands.
  • Replace the line continuation character \ at the end of each line with ^ (in the command prompt) or a backtick ` (in PowerShell).

With these changes, the command

python3 -m xfuzz -w test/wordlists/common.txt \
  -H 'Content-Type: application/json' \
  -X POST -mc 200 -d '{"username": "admin", "password": "FUZZ"}' \
  -u http://cs3710.kerneltrick.org/auth/login

would become the following:

python.exe -m xfuzz -w test\wordlists\common.txt ^
  -u http://cs3710.kerneltrick.org/auth/login ^
  -H "Content-Type: application/json" ^
  -X POST -mc 200 -d ^
  "{\"username\": \"admin\", \"password\": \"FUZZ\"}"

Writing a fast fuzzer

Before you focus on optimization, you should ensure that your fuzzer implementation is correct and that it implements all of the features that you need.

Once you’re ready to start making your fuzzer fast, there are multiple routes you can take. Your fuzzer’s biggest bottleneck is waiting for HTTP requests to complete: the CPU might wait hundreds of milliseconds for a single request to finish, which is a lifetime from the CPU’s perspective (it typically operates on the order of nanoseconds). Therefore, you’ll want to find a way to run multiple HTTP requests concurrently.

To this end, I suggest using Python’s asynchronous I/O features to their full extent. Recall that aiohttp uses Python’s async / await keywords so that you can run multiple HTTP requests concurrently, for instance:

import aiohttp
import asyncio

async def main():
  tasks = []
  urls = [
    "http://www.example.org/a",
    "http://www.example.org/b",
    "http://www.example.org/c",
  ]

  async with aiohttp.ClientSession() as sess:
    for u in urls:
      task = asyncio.create_task(sess.request("GET", u))
      tasks.append(task)
    responses = await asyncio.gather(*tasks)
    for resp in responses:
      print(resp.status)

if __name__ == "__main__":
  asyncio.run(main())

In this example, we made HTTP requests to three different URLs concurrently and then waited for them all to complete with asyncio.gather. Instead of waiting for the three requests to finish one after another, we wait for them all at once: the total time is roughly that of the slowest request, rather than the sum of all three.
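To see this effect concretely, here is a self-contained sketch that uses asyncio.sleep as a stand-in for network latency (no real requests are made):

```python
import asyncio
import time

async def fake_request(delay: float) -> None:
  # Stand-in for an HTTP request that takes `delay` seconds
  await asyncio.sleep(delay)

async def main() -> float:
  start = time.monotonic()
  # Launch three 0.2-second "requests" concurrently
  await asyncio.gather(*(fake_request(0.2) for _ in range(3)))
  return time.monotonic() - start

elapsed = asyncio.run(main())
print(f"elapsed: {elapsed:.2f}s")  # roughly 0.2s, not 0.6s
```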

I recommend structuring your program so that it consists of a “job scheduler”, which identifies work that needs to be done (in this assignment, HTTP requests) and “workers”, which perform that work. The scheduler puts jobs onto a queue, while workers take jobs off the queue and run them concurrently.

To implement this in Python, you can use asyncio.Queue to create the work queue and asyncio.create_task to construct the scheduler and workers. Here is some rough pseudocode for what that might look like (the documentation for asyncio.Queue includes another example):

import asyncio

N_WORKERS = 50  # Number of concurrent workers

async def fuzz(args):
  # Perform some pre-processing here with the input arguments to
  # build the list of jobs...

  queue = asyncio.Queue()
  tasks = []

  # Create a scheduler task to queue up jobs
  s = asyncio.create_task(scheduler(queue, jobs))
  tasks.append(s)

  # Create workers to consume jobs
  for _ in range(N_WORKERS):
    w = asyncio.create_task(start_worker(queue))
    tasks.append(w)

  # Wait for the scheduler and the workers to finish
  await asyncio.gather(*tasks)


async def scheduler(queue, jobs):
  # Put jobs onto the queue so that workers can execute them
  for job in jobs:
    await queue.put(job)

  # Put None onto the queue once for each worker so that they know
  # there isn't any more work to do
  for _ in range(N_WORKERS):
    await queue.put(None)


async def start_worker(queue):
  while True:
    # Get some new work off the queue
    job = await queue.get()

    try:
      # If the job is `None`, there's no more work to do, so the
      # worker can exit
      if job is None:
        break

      # Run the job (e.g., make an HTTP request)
      await do_work_for_job(job)

    finally:
      # Mark the job as being completed
      queue.task_done()

One important note: if you choose to use this method, you do not want to create a new task for every single HTTP request you make. Your machine would spawn thousands of HTTP requests near-instantaneously, and your program will (probably) crash from resource exhaustion (for example, by running out of file descriptors).
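One alternative way to bound concurrency (a sketch, not the assignment's required design) is to guard each request with an asyncio.Semaphore. Note that this still creates one task per job, so for very large wordlists the queue-and-workers pattern above scales better. The sketch below again uses asyncio.sleep as a stand-in for an HTTP request, and tracks the peak number of jobs running at once:

```python
import asyncio

MAX_IN_FLIGHT = 5  # cap on simultaneous "requests"
in_flight = 0
peak = 0

async def bounded_job(sem: asyncio.Semaphore) -> None:
  global in_flight, peak
  # The semaphore blocks here once MAX_IN_FLIGHT jobs are running
  async with sem:
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # stand-in for an HTTP request
    in_flight -= 1

async def main() -> None:
  sem = asyncio.Semaphore(MAX_IN_FLIGHT)
  await asyncio.gather(*(bounded_job(sem) for _ in range(50)))

asyncio.run(main())
print(f"peak concurrency: {peak}")  # never exceeds MAX_IN_FLIGHT
```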