
Lab 1: Fuzzing

Spring Semester 2024

Due: 29 January, 8:00 a.m. Eastern Time

Corresponding Lesson: Lesson 3 (Random Testing)

Objective

In Part 1 you will implement a simple tool to automatically check for divide-by-zero errors in C programs at runtime. You will create an LLVM pass that will instrument C code with additional instructions that will perform runtime checks, thus creating a sanitizer, a form of lightweight dynamic analysis. In the spirit of automated testing, your tool will provide a code coverage mechanism that will show the actual instructions that execute when a program runs.

In Part 2 you will implement a fuzzer that will use mutation strategies to create inputs to automatically test simple programs. As we discussed in the lesson, we hope to get lucky and cause the input program to crash on some generated data. You will see how this specialized form of mutation analysis can perform well enough to encourage developers to use this technique to help test their software.

In Part 3 you will extend your fuzzer from Part 2 to make more interesting choices about the kinds of input it generates to test a program. The fuzzer will use output from previous rounds of testing as feedback to direct future test generation. You will use the code coverage metrics implemented in Part 1 to help select more interesting seed cases for your fuzzer to mutate.

Please note that this lab demonstrates a random testing technique. To give you a real taste of random testing, we have opted not to design a deterministic lab where you would see the exact same results every time you run your solution, since real random testing is not deterministic. We have some tolerances built into the grader, so you can have high confidence that if you are consistently crashing the provided programs, you will also consistently crash the hidden programs. As this is the only lab covering random testing, it should be easier in future labs to know that your solution is correct: most of our labs use deterministic techniques, where you will get the exact same results on every execution of your code.

Note on Past Issues

In past semesters, this lab has caused a high number of students to be referred to the Office of Student Integrity for Academic Integrity violations. In particular, it’s possible to find solutions to similar analyses on the internet. Looking at these solutions in any form is likely to influence your thinking and cause your solutions to be similar to them, which is an Academic Integrity violation in this class. If you are unclear on our guidelines for what is collaboration and what is cheating, we suggest reviewing that section of the syllabus. If you have any questions about what is allowed and what is not allowed, please ask privately on Ed Discussions for clarification.

Students who submit solutions found to be similar to online resources or to other students' solutions should expect a 0 grade on the lab and a disciplinary record of an Academic Integrity issue through the Office of Student Integrity, and will not be eligible to receive a final grade of A in the course. Students who have had past Academic Integrity issues may find that OSI assigns them higher penalties.

Resources

●   Lab Intro Video

-  On the Canvas page for this assignment

●   Enumerating basic blocks and instructions in a function:

-  http://releases.llvm.org/8.0.0/docs/ProgrammersManual.html#basic-inspection-and-traversal-routines

●   Instrumenting LLVM IR

-  http://releases.llvm.org/8.0.0/docs/ProgrammersManual.html#creating-and-inserting-new-instructions

●   Important classes

-  http://releases.llvm.org/8.0.0/docs/ProgrammersManual.html#the-function-class

-  http://cs6340.cc.gatech.edu/LLVM8Doxygen/classllvm_1_1CallInst.html

-  http://cs6340.cc.gatech.edu/LLVM8Doxygen/classllvm_1_1DebugLoc.html

●   Fuzzing

-  https://www.fuzzingbook.org/html/Fuzzer.html

●   Code Coverage

-  https://www.fuzzingbook.org/html/Coverage.html#Comparing-Coverage

Setup

Download fuzzing.zip from Canvas and unzip it in the home directory on the VM (note: there is an extraneous fuzzing directory on the VM already; you can delete it or ignore its contents). This will create a fuzzing directory with part1, part2, and part3 subdirectories.

Part 1 - Simple Dynamic Analysis

Setup

The skeleton code is located under /fuzzing/part1/.  We will refer to the top level directory for Part 1 as part1 when describing file locations.

Run the following commands to set up this part:

$ cd part1

$ mkdir build

$ cd build

$ cmake ..

$ make

You should see several files created in the current directory. This builds an LLVM pass from code that we provide, part1/src/Instrument.cpp, named InstrumentPass.so.

Note that each time you update Instrument.cpp, you will need to rerun the make command in the build directory before testing.

Next, let’s run our dummy Instrument pass over some C code that contains a divide-by-zero error:

$ cd ../test

$ clang -emit-llvm -S -fno-discard-value-names -c -o simple0.ll simple0.c -g

$ opt -load ../build/InstrumentPass.so -Instrument -S simple0.ll -o simple0.instrumented.ll

$ clang -o simple0 ../lib/runtime.c simple0.instrumented.ll

$ ./simple0

If you’ve done everything correctly up to this point, you should see Floating point exception (core dumped). For the lab, you will complete the Instrument pass to catch this error at runtime.

Format of Input Programs

All C programs are valid input programs.

Lab Instructions

In this lab, you will implement a dynamic analysis tool that catches divide-by-zero errors at runtime. A key component of dynamic analysis is that we inspect a running program for information about its state and behavior. We will use an LLVM pass to insert runtime checking and monitoring code into an existing program. In this lab, our instrumentation will perform divide-by-zero error checking, and record coverage information for a running program. In the following part of the lab, we will introduce an automated testing framework using our dynamic analysis.

Instrumentation and Code Coverage Primer. Consider the following code snippet where we have two potential divide-by-zero errors, one at Line 1, the other at Line 2.

int main () {

int x1 = input();

int y = 13 / x1;   // Line 1

int x2 = input();

int z = 21 / x2;   // Line 2

return 0;

}

If we wanted to program a bit more defensively, we would manually insert checks before these divisions, and print out an error if the divisor is 0:

int main () {

int x1 = input();

if (x1 == 0) { printf("Detected divide-by-zero error!"); exit(1); }

int y = 13 / x1;

int x2 = input();

if (x2 == 0) { printf("Detected divide-by-zero error!"); exit(1); }

int z = 21 / x2;

return 0;

}

Of course, there is nothing stopping us from encapsulating this repeated check into some function, call it __dbz_sanitizer__, for reuse.

void __dbz_sanitizer__ (int divisor) {

if (divisor == 0) {

printf("Detected divide-by-zero error!");

exit(1);

}

}

int main () {

int x1 = input();


__dbz_sanitizer__ (x1);

int y = 13 / x1;

int x2 = input();

__dbz_sanitizer__ (x2);

int z = 21 / x2;

return 0;

}

We have transformed our unsafe version of the code in the first example to a safe one by instrumenting all division instructions with some code that performs a divisor check. In this lab, you will automate this process at the LLVM IR level using an LLVM compiler pass.

Debug Location Primer. When you compile C code with the -g option, LLVM will include debug information for LLVM IR instructions. Using the aforementioned instrumentation techniques, your LLVM pass can gather this debug information for an Instruction, and forward it to __dbz_sanitizer__ to report the location a divide-by-zero error occurs. We will discuss the specifics of this interface in the following sections.

Instrumentation Pass. We have provided a framework from which to build your LLVM instrumentation pass. You will need to edit the part1/src/Instrument.cpp file to implement your divide-by-zero sanitizer, as well as the code coverage analysis. part1/lib/runtime.c contains functions that you will use in your lab:

-  void __dbz_sanitizer__ (int divisor, int line, int col)

-    Output an error for line:col if divisor is 0

-  void __coverage__ (int line, int col)

-    Append coverage information for line:col in a file for the current executing process
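To make the interface above concrete, here is a minimal sketch in C of what two such runtime functions could look like. It is not the provided part1/lib/runtime.c, which may differ. The names sketch_dbz_sanitizer, sketch_coverage, and dbz_would_trap are hypothetical, and the explicit path parameter is an assumption made so the sketch can be exercised in isolation (the real __coverage__ takes only line and col). The error message and the line,col record format follow the example output shown in this handout.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the divisor would trigger the sanitizer.
   Split out so the check can be exercised without terminating the process. */
int dbz_would_trap(int divisor) {
    return divisor == 0;
}

/* Hypothetical stand-in for __dbz_sanitizer__: report the source location
   and stop the program before the faulting division can execute. */
void sketch_dbz_sanitizer(int divisor, int line, int col) {
    if (dbz_would_trap(divisor)) {
        printf("Divide-by-zero detected at line %d and col %d\n", line, col);
        exit(1);
    }
}

/* Hypothetical stand-in for __coverage__: append one "line,col" record to
   the file at `path` (assumption: the real function picks the file itself).
   Returns 0 on success, -1 if the file cannot be opened. */
int sketch_coverage(const char *path, int line, int col) {
    FILE *f = fopen(path, "a");
    if (f == NULL) {
        return -1;
    }
    fprintf(f, "%d,%d\n", line, col);
    fclose(f);
    return 0;
}
```

An instrumented program would call the sanitizer immediately before each division and the coverage function at each instruction that carries a source location.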

As you will create a runtime sanitizer, your dynamic analysis pass should instrument the code with these functions. In particular, you will modify the runOnFunction method in Instrument.cpp to perform this instrumentation for all LLVM instructions encountered inside a function.

Note that our runOnFunction method returns true in Lab 1. In Lab 0, we returned false in similar places. As we are instrumenting the input code with additional functionality, we return true to indicate that the pass modifies, or transforms the source code it traverses over.

In short, part 1 consists of the following tasks:

1.   Implement the instrumentSanitizer function to insert a __dbz_sanitizer__ check for a supplied Instruction

2.   Modify runOnFunction to instrument all signed and unsigned integer division instructions with the sanitizer for a block of code

3.   Implement the instrumentCoverage function to insert __coverage__ checks for all debug locations

4.   Modify runOnFunction to instrument all instructions with the coverage check

Inserting Instructions into LLVM code. By now you are familiar with the BasicBlock and Instruction classes and working with LLVM instructions in general. For this lab you will need to use the LLVM API to insert additional instructions into the code when traversing a BasicBlock. There are many ways to traverse programs in LLVM. One common pattern when working with LLVM is to create a new instruction and insert it directly before some existing instruction.

For example, in the following code snippet:

Instruction* Pi = ...;

auto *NewInst = new Instruction (..., Pi);

A new instruction (NewInst) will be created and implicitly inserted before Pi; you need not do anything further with NewInst. Other subclasses of Instruction have similar methods for this; consider looking at llvm::CallInst::Create.

Loading C functions into LLVM. We have provided the auxiliary functions __dbz_sanitizer__ and __coverage__ for you, but you have to insert them into the code as LLVM Instructions.

You can load a function into the Module with Module::getOrInsertFunction.

getOrInsertFunction requires a string reference to a function to load, and a FunctionType that matches the function type of the actual function to be loaded (you will have to construct these items). It’s up to you to take the loaded Function and invoke it with an LLVM instruction.

Debug Locations. As we alluded to in the primer, LLVM will store code location information of the original C program for LLVM instructions when compiled with -g. This is done through the DebugLoc class:

Instruction* I1 = ...;

const DebugLoc &Debug = I1->getDebugLoc();

printf("Line No: %u\n", Debug.getLine());


You will need to gather and forward this information to the sanitizer functions. As a final hint, not every single LLVM instruction corresponds to a specific line in its source C code. You will have to check which instructions have debug information. Use this to help build the code coverage metric instrumentation.

Example Input and Output

Your sanitizer should run on any C code that compiles to LLVM IR. For each test program, you will need to run the following commands, substituting your program name for simple0. (Note: a Makefile is also provided that runs the commands for all the programs at once when you execute the “make” command.)

$ cd part1/test

$ clang -emit-llvm -S -fno-discard-value-names -c -o simple0.ll simple0.c -g

As we demonstrated in the Setup section, we will create an instrumented executable using your LLVM compiler pass.  To test a different program, replace simple0 with your program name.

$ opt -load ../build/InstrumentPass.so -Instrument -S simple0.ll -o simple0.instrumented.ll

$ clang -o simple0 ../lib/runtime.c simple0.instrumented.ll

$ ./simple0

If there is a divide-by-zero error in the program, your instrumented executable should output the following (recall that the print statement is already set up in part1/lib/runtime.c; the line and column numbers will come from your code).

Divide-by-zero detected at line 27 and col 13

Code coverage information should be written to a file named EXE.cov, where EXE is the name of the executable that is run (in the above case, look for simple0.cov). Our auxiliary functions will handle the creation of the file; your instrumented code should populate it with line and column information:

25,7

25,7

26,7

26,11

26,7

27,7

27,11

27,15


Note that a correct solution will produce the exact same output as given.
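Because the same location can be recorded more than once, a coverage log is often summarized by its number of distinct entries. The following is a minimal sketch in C of one way to do that; it is an illustration only, not part of the provided code. The function name count_distinct_coverage and the MAX_POINTS limit are hypothetical, and the sketch assumes the log has already been read into a string of line,col records like the ones above.

```c
#include <stdio.h>

#define MAX_POINTS 1024  /* hypothetical cap on distinct locations */

/* Counts distinct "line,col" records in a coverage log held in memory.
   Parsing stops at the first record that does not match the format.
   Returns -1 if more than MAX_POINTS distinct records are seen. */
int count_distinct_coverage(const char *log) {
    int lines[MAX_POINTS];
    int cols[MAX_POINTS];
    int n = 0;
    int line, col, consumed;
    const char *p = log;

    /* %n stores how many characters were consumed so we can advance p. */
    while (sscanf(p, " %d,%d%n", &line, &col, &consumed) == 2) {
        int seen = 0;
        for (int i = 0; i < n; i++) {
            if (lines[i] == line && cols[i] == col) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            if (n == MAX_POINTS) {
                return -1;
            }
            lines[n] = line;
            cols[n] = col;
            n++;
        }
        p += consumed;
    }
    return n;
}
```

For the eight sample records shown above, this function returns 6, since 25,7 and 26,7 each appear twice.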

Part 2 - Mutational Fuzzing

Setup

The skeleton code for Part 2 is located under /fuzzing/part2. We will refer to the top level directory for Part 2 as part2 when describing file locations.

The following commands set up the lab:

$ cd part2

$ mkdir build

$ cd build

$ cmake ..

$ make

This time we will use the fuzzer tool to feed randomized input (that you will create) into compiled C programs that will run with a reference implementation of the sanitizer from Part 1:

$ cd ../test

$ mkdir fuzz_output

$ ../build/fuzzer ./sanity fuzz_input fuzz_output MutationA

In the above command, sanity will receive input from fuzzer, starting from the initial seed file at part2/test/fuzz_input/seed.txt. Cases that cause a program crash will be stored in part2/test/fuzz_output/failure. The starter code guarantees that the fuzzer runs for at most 10,000 iterations and exits as soon as a crash is found.
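The overall shape of such a loop (mutate the seed, run the target, stop at the first crash or after 10,000 iterations) can be sketched in C as follows. This is a toy illustration under stated assumptions, not the provided fuzzer: the real tool runs a compiled program as a separate process, whereas here the target is a hypothetical in-process function, toy_target_crashes, that "crashes" when the first input byte is '0'. The names mutate_one_byte and fuzz_until_crash are also hypothetical, and the single-byte overwrite is just one simple mutation, not necessarily MutationA.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ITERATIONS 10000

/* Hypothetical toy target: reports a crash when the first byte is '0'. */
int toy_target_crashes(const unsigned char *input, size_t len) {
    return len > 0 && input[0] == '0';
}

/* One simple mutation: overwrite a randomly chosen byte with a random value. */
void mutate_one_byte(unsigned char *buf, size_t len) {
    if (len == 0) {
        return;
    }
    buf[(size_t)rand() % len] = (unsigned char)(rand() % 256);
}

/* Mutates a fresh copy of the seed on every iteration and runs the target.
   Returns the index of the first crashing iteration, or -1 if none crashed.
   On a crash, `out` (which must hold at least len bytes) contains the
   crashing input. */
int fuzz_until_crash(const unsigned char *seed, size_t len, unsigned char *out) {
    for (int i = 0; i < MAX_ITERATIONS; i++) {
        memcpy(out, seed, len);
        mutate_one_byte(out, len);
        if (toy_target_crashes(out, len)) {
            return i;
        }
    }
    return -1;
}
```

Because the mutation is random, the iteration at which a crash is found (if any) varies with the seed passed to srand, which mirrors the non-determinism discussed in the Objective section.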