Assignment 03

CSE 130-02: Principles of Computer System Design, Spring 2021


Due: Thursday, June 3 at 11:59PM


Goals

The goal of Assignment 3 is to create an HTTP reverse proxy with a cache. Your program will act as an HTTP server, but when a requested object is not cached locally it will forward the request to the object’s origin server and then forward the origin server’s response to the original client. To avoid sending requests to the server, and thus reduce response time and network traffic, the proxy will keep copies of objects returned by previous GET requests in a local in-memory cache. If a later GET request asks for a cached object, the proxy will return the cached copy, as long as that copy is recent enough. Your program will therefore act as both a server and a client.

We will provide you with a working HTTP server executable that you can use to test your proxy server, and we recommend you use this instead of your Assignment 1 HTTP server implementation (the provided executable implements more features). Your main goal is to implement the reverse proxy. That said, your reverse proxy server will have to handle the same types of requests that you have seen before (GET, HEAD and PUT), and must support persistent connections.

As usual, you must have a design document along with your README.md in your git repository. Your code must build an executable named httpproxy using make.


Programming assignment: HTTP reverse proxy

Design document

Before writing code for this assignment, as with every other assignment, you must write up a design document. Your design document must be called DESIGN.pdf, and must be in PDF format (you can easily convert other document formats, including plain text, to PDF).

Your design document should describe the design of your code in enough detail that a knowledgeable programmer could duplicate your work. This includes descriptions of the data structures you use, non-trivial algorithms and formulas, and a description of each function with its purpose, inputs, outputs, and the assumptions it makes about inputs or outputs.

Write your design document before you start writing code. It’ll make writing code a lot easier. It will help you think about what you need to do for this assignment, and it can help you identify possible problems with your planned implementation before you have invested hours in it. Also, if you want help with your code, the first thing we’re going to ask for is your design document. We’re happy to help you with the design, but we can’t debug code without a design any more than you can.

Since a lot of the system in Assignment 3 is similar to Assignments 1 and 2, we expect you’re going to “copy” a good part of your design from your previous designs. This is fine, as long as it’s your previous assignment you’re copying from. This will let you focus on the new stuff in Assignment 3.

Start early on the design. This program can be built independently of previous assignments, but if you didn’t get the previous assignments to work, you will probably need help. Please see the course staff ASAP for help in that case.


TESTING AND ASSIGNMENT QUESTION

In the design document, you will also describe the testing you did on your program and answer any short questions below. The testing can be unit testing (testing of individual functions or smaller pieces of the program) or whole system testing, which involves running your code in particular scenarios.

For Assignment 3, please answer the following questions:

Using a large file (e.g. 100 MiB — adjust according to your computer’s capacity) and the provided HTTP server:

Start the server with only one thread in the same directory as the large file (so that it can provide it to requests);

Start your proxy with no cache and request the file ten times. How long does it take?

Now stop your proxy and start again, this time with cache enabled for that file. Request the same file ten times. How long does it take?

Aside from caching, what other uses can you consider for a reverse proxy?


Program functionality

You may not use standard libraries for HTTP; you have to implement this yourself. You may use standard networking (and file system) system calls, but not any FILE * calls except for printing to the screen (e.g., error messages). Note that string functions like sprintf() and sscanf() aren’t FILE * calls.
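Because only system calls are allowed for network I/O, forwarding data means calling write(2) directly, and write(2) may accept fewer bytes than requested. The following is a minimal sketch of a helper that keeps writing until the whole buffer has been sent; the name write_all and its return convention are assumptions made for illustration, not part of the provided skeleton.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes from buf to fd.
 * Returns 0 on success and -1 if write(2) fails. */
int write_all(int fd, const char *buf, size_t len) {
    assert(buf != NULL);
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal: retry the same write */
            }
            return -1; /* real error: let the caller decide what to do */
        }
        sent += (size_t)n; /* partial write: advance and keep going */
    }
    return 0;
}
```

The same helper works for a client socket, a server socket, or a pipe, since all of them are plain file descriptors.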

Your code must be in C and be compiled with no errors or warnings using the following flags: -Wall -Wextra -Wpedantic -Wshadow

Once again your program will take a port number as a parameter, but this time it will be followed by a second port number that identifies the HTTP server. Unlike Assignment 2, your reverse proxy does not need to be multithreaded. The two port numbers can be accompanied by three optional parameters that configure the cache: “c”, a non-negative integer specifying the capacity of the cache (the number of items that can be stored); “m”, a non-negative integer specifying the maximum size in bytes of a file to be stored in the cache; and “u”, a flag that enables the Least Recently Used (LRU) replacement policy (the default replacement policy is First In First Out, FIFO). The default values for “c” and “m” are 3 and 65536, respectively. As the examples show, the options may appear before, between, or after the two port numbers. The following invocations are all valid:

./httpproxy 9090 8080 -c 4 -u

Starts httpproxy on port 9090

Communicates with a server running on port 8080

Cache can hold four files

Each file in cache can have at most 65536 bytes

The replacement policy is LRU

./httpproxy 8181 1234

Starts httpproxy on port 8181

Communicates with a server running on port 1234

Cache can hold three files

Each file in cache can have at most 65536 bytes

The replacement policy defaults to FIFO (no “u” option is given)

./httpproxy -m 100 7373 2525

Starts httpproxy on port 7373

Communicates with a server running on port 2525

Cache can hold three files

Each file in cache can have at most 100 bytes

The replacement policy defaults to FIFO (no “u” option is given)

./httpproxy 8383 -c 1 -u 3434 -m 100000000

Starts httpproxy on port 8383

Communicates with a server running on port 3434

Cache can hold one file

Each file in cache can have at most 100000000 bytes

The replacement policy is LRU

./httpproxy 7654 -m 512 -c 4 1234

Starts httpproxy on port 7654

Communicates with a server running on port 1234

Cache can hold four files

Each file in cache can have at most 512 bytes

The replacement policy defaults to FIFO (no “u” option is given)


Proxying

You are implementing a reverse proxy, which means that the client is not aware that it is communicating through a proxy. That means your proxy will receive requests from the client in the same way that your previous server received them. The proxy will forward requests to the server when the requested object is not cached locally, then receive the corresponding responses from the server and forward them to the client. When the proxy receives a GET request for a resource that is cached, it should verify that the cached copy is not obsolete by sending a HEAD request to the server and checking the Last-Modified header line, whose value is a date/time for when the requested object was last modified. The provided HTTP server already implements this header so you can use it directly. An example of the Last-Modified header is given below.

Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT

Your reverse proxy should compare the Last-Modified date reported by the server with the date of the stored object. If the stored object is the same age or newer, the proxy can respond directly to the client without forwarding the GET request to the server (the HEAD request that performs this check is still sent for every GET of a cached object). Otherwise the proxy should forward the request to the server as usual.
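The validation step described above starts with a HEAD request from the proxy to the server. Here is a minimal sketch of building that request with snprintf(); the function name build_head_request, the Host header value, and the rule of rejecting names that contain whitespace are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a HEAD request for resource into buf.
 * Returns the request length, or -1 if the name is unsafe or the
 * request does not fit in size bytes. */
int build_head_request(char *buf, size_t size, const char *resource,
                       int server_port) {
    assert(buf != NULL && resource != NULL);

    /* Refuse names that could break the request line or inject headers. */
    if (strpbrk(resource, " \r\n") != NULL) {
        return -1;
    }

    int n = snprintf(buf, size,
                     "HEAD /%s HTTP/1.1\r\nHost: localhost:%d\r\n\r\n",
                     resource, server_port);
    if (n < 0 || (size_t)n >= size) {
        return -1; /* formatting error or output truncated */
    }
    return n;
}
```

The proxy would send the resulting bytes to the server, read the response headers, and look for the Last-Modified line before deciding whether to serve the cached copy.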


Caching

Your proxy has a number of parameters defining how the cache will work. The first parameter, specified by the option “c”, defines how many items the cache can hold. That is just the number of files that can exist in the cache at once. If another file were to be added to the cache once it reaches the maximum number of files, then the new file should replace one of the existing files in the cache, according to a replacement policy.

The second parameter, specified by the option “m”, defines the maximum size of a file to be cached. That is the size in bytes, as present in the Content-Length header line of a response to a GET request. Your proxy should only add to its cache files that are equal in size or smaller than this value. What if the file is larger? In that case the file will not be stored in the cache. The third parameter is the replacement policy. The default policy is First In First Out (FIFO), meaning that if a file has to be replaced, the file that was the oldest to be added to the cache should be replaced. If the flag “u” is provided when starting the proxy, the policy will be instead Least Recently Used (LRU), meaning that the file to be replaced is the one that has spent the most time in the cache without being requested.
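The size check for the “m” option can be sketched as follows. The function names get_content_length and fits_in_cache are hypothetical, and the sketch assumes the server spells the header exactly as Content-Length and that the full header block is already in a NUL-terminated buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find a Content-Length header line in a
 * NUL-terminated header block and store its value in *length. */
bool get_content_length(const char *headers, long *length) {
    assert(headers != NULL && length != NULL);
    static const char name[] = "Content-Length:";
    const size_t name_len = sizeof name - 1;

    const char *p = headers;
    while ((p = strstr(p, name)) != NULL) {
        /* Only accept a match that begins a header line. */
        bool at_line_start = (p == headers) || (p[-1] == '\n');
        long value;
        if (at_line_start && sscanf(p + name_len, "%ld", &value) == 1 &&
            value >= 0) {
            *length = value;
            return true;
        }
        p += name_len; /* keep searching after this occurrence */
    }
    return false;
}

/* A response may be cached only if its body is at most max_file_size
 * bytes; a response without a usable Content-Length is not cached. */
bool fits_in_cache(const char *headers, long max_file_size) {
    long length;
    return get_content_length(headers, &length) && length <= max_file_size;
}
```

With the default “m” of 65536, a response announcing Content-Length: 65536 would be cached and one announcing 65537 would only be forwarded.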

As an example of how these two replacement policies differ, consider three consecutive requests for files A, B, and A again, with both files being cached. If a fourth request for file C arrives, and the cache can only hold two files, FIFO will result in A leaving the cache, because it was added first, while LRU will result in B leaving the cache, because A was requested more recently.

Cached files should be held in memory, without creating any files on disk.
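The A, B, A, C example can be reproduced with a small bookkeeping structure that stamps each entry with the request number at which it was added and last used. This is only a sketch under stated assumptions: all names (struct cache_index, cache_request, MAX_SLOTS, NAME_LEN) are hypothetical, file contents are omitted, and a linear scan is used for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 16 /* assumed upper bound for this sketch only */
#define NAME_LEN 32

struct slot {
    bool used;
    char name[NAME_LEN];
    unsigned long added_at; /* request number when inserted (FIFO key) */
    unsigned long used_at;  /* request number of latest hit (LRU key) */
};

struct cache_index {
    struct slot slots[MAX_SLOTS];
    size_t capacity; /* the -c value, at most MAX_SLOTS here */
    bool lru;        /* true when -u was given */
    unsigned long tick;
};

void cache_init(struct cache_index *c, size_t capacity, bool lru) {
    assert(c != NULL && capacity <= MAX_SLOTS);
    memset(c, 0, sizeof *c);
    c->capacity = capacity;
    c->lru = lru;
}

/* Record a request for name. Returns true on a hit. On a miss the name
 * is inserted; if that replaced an entry, its name is copied to evicted,
 * otherwise evicted is set to the empty string. */
bool cache_request(struct cache_index *c, const char *name,
                   char evicted[NAME_LEN]) {
    evicted[0] = '\0';
    c->tick++;
    struct slot *free_slot = NULL;
    struct slot *victim = NULL;

    for (size_t i = 0; i < c->capacity; i++) {
        struct slot *s = &c->slots[i];
        if (!s->used) {
            if (free_slot == NULL) free_slot = s;
            continue;
        }
        if (strcmp(s->name, name) == 0) {
            s->used_at = c->tick; /* a hit refreshes only the LRU key */
            return true;
        }
        unsigned long key = c->lru ? s->used_at : s->added_at;
        unsigned long best = victim == NULL
                                 ? 0
                                 : (c->lru ? victim->used_at : victim->added_at);
        if (victim == NULL || key < best) victim = s; /* smallest key loses */
    }

    struct slot *target = free_slot != NULL ? free_slot : victim;
    if (target == NULL) return false; /* capacity 0: nothing is cached */
    if (target->used) strcpy(evicted, target->name);
    target->used = true;
    strncpy(target->name, name, NAME_LEN - 1);
    target->name[NAME_LEN - 1] = '\0';
    target->added_at = target->used_at = c->tick;
    return false;
}
```

With a capacity of two, requesting A, B, A, and then C reports A as evicted under FIFO and B as evicted under LRU, matching the example.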


Testing your code

You should test your code on your own system. You can run the server and the proxy on localhost using a port number above 1024 (e.g., 8888). Come up with requests you can make of your proxy, and try them using curl(1). curl takes one or more URLs and makes a request for each of them. By default these requests are of the GET type. Some useful options:

-T <file>: makes curl send a PUT request, sending the contents of <file>. <file> does not need to match the resource name in the URL;

-I: makes curl send a HEAD request;

-v: runs curl in verbose mode. By default curl will only print the body of the received response (or the full response if it sent a HEAD request). In verbose mode curl will also print the request and the response headers, identified by a ‘>’ if curl is sending it and a ‘<’ if curl is receiving it;

-o <file>: makes curl save the output to <file>.

curl can also send multiple requests if it has multiple URLs as parameters. As with the previous assignment, connections are persistent and any connection may contain multiple requests. Note that you’ll need to run your server in one terminal, and make requests using curl in a separate terminal. You can see examples of curl commands in the Hints section. For more on curl check https://everything.curl.dev/http/requests

Remember that in this assignment you are working on a proxy that will communicate with a server, so your curl requests should be directed towards the proxy. That is, if the server is running on port 8080 and the proxy is on port 9090, you would use the command

curl http://localhost:9090/01234abcde01234

to request file 01234abcde01234 through the proxy.

You might also consider cloning a new copy of your repository (from GitLab) to a clean directory to see if it builds properly, and runs as you expect. That’s an easy way to tell if your repository has all of the right files in it. You can then delete the newly-cloned copy of the directory on your local machine once you’re done with it.


README

As for previous assignments, your repository must include a README.md file. It should be short and contain any instructions necessary for running your code. You should also list any known limitations or issues with your code in README.md.


Submitting your assignment

All of your files for Assignment 3 must be in the asgn3 directory in your git repository. When you push your repository to GitLab@UCSC, make sure of the following:

There are no “bad” files in the asgn3 directory (e.g., object files).

Your assignment builds in asgn3 using make to produce httpproxy.

All required files (source files, DESIGN.pdf, README.md) are present in asgn3.

Note that you do not have to write a client program nor a server, only the proxy.

After pushing your submission to GitLab, submit your commit id to this Google Form:

https://forms.gle/QSh7EnfxuN48Xzbv5


Hints

Start early on the design. This program can be built independently of previous assignments, but if you didn’t get the previous assignments to work, you will probably need help. Please see the course staff ASAP for help in that case. 

Reuse your code from assignments 1 and 2. (No need to cite this) 

We have updated the skeleton code for this assignment. It now has one function to create a socket as a client. You can use it, or you can just copy the new function to the code base you have already built. 

Aggressively check for and report errors via a response. Transfers may be aborted on errors. However, the proxy doesn’t exit on an error; it deals with the error appropriately (sending the corresponding error code to the client if possible) and ends the connection, leaving the proxy free to handle another connection.

Use getopt(3) to parse options from the command line. Read the man pages and see examples on how it’s used. Ask the course staff if you have difficulty using it after reading this material.

Your commit must contain the following files:

README.md

DESIGN.pdf

Makefile

source file(s) for the proxy

It may not contain any .o files or other compiled files. It may not contain data files that you create for testing either. You may, if you wish, include the “source” files for your DESIGN.pdf in your repo, but you don’t have to. After running make, your directory must contain httpproxy. Your source files must be .c files (and the corresponding headers, if needed).

You can use the strptime(3) function to parse the content of the Last-Modified header line. To use it in your code, define __USE_XOPEN and then include time.h, in this order. That will look like this in your code:

#define __USE_XOPEN

#include <time.h>
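Building on the define above, the following sketch parses a Last-Modified value into a time_t and applies the freshness rule from the Proxying section. The function names parse_http_date and cached_copy_is_fresh are hypothetical. The sketch also assumes that timegm(3), a non-standard but widely available function, exists on your system, and that other headers are included before the define.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#ifndef __USE_XOPEN
#define __USE_XOPEN /* as in the hint: exposes strptime(3) in time.h */
#endif
#include <time.h>

/* Hypothetical helper: parse a value such as
 * "Wed, 21 Oct 2015 07:28:00 GMT" into seconds since the epoch (UTC).
 * Returns false if the text does not match that layout. */
bool parse_http_date(const char *value, time_t *out) {
    assert(value != NULL && out != NULL);
    struct tm tm;
    memset(&tm, 0, sizeof tm);

    const char *rest = strptime(value, "%a, %d %b %Y %H:%M:%S GMT", &tm);
    if (rest == NULL) {
        return false; /* input did not match the expected format */
    }

    /* timegm() interprets the fields as UTC (assumed to be available). */
    time_t t = timegm(&tm);
    if (t == (time_t)-1) {
        return false;
    }
    *out = t;
    return true;
}

/* The cached copy may be served when it is the same age as, or newer
 * than, what the server currently reports. */
bool cached_copy_is_fresh(time_t cached_modified, time_t server_modified) {
    return cached_modified >= server_modified;
}
```

One way to use this is to store the parsed Last-Modified value alongside each cached file and compare it with the value parsed from the HEAD response.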

If you need help, use online documentation such as man pages and documentation on Makefiles. If you still need help, ask the course staff.


Grading

As with all of the assignments in this class, we will be grading you on all of the material you turn in, with the approximate distribution of points as follows: design document and answer to assignment question (30%); coding practices (10%); functionality (60%).

Your code must compile to be graded. A submission that does not compile may receive a maximum grade as low as 5%.