Hello, dear friend, you can consult us at any timeif you have any questions, add WeChat: daixieit


COMP226 Assignment 1

Continuous Assessment

Continuous Assessment Number
1 (of 2)
Weighting
10%
Assignment Circulated
Monday 23 February 2026
Deadline
17:00 Fri 20 March 2026
Submission Mode
Submit a single Python file “solution.py” to the CodeGrade Assignment on Canvas
Learning Outcomes Assessed
Have an understanding of market microstructure and its impact on trading.
Goal of Assignment
Reconstruct a limit order book from order messages; com pute quantities based on the limit order book
Marking Criteria
Pre-deadline visible CodeGrade tests of correctness of 6 functions (70%); Post-deadline CodeGrade tests of 4 further functions (30%)
Submission necessary in order to satisfy module requirements
No
Expected time taken
Roughly 8-12 hours
Module Coordinator (Contact)
Dr Andrew Roxburgh ([email protected])

Late Submission

Standard UoL policy applies; every student may submit up to 7 calendar days after the deadline, without penalty. Students who have a Student Support Information Sheet (SSIS) that includes the "coursework deadlines extensions allowed" adjustment can make use of a further 7 day extension period (total of 14 calendar days after the deadline). Any attempt to submit more than 7 days (or 14 days with an adjustment) after the deadline will be classed as non-submission, and will receive a mark of zero. .

Warning

Submissions are automatically put through a plagiarism and collusion detection system. Students found to have plagiarized or colluded will likely receive a mark of zero. Do not discuss or show your work to others, and do not search for solutions to the assignment online. In previous years, students have had their studies terminated and left without a degree because of plagiarism or collusion.

Python

In this assignment, we use Python to run our code in any IDE or in the terminal, e.g., python main.py input/book_1.csv input/empty.txt

Distributed code and sample input and output data

As a first step, please download comp226_a1.zip via the assignment page on Canvas. Then unzip

comp226_a1.zip, which will yield the following contents in the directory comp226_a1:

Brief summary
You are provided with three .py files, two complete, which should not be edited:
  • main.py is the file you will run with Python, by specifying several command line arguments described below (see example above);
  • common.py contains complete working functions that are used by main.py in conjunction with the incomplete functions in template.py; and one incomplete file:
  • template.py is the file that you will edit – the distributed version contains empty functions. It contains 10 empty functions.

You need to change the file name from template.py to solution.py first. Then it will work with main.py. solution.py is also the file name you need to submit.

To get 70% of marks, you will need to correctly complete the first 6 functions; if your answer is only partially correct you will get a mark less than 70%. If you have correctly completed all these 6 functions, you can then – and only then – get marks for the 4 “extra” functions, which together account for 30% of the marks.

You should submit a single Python file that contains your implementation of some or all of these 10 functions.

Your submission will be marked via extensive automated tests of correctness across a wide range of example cases, with some chosen randomly at test time:

  • The tests for the first 6 functions, which give up to 70% of the marks, will run at the time of submission and are fully visible pre-deadline on CodeGrade (detailed guidance on using CodeGrade to improve your mark can be found at the end of this document);
  • The tests for the final 4 functions will only run post-deadline, and only if you got full marks for the first 6 functions. You can (and if required should) submit multiple times to repeatedly use the CodeGrade pre-deadline tests; for a correct solution CodeGrade will show that you pass all tests and have thus achieved the first 70% of marks. It probably does not make sense for you to work much on the final 4 functions until you have achieved this and submitted completely correct versions of the first 6 functions.

In addition to the visible pre-deadline tests on CodeGrade, for the first 6 functions, correct sample output is provided so that you can check correctness of your implementations “offline” (without submitting to CodeGrade). Offline testing is quick to do once you have set it up, and if you match all the offline examples then chances are that you will also pass the CodeGrade tests.

The first 6 functions to implement

The first 6 functions, which are worth 70% of the marks, are broken down into two groups. The percentages in square brackets show the breakdown of the marks by function.

Order book stats:
1. def total_volumes(book) [5%]
2. def best_prices(book) [5%]
3. def midprice(book) [5%]
4. def spread(book) [5%]

These first 4 functions are intentionally very easy, and are meant to get you used to the format of the book. The next 2 functions are more involved and relate to reconstructing the order book from an initial book and a file of messages.

Reconstructing the limit order book:
5. def reduce(book, message) [15%]
6. def add(book, message) [35%]
Running main.py
An example of calling main.py is as follows.
python main.py input/book_1.csv input/empty.txt

As seen in this example, main.py takes as arguments the path to two input files (the order of the arguments matters):

1. an initial order book (inputbook_1.csv in the example)
2. order messages to be processed (input/empty.txt in the example)
Let’s see the source code of main.py and the output that it produces.
import argparse
import os
import sys
from common import *
# Suppress warnings
import warnings
warnings.filterwarnings("ignore")
# Define argument format
arg_format = "<--log> <book_path> <messages_path>"
# Argument parsing
parser = argparse.ArgumentParser(description="Main script", add_help=False)
parser.add_argument("--log", action="store_true", help="Enable logging")
parser.add_argument("book_path", type=str, help="Path to the book file")
parser.add_argument("messages_path", type=str, help="Path to the messages file")
# Parse arguments
args = parser.parse_args()
nargs = len(sys.argv) - 1
if nargs < 2 or nargs > 3:
# Check that there are 2 or 3 arguments
raise ValueError(f"main.py has 2 required arguments and 1 optional flag: {
arg_format}")
if nargs == 3 and not args.log:
# If 3, check that --log is the first
raise ValueError(f"Bad arguments format, expected: {arg_format}")
book_path = args.book_path
messages_path = args.messages_path
# Check if all files exist
if not all(os.path.exists(path) for path in [book_path, messages_path]):
raise FileNotFoundError("File does not exist at path provided.")
# Main code
book = load(book_path)
book = reconstruct(load_data(messages_path), init=book, log=args.log)
summarise(book)
In short, main.py:
• checks that the command line arguments are ok
• assigns them to variables (book_path, and message_path respectively)
• import common.py; loads the initial book
• reconstructs the book according to the messages
• prints out the resulting book; prints out the book stats
Let’s see what the output would look like for a correct implementation:
$python main.py input/book_1.csv input/empty.txt
ask book:
oid price size
0 a 105.0 100
bid book:
oid price size
0 b 95.0 100
--Stats--
Total volume: 100 100
Best prices: 105.0 95.0
Mid-price: 100.0
Spread: 10.0
You will see that the order book stats once you have implemented the first 4 functions in template.py.
The initial order book
Here is the contents of input/book_1.csv, one of 3 provided initial book examples:
oid,side,price,size
a,S,105,100
b,B,95,100
Let’s justify the columns to help parse this input:
oid side price size
a S 105 100
b B 95 100
The first row is a header row. Every subsequent row contains a limit order, which is described by the following fields:
• oid (order id) is stored in the book and used to process (partial) cancellations of orders that arise in “reduce” messages, described below;
• side identifies whether this is a bid (’B’ for buy) or an ask (’S’ for sell);
• price and size are self-explanatory.

Existing code in common.py will read in a file like input/book_1.csv and create the cor responding two (possibly empty) orders books as two data frames that will be stored in the dictionary book, a version of which will be passed to all of the functions that you are required to implement.

If correct message parsing and book updating is implemented, book would be updated
according to input/message_a.txt to give the following output:
$python main.py input/book_1.csv input/message_a.txt
ask book:
oid price size
7 a 105.0 100
6 o 104.0 292
5 r 102.0 194
4 k 99.0 71
3 q 98.0 166
2 m 98.0 88
1 j 97.0 132
0 n 96.0 375
bid book:
oid price size
0 b 95.0 100
1 l 95.0 29
2 p 94.0 87
3 s 91.0 102
--Stats--
Total volume: 1418 318
Best prices: 96.0 95.0
Mid-price: 95.5
Spread: 1.0
Before we go into details on the message format and reconstructing the order book, let’s discuss the first four functions that compute the book stats.
Computing limit order book stats

The first four of the functions that you need to implement compute limit order book stats, and can be developed and tested without parsing the order messages at all. In particular, you can develop and test the first four functions using an empty message file, input/empty.txt, as in the first example above.

The return values of the four functions should be as follows:
• total_volumes should return a dictionary with two key-value pairs, with key bid and its value the total volume in the bid book, and key ask and its value the total volume in the ask book;
• best_prices should return a dictionary with two key-value pairs, with key bid and its value the best bid price, and key ask and its value the best ask price;
• midprice should return the midprice of the book;
• spread should return the spread of the book.
Check that the example outputs above are what you expect them to be.

Reconstructing the order book from messages

We now move on to reconstructing the order book from the messages in the input message file. You do not need to look into the details of the (fully implemented) functions reconstruct or handle in common.py that manage the reconstruction of the book from the starting initial book according to the messages (but you can if you want).

In the next section, we describe the two types of message, “Add” messages and “Reduce” messages. All you need to know to complete the assignment is that messages in the input file are processed in order, i.e., line by line, with “Add” messages passed to add and “Reduce” messages passed to reduce, along with the current book in both cases.

Message Format

The message file contains one message per line (terminated by a single linefeed character, ’\n’), and each message is a series of fields separated by spaces. Here’s an example, which contains an “Add” message followed by a “Reduce” message:

A c S 97 36
R a 50
An “Add” message looks like this:
'A' oid side price size
• ‘A’: fixed string identifying this as an “Add” message;
• oid: “order id” used by subsequent “Reduce” messages;
• side: ‘B’ for a buy order (a bid), and an ‘S’ for a sell order (an ask);
• price: limit price of this order;
• size: size of this order.
A “Reduce” message looks like this:
'R' oid size
• ‘R’: fixed string identifying this as a “Reduce” message;
• oid: “order id” identifies the order to be reduced;
• size: amount by which to reduce the size of the order (not its new size); if size is equal toor greater than the existing size, the order is removed from the book.
Processing messages
A “Reduce” message affects at most one existing order. An “Add” message will either:

• not cross the spread and then add a single row to the book (orders at the same price are stored separately to preserve their distinct “oid”s);

• cross the spread and in that case can affect any number of orders on the other side of the book (and may or may not result in a remaining limit order for residual volume).

The provided example message files are split into cases that include crosses and those that don’t. This allows you to develop your code incrementally and test it on inputs of differing difficulty. We work through an example of each case, one by one. In each example we start from input/book_1.csv; we only show this initial book in the first case.

Example of processing a reduce message
$python main.py input/book_1.csv input/empty.txt
ask book:
oid price size
0 a 105.0 100
bid book:
oid price size
0 b 95.0 100
--Stats--
Total volume: 100 100
Best prices: 105.0 95.0
Mid-price: 100.0
Spread: 10.0
R a 50
$python main.py input/book_1.csv input/message_ex_reduce.txt
ask book:
oid price size
0 a 105.0 50
bid book:
oid price size
0 b 95.0 100
--Stats--
Total volume: 50 100
Best prices: 105.0 95.0
Mid-price: 100.0
Spread: 10.0
Example of processing an add (non-crossing) message
A c S 97 36
$python main.py input/book_1.csv input/message_ex_add.txt
ask book:
oid price size
1 a 105.0 100
0 c 97.0 36
bid book:
oid price size
0 b 95.0 100
--Stats--
Total volume: 136 100
Best prices: 97.0 95.0
Mid-price: 96.0
Spread: 2.0
© 2026 The University of Liverpool. Unauthorised publication of this document is prohibited. 9Example of processing a crossing add message
A c B 106 101
$python main.py input/book_1.csv input/message_ex_cross.txt
ask book:
Empty DataFrame
Columns: [oid, price, size]
Index: []
bid book:
oid price size
0 c 106.0 1
1 b 95.0 100
--Stats--
Total volume: 0 101
Best prices: None 106.0
Mid-price: None
Spread: None
The None in the last example are the expected output for certain fields when at least one side of the book is empty and the corresponding quantity is undefined.
Sample output

We provide sample output for 9 cases, namely all combinations of the 3 initial books (book_1.csv, book_2.csv, book_3.csv) and 3 message files, all found in the input subdirectory. The 3 message files are called:

file

messages_a.txt
add messages only, i.e., requires add but not reduce; for all three initial books, none of the messages cross the spread
messages_ar.txt
add and reduce messages, but for the initial book book_3.csv, no add message crosses the spread
messages_arc.txt
add and reduce messages, with some adds that cross the spread for all three initial books

The 9 output files can be found in the output subdirectory of the comp226_a1 directory, shown as below.

Hints for order book stats
For spread and midprice a nice implementation would use best_prices. Hints for add and reduce

A possible way to implement add and reduce that makes use of the different example message files is the following:

• First, do a partial implementation of add, namely implement add messages that do not cross. Check your implementation with message_a.txt.
• Next, implement reduce fully. Check your combined (partial) implementation of add and reduce with message_ar.txt and book_3.csv (only this combination with message_ar.txt has no crosses).
• Finally, complete the implementation of add to deal with crosses. Check your implementa tion with message_arc.txt and any initial book or with message_ar.txt and book_1.csv or book_2.csv.
Explanation on sort
A j S 105 132
A k B 95 71
Note that earlier messages are closer to the top of the book. This is due to price-time precedence, according to which orders are executed:

• Best price first, but when two orders have the same price, the earlier one is executed first

We provide sort that respects price-time precedence. It relies on the fact that the order ids increase as follows:

a < k < ab < ba
where < is indicating “comes before” in the message files.
def sort(book, sort_bid=True, sort_ask=True):
if sort_ask and len(book['ask']) >= 1:
book['ask'] = book['ask'].sort_values(by=['price', 'oid'],
key=lambda col: col.map(
lambda x: (len(x), x)) if col.name == 'oid' else col,
ascending=[True, True]).reset_index(drop=True)
if len(book['ask']) > 0:
book['ask'] = book['ask'].iloc[::-1]
if sort_bid and len(book['bid']) >= 1:
book['bid'] = book['bid'].sort_values(by=['price', 'oid'],
key=lambda col: col.map(
lambda x: (len(x), x)) if col.name == 'oid' else col,
ascending=[False, True]).reset_index(drop=True)
return book

This method will ensure that limit orders are sorted first by price and second by time of arrival (so that for two orders at the same price, the older one is nearer the top of the book). You are encouraged to use sort in your own implementations. In particular, by using it you can avoid having to find exactly where to place an order in the book.

Be very careful when you index the bid or ask book after the book is sorted. You need to check whether index 0 or -1 points to the first row or the last row of the dataframe.

Hint on using logging in reconstruct

In common.py a logging option has been added to reconstruct:
def reconstruct(data, init=None, log=False):
if init is None:
init = book_init()
if len(data) == 0:
return init
def reducer(b, i):
new_book = handle(b, data.iloc[i])
if log:
print(f"Step {i}\n\n")
summarise(new_book, with_stats=False)
print("====================\n\n")
return new_book
book = init
for i in range(len(data)):
# if len(book['ask']) > 0:
#
book['ask'] = book['ask'].iloc[::-1]
book = reducer(book, i)
return sort(book)

If you want to use this for debugging, you can turn it on with the –log flag, e.g.:

python main.py --log input/book_1.csv input/message_arc.txt

Then summarise(book) is used to give intermediate output after each message is processed.

Hint on debugging
You can write your own code for debugging without using terminal/command prompt. For example, you can write a script to load the books using paths to the books and then reconstruct and summarise the book. With this script, you can run the script within your IDE.
Extra problems for final 30%
Warning
The final 30% of marks are only available if CodeGrade gives you the full first 70% of marks. Please only focus on the extra problems once you have achieved this.

For these final 30%, there are four functions that you are asked to implement. You can get marks for any one of these independently. The marks available are as follows:

1. def extra1(book, size) [6%]
2. def extra2(book, size) [6%]
3. def extra3(book, size) [6%]
4. def extra4(book, size) [12%]

The functions are defined by a full specification, given below, but we intentionally do not give you explicit test cases (you need to create them for yourself if you want) and you cannot find out your mark before the deadline. We will also not offer detailed help on solving these parts, as they are meant to be more challenging, with part of that challenge being to solve them on your own.

If you have achieved the full first 70%, then you can get extra marks for fully or partially correct implementations of any of the 4 extra functions, independent of each other (e.g., you can complete extra3 and not the other extra functions and still get marks).

The first three require you to compute an expectation of a discrete random variable. If you need a refresher, go back to COMP111 Introduction to Artificial Intelligence. For our application such an expectation is just the average (since the probability distribution is uniform) over all the possible values of the random variable.

Assumption

For the extra tests, you can assume that bid book is not empty, so that the starting mid-price is only None if the ask book has no orders in it.

Extra problem 1
extra1 has two parameters, book (the order book), and size, which is an integer size between 1 and M, where M is the total volume in the ask book (you can assume that your function will only be tested on such values of size).

The function should return the expected value of the midprice after execution of a buy limit order with size size and price drawn uniformly at random from the set of prices in ask.

Extra problem 2

extra2 has exactly the same two parameters as extra1. The function should return the expected value of the midprice after execution of a buy limit order with size size and price drawn uniformly at random from the integer prices (the tick size is 1) between the best ask price and the highest ask price in the book. If the best ask price or the highest ask price are not integers, you need to start from the integer right below the best ask to the integer above the highest ask.

Hint: For the first 2 extra problems, if size is equal to M then the correct answer is None.

Extra problem 3

extra3 only takes book as an argument. The function should return the expected value of the midprice after execution of a buy market order with size s, assuming that s is drawn uniformly at random from the set {1,....,(M-1)}, where M is the total volume in the ask book.

Hint: For the first 3 extra problems, one unified approach is to simulate the relevant orders using functions that you implemented for reconstruct.

Extra problem 4

extra4 has two parameters, book (the order book), and k a non-negative number that will be interpreted as a percentage, e.g., if k=0.4 then k corresponds to 0.4%.

The function should return: 0 if the ask book has no orders in it; otherwise the largest amount of buy volume v such that a buy market order with size v causes the mid-price to increase by no more than k % (so an order with size v+1 would either cause the midprice to increase by more than k % or would leave no asks left in the book which means that the mid-price is None).

Note: the return value should be an integer between 0 and the total ask volume (exclusive) in book.

Hint: For all 4 extra problems, to increase the chance that your implementations are correct, do one or two examples where you compute the correct expectation by hand once and then check that your code produces the correct answer.

THE END