COMP1006

LAB EXERCISE #4


These exercises are designed to be straightforward and are primarily to get you used to using Komodo on the school’s Linux system and writing ARM assembly language. These exercises are all built around accessing memory as seen in the online lecture engagement on the 8th November (with further coverage on the 15th November).


Getting Started

You will need to fork and clone the relevant project via git in the usual fashion. In this case, the project to fork can be found at: https://projects.cs.nott.ac.uk/2021-COMP1006/2021-COMP1006-LabExercise04

Once cloned, you will find skeleton .s files for each exercise. It is strongly recommended that you start from these skeletons; otherwise, your programs may not work against the marking test scripts.

Note: For all these programs you may well find it useful to start by generating a C version (or any other programming language you are familiar with) first to ensure the logic of the program is sorted before you start transliterating it into ARM assembler.

There are two exercises this week: the first implements a heavily simplified version of C’s printf(), and the second is an improved version of the text formatter from the previous lab exercise. Full details of the exercises can be found overleaf.


Assessment Notes

The exercises are starting to get harder this week, and the marking will reflect this. Therefore, you should not necessarily expect to obtain full marks for each exercise — remember the University marking scheme is such that any mark over 40% is a pass mark, and marks over 70% are considered first class. As an example, for both exercises this week, some of the marks are only available if you complete the advanced variant of the exercise.


Exercise 1 — ‘printf’

For this exercise, you are to write an ARM assembly program that implements a heavily simplified version of the printf function found in the C standard library. Your program will take two inputs, each in a specific register that your program must use: the address of a format string to print out (in R1), and the address of a sequence of values to interpolate into it (in R2).

As with C’s printf(), the format string is composed of zero or more directives: ordinary characters (not ‘%’), which are printed out unchanged; and conversion specifications, which begin with a ‘%’ character. For each conversion specifier encountered in the string, rather than print out the conversion specifier itself (%s, %d, etc.), your program should instead print out a value of the type specified (string, integer, and so on). These values are stored as a sequence of 32-bit words in memory, with the address of the first value being given to your program in register R2.

Some hints for completing this exercise:

The address of the format string is specified in R1, and the address of the values is specified in R2. If you use other registers, your program will fail in the pipeline. Likewise, you must not add in code before the label printf in the skeleton file, 01-printf.s, or your program will not execute correctly. You may however (and probably should) modify the format strings and sequence of values to ensure your program works correctly with different input.

Your program will need to loop through every character of the format string, and print out the characters (using SWI 0), unless the character is a percent (‘%’) character.

Your program should stop looping when it reaches a null character in the format string, i.e. when the byte read from memory has the literal value of zero (not the ASCII code for the character representing the digit zero). You can find this by comparing with #0.

The format string is made up of characters that are one byte long, so you will need to use LDRB to access these. The values are all 32 bits (four bytes) long, so you will need to use LDR to access these.

If your program encounters a conversion specifier (i.e. you read a ‘%’ character from the format string), then your program should fetch the next byte of the format string from memory. Then, based on the value of this character, your program should do one of the following:

If the conversion specifier is %%, then your program should print out a single percent character. The value of R2 should not be updated.

If the conversion specifier is %c, then your program should read the 32-bit value from the address currently specified in R2. Your program should treat the value read as an ASCII character and print it out using SWI 0. The value of R2 should be updated to contain the address of the next value in the sequence.

If the conversion specifier is %d, then your program should read the 32-bit value from the address currently specified in R2. Your program should treat the value read as an unsigned integer value and print it out using SWI 4. The value of R2 should be updated to contain the address of the next value in the sequence.

If the conversion specifier is %s, then your program should read the 32-bit value from the address currently specified in R2. Your program should treat the value read as containing the address of a null-terminated string and print it out using SWI 3. The value of R2 should be updated to contain the address of the next value in the sequence.

Invalid conversion specifiers (i.e. any other character following the %) should print nothing out.

Bear in mind that each value in the sequence of values is 4 bytes long.

It can be instructive to consider this program as being formed from two parts. One part of the program is stepping through each character of the format string deciding what needs to be printed (the character, or a value — and for each value, what type of value). The other part of the program is fetching the value from the sequence of values and printing it out based on the type specified in the string. I’d suggest working on the two parts of the program separately, first get the program stepping through the format string printing out dummy values, then when you are confident that part is working, move onto writing the code to read the actual values from memory.

A skeleton file, 01-printf.s, is provided. You must use this file for your code, and you should not modify the provided portions of the code (other than to check your program works with differing input values). If you modify these parts of the file, your program may no longer work correctly when assessed by the pipeline.
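As suggested in the Getting Started note, it can help to settle the logic in C before moving to ARM. The following is a minimal sketch of the rules above, not the required ARM solution. The function name mini_printf, the output buffer, and the use of uintptr_t in place of 32-bit words (so that a %s value can hold a real pointer on a 64-bit host) are all assumptions made for illustration. The handout does not say what should happen if the format string ends immediately after a ‘%’; this sketch simply stops.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Append one character to out, always leaving room for the terminator.
 * This stands in for SWI 0 on the real system. */
static void put_char(char *out, size_t cap, size_t *used, char c)
{
    if (*used + 1 < cap)
        out[(*used)++] = c;
}

/* Append a null-terminated string; this stands in for SWI 3. */
static void put_str(char *out, size_t cap, size_t *used, const char *s)
{
    while (*s != '\0')
        put_char(out, cap, used, *s++);
}

/* Hypothetical C model of Exercise 1.
 * fmt    plays the role of R1 (address of the format string).
 * values plays the role of R2 (address of the next value); each element
 *        models one word of the sequence (uintptr_t is an assumption).
 * Returns the number of values consumed from the sequence. */
size_t mini_printf(char *out, size_t cap, const char *fmt,
                   const uintptr_t *values)
{
    size_t used = 0;   /* characters written so far */
    size_t n = 0;      /* values consumed so far */
    char digits[32];

    if (cap == 0)
        return 0;

    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {               /* ordinary character */
            put_char(out, cap, &used, *fmt);
            continue;
        }
        fmt++;                           /* fetch the specifier character */
        if (*fmt == '\0')
            break;                       /* assumption: lone '%' at the end stops */
        switch (*fmt) {
        case '%':                        /* literal percent, no value consumed */
            put_char(out, cap, &used, '%');
            break;
        case 'c':                        /* value is an ASCII character */
            put_char(out, cap, &used, (char)values[n++]);
            break;
        case 'd':                        /* value is an unsigned integer (SWI 4) */
            snprintf(digits, sizeof digits, "%llu",
                     (unsigned long long)values[n++]);
            put_str(out, cap, &used, digits);
            break;
        case 's':                        /* value is the address of a string */
            put_str(out, cap, &used, (const char *)values[n++]);
            break;
        default:                         /* invalid specifier prints nothing */
            break;
        }
    }
    out[used] = '\0';
    return n;
}
```

In the ARM version, the index n corresponds to advancing R2 by 4 bytes after each value is read, and the two helpers correspond to the SWI 0 and SWI 3 calls.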

[5 marks]


Exercise 2 — Improved Text Printing

In the previous lab exercise, you implemented a simple text formatting program that would attempt to keep the lines of text below a certain number of characters wide. However, depending on whether you implemented the basic or advanced version, your program would either break words in half when it reached the specified line length, or print a line slightly longer than the line length since it waited until the next space before breaking the line.

Now that we have looked at how to access memory in a bit more depth we can implement a better version of this program that will always keep the lines below the specified length, unless there is a word longer than the specified length (in which case, we have no option but to print the word in its entirety). To do this, rather than immediately printing the characters out as they are read, we will store the characters into memory first and then when we encounter a space character decide whether to print it on the current line, or to print it on a new line.

As before, the file 02-breaker.s contains a simple program that repeatedly reads in a character (using SWI 1) and then prints it out (using SWI 0) until the hash character (‘#’) is typed, at which point the program quits. You should modify this skeleton so the program will break the text entered into lines that are no longer than width characters, unless a word is itself longer than width characters.

Some hints for completing this exercise:

You will need to read the value of width using LDR. Your program should work for any integer value of width greater than zero.

Your program will need to read characters (using SWI 1), and then store them in memory until a space character (ASCII code 32) is detected. You can use the memory labelled buffer for this. Since each character is a single byte long, you should use STRB to store the value in memory. Of course, you will almost certainly need to store words more than one character long, so you will need to keep the address of the buffer in a register and increment it after each character.

Do not worry about the numbers already in the buffer, just overwrite them with your characters (defining a string like this is a quick and easy way to reserve sufficient space, the numbers making it easy to count how much space is available).

Once a complete word has been entered (the end of a word is signified by a space), you need to decide whether to print it out on the current line or on the next line. The stored word should be printed on the current line only if the number of characters already printed on the line (including spaces), plus the length of the space (1) before the word, plus the length of the stored word, is less than or equal to width. Otherwise, the stored word should be printed on a new line.

You will need to keep count of the number of characters that have already been printed on the current line (including spaces), and the length of the stored word.

SWI 3 can print out strings by giving it the address of the start of the string in R0, but the string must be terminated by a null character at the end (i.e. 0, not the digit ‘0’).

You can print a newline character using SWI 0 (set R0 to have 10 in it), but remember you also need to print the character that was read in with SWI 1 (SWI 1 also puts the character read into R0).

Only a single space should be printed between words; if multiple spaces are entered between words, only a single space should be output.

The user may press the RETURN key when typing. This should still cause the text to start again from the beginning of a new line (remember you can have up to width characters on a line). If the RETURN key is pressed, then SWI 1 will return 10 in R0.

When the end of file character is reached, any partial word currently stored should be printed.

Think about and test what happens in the corner cases.
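The hints above can be sketched in C as a way of checking the logic before writing the ARM version. This is one possible reading of the rules, not the reference solution: the names break_text, flush_word and emit, the output buffer, and the WORD_MAX size are all hypothetical. It also assumes that no space is printed before the first word on a line, and that characters beyond WORD_MAX in a single word are dropped; the skeleton's buffer may be sized differently.

```c
#include <stddef.h>

#define WORD_MAX 64   /* hypothetical size; stands in for the skeleton's buffer */

/* Append one character to out, leaving room for the terminator.
 * This stands in for SWI 0 on the real system. */
static void emit(char *out, size_t cap, size_t *used, char c)
{
    if (*used + 1 < cap)
        out[(*used)++] = c;
}

/* Print the stored word on the current line if it fits,
 * otherwise start a new line first. Resets the stored word. */
static void flush_word(char *out, size_t cap, size_t *used,
                       const char *word, size_t *word_len,
                       size_t *line_len, size_t width)
{
    size_t i;

    if (*word_len == 0)
        return;                       /* repeated spaces: nothing stored */

    if (*line_len > 0) {              /* not at the start of a line */
        if (*line_len + 1 + *word_len <= width) {
            emit(out, cap, used, ' ');    /* single space before the word */
            *line_len += 1;
        } else {
            emit(out, cap, used, '\n');   /* word does not fit: new line */
            *line_len = 0;
        }
    }
    for (i = 0; i < *word_len; i++)
        emit(out, cap, used, word[i]);
    *line_len += *word_len;
    *word_len = 0;
}

/* Hypothetical C model of Exercise 2: reads characters from in until
 * '#' (or the end of the string) and writes the broken text to out. */
void break_text(char *out, size_t cap, const char *in, size_t width)
{
    char word[WORD_MAX];              /* the word currently being stored */
    size_t used = 0, word_len = 0, line_len = 0;

    if (cap == 0)
        return;

    for (; *in != '\0' && *in != '#'; in++) {
        if (*in == ' ' || *in == '\n') {
            flush_word(out, cap, &used, word, &word_len, &line_len, width);
            if (*in == '\n') {        /* RETURN always starts a new line */
                emit(out, cap, &used, '\n');
                line_len = 0;
            }
        } else if (word_len < WORD_MAX) {
            word[word_len++] = *in;   /* STRB plus pointer increment in ARM */
        }
    }
    /* End of input: print any partial word still stored. */
    flush_word(out, cap, &used, word, &word_len, &line_len, width);
    out[used] = '\0';
}
```

With a width of 10, for example, the word "brown" following "the quick" does not fit (9 + 1 + 5 is greater than 10) and so starts a new line, whereas "over" following "jumps" fits exactly (5 + 1 + 4 equals 10).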

[5 marks]