Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Data File Lab – Assignment 1

COMP2100

Revised 14 August 2023

Richard Han, [email protected]

This lab is the first of two assignments in COMP2100.

Commences: Week 1.

Progress: 11:55pm, Fridays of weeks 4, 5, 6, 7: 18 Aug, 25 Aug, 1 Sep, 8 Sep

Due: 11:55pm, Friday of 1st  week of mid-semester break, 15 September 2023

Value: 20% (13% for task, 4% weekly progress, 3% for code explanation)

1.  Introduction

This is the first of two programming task assignments in COMP2100.  In this assignment, you will work with binary data stored in files, using the Unix I/O interface to read and write such files.

The first part of this document describes the assignment (Data File Lab) in detail.  Individual sections discuss an overview of the lab, how to fetch your personalised assignment specification, how you submit your work and obtain automatic feedback while you are working, and how the marking scheme works.

The second part of this document describes each stage of the lab assignment.  You should read this part to understand your task in each lab stage, and you should also refer to the personalised assignment specification for each stage.  You obtain your personalised specification using the lab command.  You should ensure that your personalised specification is kept secure at all times.

You should also read the relevant support documents on iLearn in the section “Programming Task     Support Notes”.  The Lab Command Manual will enable you to perform key tasks including checking your marks and applying for free extensions – it is relevant to both assignments in this unit.  There    are documents that guide you with programming style, and support notes on C programming including some key ideas that are relevant to this Data File Lab.

This document is Copyright © 2023 by Macquarie University.  Neither this document nor any part of the assignment may be communicated to any unauthorised person or through any unauthorised service or website.

2.  Overview of the Lab

Data files exist in various formats.  In Unix, text files are common for simple data, but large data files arestored as binary data.  In this lab assignment (programming task), you will be developing programs to read, write, modify, and reformat the data in binary data files.  The assignment consists of a sequence of stages which build on each other.  The first three stages develop your skills.  In the  final stage you will reverse engineer a data file using your knowledge of data representations.  You    have the choice between an easier final stage (stage 4) that can earn you at most 3 marks for the stage,or a more difficult final stage (stage 5) worth up to 4 marks for the stage.  You may attempt both stages 4 and 5, but only the maximum of the two marks will count towards your total.

The marking outline is:

Section

Value

Stage 1

3

Stage 2

3

Stage 3

3

Stage 4 (max 3 marks) or stage 5 (max 4 marks)

4

Progress

4

Code

comprehension/explanation

3

Total

20

Learning Outcomes

This assignment will involve you in developing the following specific skills and capabilities.

•    Able to write programs that use C data structures, pointers and arrays.

•    Able to read and write binary data files, and write data in text format.

•    Able to convert between different data representations.

•    Able to use malloc and free to construct data structures using the heap.

•    Able to implement simple command line parameters.

•    Able to interpret and recognise binary data representations.

•    Research Unix library and system calls

3.  Fetching your lab assignment

The lab assignment files are accessed through the lab command which can be found at

/home/unit/group/comp2100/lab

There is no Unix man page for the lab command (it is not a Unix system command) but there is documentation on iLearn and if you don’t give it any command-line parameters or options then it will printout some brief documentation itself  (a similar feature is common in many Unix programs). To see how this works, try the following command (where the $ symbol represents the Unix command-line prompt – you should type the command that is underlined in this example).

$ /home/unit/group/comp2100/lab

The option –g is used to get a lab stage.  For example, to get lab 1 stage 1, do:

$ /home/unit/group/comp2100/lab –g 1.1

For stage 2, the option would be –g 1.2 instead.  Please see the Lab Command Manual in iLearn    for more information about the lab command, including options for submitting assignments, getting marking reports, checking due dates and claiming your free extension days.  Also, you can set up your Unix account so that you can abbreviate the command and just type “lab” instead of the full  path name “ /home/unit/group/comp2100/lab”.  For the rest of this document, we will use the abbreviated name.

The lab get command downloads your lab data as atar file.  For stage 1, the tar file is stage1.tar.

Tar is an archive utility (like zip) – it stores many files packed into one file.  Use tar to extract the contents of this file.  You can read all about tar in the Unix man page

$ man 1 tar

Here is the command to extract the contents of stage1.tar.

$ tar xvf stage1.tar

This will create a directory called stage1 and put the downloaded files in that directory.

4.  Feedback during the assignment, submission and marking

In this assignment,you can submit your code as often as you like, and receive immediate feedback    and marks.  There are rewards (progress marks) for working consistently throughout the assignment period and achieving stages of work by their due dates.  Most of your final mark will be computed from the results of the automarker.

The maximum mark for the assignment is 20 marks.  Of those marks, 13 marks are for achievements   in the various stages, 4 marks are for progress and 3 marks are for code comprehension/explanation.

A.         Introduction to COMP2100 automarking

This is the first of two lab assignments in COMP2100.  In both assignments, an automarker will track your progress and provide feedback to you.  This is more than just telling you your mark – it is a feedback mechanism designed to help you as you work through the assignment.  Firstly, the feedback is immediate, so when you think you have solved a problem you can submit your revised solution and see immediately whether it has enabled you to pass the automarker tests.  Secondly, the automarker provides a detailed breakdown of your mark, which can help you isolate specific problems (such as a memory leak when using malloc and free).  Thirdly, the automarker sometimes  provides specific hints to help you understand what you need to address – such as identifying which columns of your data file are incorrect.

You cannot rely only on the automarker, however.  In this assignment, you will be provided with test data files, and you can and should compile your program yourself and run it on the test data files, examining the output yourself and identifying errors in your code.  The automarker does not replace good old-fashioned debugging – one of the essential skills for all programmers.

B.         Individual work and information resources

The stages of the lab are based on data files that will be provided to you.  Each student will have their own specific data files to work with, with their own unique data format.

This lab must be your own work.  However, you may use resources on the Internet to obtain general information including information about the C language and libraries, information about binary and  textdata formats, and information about the operating system.  If you obtain useful information from the Internet, you must include comments at the relevant points in your code acknowledging the source of the information (URL) and briefly describing the key idea(s) that you are using.

(Exception: information from the Unix manual pages does not require citation in your program).

The Unix manual pages are available online on ashand iceberg – use the man command.  You can also find Unix manual pages online through Google.  For example, to find out about the printf library call, use the command “man 3 printf” or Google “man printf” and to find out about the directory listing command ‘ ls’, use the command man ls” or Google “man ls”.  However, you should be cautious about using information found online because sometimes there are differences between different Unix systems and our systems may not behave exactly the same as described in some online documentation.

The manual pages on the system (man command) are divided into sections:

1.    System commands such as ls, wc, etc.

2.    Unix system calls such as read(), open(), etc.

3.    Unix library such asprintf(), fopen(), etc.

4.    Sections 4-8 contain other information.

For more information on the man command, use the command “man man” to read the manual pages about the man command.

C.         Submitting your labsolution - achievement marks [13 marks]

Your labsolution can be submitted using the lab command.  The option –s is used to submit a solution to a lab stage.  After the option, list all the files that you want to submit.  Each time you submit, it is treated as afresh submission, so you must list all the files that you want to submit every time.  (If you find that tedious, learn about wildcards in the bash shell.)  For example:

$ lab –s 1.1 stage1.c sub.c defs.h

The lab utility sends your submitted files to a server which compiles the C files together into a program, runs it, and tests that it works correctly for your particular lab assignment.  The server records information about your submission and also sends back information to you through the lab command.

You can submit as many times as you like.  As a matter of personal achievement, you should aim to achieve a really good score on your initial submit, having checked that your program compiles without errors and performs correctly on the provided sample data files.  However, if there are problems identified by the auto marker, you can resubmit without penalty.

You must download each stage before you attempt to submit a solution to that stage.  Further, you need to download each stage because the download provides you with the input and output data    files that you need in order to test your program yourself.

The marks awarded by the automarker for each stage of the assignment are called the achievement marks for that stage.

D.         Progress marks [4 marks]

Each lab assignment includes marks that are awarded for progress on the task at specified dates (approximately each week).  The lab assignments are to be done both during lab sessions (with the assistance of lab supervisors) and in your own time.  Each week that the lab is out, you earn a progress mark if you achieve the specified milestone by 11:55 pm on the specified date.  Each milestone is achieving a mark of at least 2.0/3.0 in a specific assignment stage. You can earn the progress marks early, but you cannot earn them late.

If you do not achieve the milestone for a progress mark by the specified date then you lose that week’s progress mark and the milestone “slips” and becomes due on the next progress date.  All the later milestones also slip to the next due date, but the last milestone is lost.  If you achieve the slipped milestone by the new progress date then you receive the progress mark for that date, but you have lost the progress mark for the missed date and you cannot make it up later.

In the lab command, progress marks are identified by the date that they are due.

The Milestones

•    Friday of Week 4: Stage 1 achievement mark of at least 2.0/3.0

•    Friday of Week 5: Stage 2 achievement mark of at least 2.0/3.0

•    Friday of Week 6: Stage 3 achievement mark of at least 2.0/3.0

•    Friday of Week 7: Stage 4 or 5 achievement mark of at least 2.0/3.0 or 2.0/4.0

•    Friday of 1st  week of mid-semester break (15 September): Lab closes

Example

Sarah earns 2.0/3.0 for stage 1 of the assignment on the Monday of the week of the Stage 1 milestone.  She receives the first progress mark.  Her work onstage 2 is delayed, and she has not achieved 2.0/3.0 for stage 2 of the assignment when the second progress milestone falls due, so she  misses that progress mark.  She achieved 2.0/3.0 in stage 2 later that same week and is awarded the  third progress mark because that is the next progress milestone date.  When she achieves 2.0/3.0 in  stage 3 later on, she is awarded the final progress mark.  She cannot be awarded a progress mark for achieving 2.0/3.0 in stage 4 because she missed the earlier progress mark.

E.         Code Explanation [3 marks]

During the semester, your tutor will provide an opportunity for you to demonstrate that you can explain the code you have written.  You will show the tutor your program for a particular stage, and  answer some questions about it.  Your mark for this task will be recorded in iLearn as 0, 1 or 2 out of 2, but will be scaled to be worth 3% of your final grade.  You will receive full marks (2 out of 2) for clearly demonstrating that you can explain how your code works and why you wrote it as you did. You will receive half marks (1 out of 2) if your explanations demonstrate only a limited understanding of your code.  You will receive a mark of zero if you are unable to explain your code.

5.  Detailed information about marking

The lab command computes your marks and records them on the server.  Normally, the mark recorded at the end of the assignment will be your final mark for the achievement and progress parts of the assignment.  Once the assignment has closed for all students, the automarker marks can also be uploaded to ilearn.  All marks are computed to 1 decimal place as displayed in the marking reports that you receive from the lab command.

A.         Detailed marking guides for each stage

When you download and extract the files for a stage you will find a file called marking-guide.txt in the extracted files.  This text file contains a detailed marking rubric for the stage.

The auto marker uses this rubric to mark your submission for the stage. The marking guide includes detailed notes that describe how each mark is calculated and what is being marked.  In particular,     the marking guide will tell you whether each item is marked proportionally, by error count, or as a    Boolean (see “Types of Achievement Marks”, below).

In later stages,some auto marker checks are thresholds.  Threshold conditions may not contribute marks to your total, but are required for your program to be eligible to earn other marks.  The marking report will display if any threshold has failed, and it will indicate which marks are suppressed due to the failed threshold.  Thresholds and marks that require thresholds are indicated  in the marking guide marking-guide.txt.  The idea behind threshold marks is that you need to have a program which meets the basic requirements before awarding you marks for more sophisticated behaviour of your program.

The marking-guide.txt file is generated by the server from configuration information that is part of the automarking process.  The marking guide itself is the same for all students.  However,    generating it in the server and delivering it to you in this way ensures that the marking guide is consistent with the server’s marking system.

B.         Types of Achievement Marks

There are three types of achievement marks, as explained in the marking-guide.txt files.

•    Ordinary marks are proportional, computed as a percentage and scaled to the maximum

mark. For example, there is a mark awarded for the correctness of your output file, that is

computed from the proportion of correct rows and the proportion of correct columns in the output file.  After scaling according to the maximum mark, the mark is rounded down to a

multiple of 0.1.  For example, if the percentage mark is 98% and the mark is scaled to a

maximum of 1.0, then the rounded mark would be 0.9 (not 1.0).  Rounding down ensures    that full marks are only awarded for perfect scores of 100% on the particular marking item.

•    Error count marks deduct a fixed amount (usually 0.1 or 0.15) from the maximum mark for  each error that is counted, until the mark reaches 0.0.  Error count marks are typically used  for error checking such as checking your structure definition – a fixed amount is deducted    for each error found in the definition, and the automarker gives you are hint identifying the errors.

•    Boolean marks are used for test conditions which are either successor failure.  The mark is     awarded either as the full mark or as 0.0.  The full mark is awarded when the test condition    is satisfied, and 0.0 is awarded when the test fails.  Boolean marks typically have small values (such as 0.1 or 0.2) and are awarded for specific tests such as ensuring that your program

exits without an error status in normal operation, or that there are no memory leaks.

C.         Maximising Your Mark

Here are some hints to get the most marks in this assignment.

1.    Work on this assignment every week until the deadline.  Don’twait until you’ve finished the assignments for your other units before you start this assignment.  This assignment is

intended to be worked on over a period of 5 weeks and almost certainly cannot be completed in a few days.

2.    Achieve at least 2.0 marks in each stage of this assignment by the progress mark deadline.   Progress marks reward you for consistently working on the assignment.  You show that you are working consistently by achieving a mark of at least 2.0/3.0 for the next stage of the

assignment each week.

3.    Start thinking about the next stage, and start working on it, once you have a reasonably

good mark (at least 2.0) for the earlier stages.  You may have an obscure bug that costs you 0.1 or 0.2 marks in the current stage, but you can earn more marks by working on the next  stage than by spending all your time trying to perfect your current stage score.

4.    Do your own testing as well as using the hints provided by the automarker. The automarker can give you a general idea of your problems, but running your program yourself allows you  to examine the particular mistakes that you are making.

The Stages of Assignment 1

Stage 1: Initialisinga C struct and printing it out as text [3 marks]

In this stage you will declare a C data structure, create an instance of it and statically initialise it

(declare it as a static or global variable and initialise it in one statement using braces) .  You will then printout the instance.  This stage develops the following specific skills:

•    Declaring a C struct.

•    Initialisinga C struct

•    Printing various data types using printf

Note: Do not use bit fields in your struct.  All the data types that are specified correspond to ordinary C data types.

Note: The automarker checks your struct definition against expected ways of writing it and

awards marks for correctness.  Field names must be exactly correct.  Types should be the common C language data types as defined in ANSI C.

Resources

The following documents on iLearn may be helpful:

Compile, Run, Make C Programs on Linux

C Programming Notes for Data File Lab

Decimal, Binary, Octal and Hex

C language features you may need to know:

•    struct

•    C data types for integers and floating point including

o different sizes of integers

o signed and unsigned integers

o char

•    printf call, and printf format specifiers

•    Writing integer constants in decimal, octal and hexadecimal

Other topics that you may need to understand:

•    Binary representation of integers

•    2’s complement representation of negative numbers

•    Conversion between decimal, binary, octal and hexadecimal

Your downloaded stage1.tar file contains the following files:

•    filestruct-description.txt: A simple description of the fields that are in your struct – their names and type description.

•    initialisation-specification.txt: Specifies the initial value for each field of

your struct.  The initial value has to be formatted in a specific way in your source code – this may mean that you have to convert one representation to another.  See the lab note

Decimal, Binary, Octal and Hex.  Note: It makes no difference to the data that is stored inside the computer whether you initialise the field with decimal or the equivalent hexadecimal or octal.  However, as an exercise, we require you to make the appropriate type conversions and the automarker will check your code.

•    expected-output.txt: Stage 1 expected output file.  Use the example in this file to workout what formatting options to use in printf.

Useful Unix commands

You might find the following Unix system commands helpful.

cat

•    diff

Task

Write a C program that declares your particular data structure as described in the C structure

description file. Statically1 initialisean instance of the data structure to the initial values as specified in the file – use the data formats as specified in the file such as hexadecimal, decimal or octal

constants.  In the main program, printout the data structure using printf formatting to make it exactly match the provided sample output file.  Note that you may need to use various formatting options with printf to control the appearance of the output.  You are expected to read about     printf and workout how to format the data so that it exactly matches the expected output.

Submit your program for automatic assessment using the lab command.

Stage 2: Reading a binary data file and printing it out [3 marks]

In this stage you will read a binary data file in a known format, storing the information into instances of a C data structure which you will then printout.  This stage develops the following specific skills:

•    Reading binary data

•    Opening and closing files

•    Printing various data types using printf.

•    Using asimple command-line parameter.

Resources

•    filestruct-description.txt: Describes the members of the C data struct which correspond to fields in the records of the data file.

•    input-*.bin: Sample binary input files.

•    output-*.txt: Sample text output files corresponding to the input files.

C language features you may need to know:

•    fopen system library call to open a file, and a related call to close the file

•    fread system library call to read binary data from a file

•    sizeof operator in C

•    The parameters of main() and how to access the command line parameters

Useful Unix commands

You might find the following Unix system commands helpful.

moreor less

•    diff

•    od

Task

Write a C program that reads a file of binary data records as described in the structure description     file.  The program will obtain the filename as a command line parameter (see below).  The program  will read and print all the records in a binary data file where each record has the format described in filestruct-description.txt.  You already developed code to printout a single record in     stage 1, so the focus of this stage is reading a binary data file into memory.

The output formatting requirements for this stage are the same as in stage 1.  However, it is possible that you may need to modify your record printing code – it could be that your printfcall worked  correctly for the single initialised record in stage 1 but it may not be correct for all the data records    in the files.  You should check the output against the expected output using diff, and improve your printf statement in whatever way is needed to get the correct output.

Your program must accept one command-line parameter which is the name of the input file.  The

automarker will run your program many times, each time with a different input file name as the

parameter, and it will compare the output of each run with the expected output.  You should do the same thing for your own testing.

The fields of the records arestored using the types specified in the data file description.  The fields

arestored packed next to each other in the data file.  You cannot read the entire record directly into a C struct in one call because C inserts additional unused space between some of the fields in the

struct (this is called alignment padding; we will discuss it later in COMP2100 lectures)2.   You must read the data record one field at a time.  It is suggested to use fread to read each field.

Each record that you read should be printed out as text.  Your output should exactly match the sample output files.

Remember that coding style is important: use good modularisation, and use header files

appropriately.

Submit your program for marking using the lab command.  We may use additional data files for testing, including files that are larger than the samples provided to you.

Stage 3: Sorting a binary data file [3 marks]

In this stage you will sort files of binary data in a known format.  This stage develops the following specific skills:

•    Reading and writing binary data files.

•    Opening and closing files.

•    Working with pointers to structures.

•    Memory allocation, dynamically sizing an array.

•    Using system library routines (specifically, a system library sort routine).

•    Writing code to compare structures with a lexical sort order.

•    Using a function pointer in C.

Resources

•    filestruct-description.txt: Describes the members of the C data structure which correspond to fields in the records of the data file.

•    filestruct-sort.txt: Specifies the sorting order.

•    input-*.bin: Sample binary input files.

•    output-*.bin: Sample binary output files corresponding to the input files.  The output files contain the same data as the input files, but the records are sorted.

C language features you may need to know:

•    fwrite system library call to write binary data to a file

•    qsort system library call to sort data

•    Memory allocation, and freeing memory

Useful Unix commands

You might find the following Unix system commands helpful. <