Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Spring 2023 CSE 480

Project 1: File Format Exploration

January 11th, 2023

1   A Word of Warning

Make sure you read through the submission instructions thoroughly before you submit your project. Failure to comply with the submission instructions will result in a zero for this project.

2   Project Due Date

This project is due before January 19th at 11:59pm. The second-chance deadline for project 1 is

January 26th before 11:59pm. Submission instructions for this project can be found at the end of this document.

3   Project Overview

This project will convert between different plain text data file formats, specifically CSV, JSON, and XML. Your code will take in a given file, save it to some standardized format (of your choos- ing) and then output the file in the desired format.

You will need to write Python 3 code, and you will need to use the starter code provided. The starter code has function stubs, and you should not change these function names. You are free to add other functions if you need them.

You can use any built in modules of Python 3, and we encourage you to make use of the ’csv’, ’json’, and ’xml.etree’modules.

If you are unfamiliar with these file formats, the lecture on January 11th went in depth on all of the file formats as well as what needs to be done for this project.

4   What You are Given

On D2L we have released starter code for project 1, as well as sample datasets in CSV, XML, and JSON formats. We have also included a python script that can be used to run your code exactly how we will run it for testing. The python script will produce the converted file, which you can then compare against the ground truth file provided.

5   What You Need To Do

You need to complete the six function stubs in the starter code. The read functions should con- vert the input file into an intermediate’ format of your choice. You could use a list, dictionary, tuple, custom object, etc. It does not matter what format you choose, as long as it works for you. The read function then returns the file in this intermediate format.  Our test cases will capture the return value from the read function and send it in as input to the write function.

The write functions will take an input of the intermediate format and convert it to the specified format. For example, the write_xml_file’function should convert something from the interme- diate format to an xml file.  The write function should then write the actual file to the current directory. To make things easier, if a file you are writing to already exists, you can overwrite the file that is currently there and replace it with the new file. However, for our test cases this will not be an issue, so if you choose to not overwrite files it will be okay.

Our test cases will call the read function, save the returned object from that call, then send that into the write function. Once the file has been written, it will be checked for accuracy. From a development perspective, you will implement each of the six functions as if they were stan- dalone functions, and out testcases will call them individually based on what conversions we will test.

This means that all of your read functions must produce data in the exact same intermedi- ate’format. When the result of the read is passed to the write function, the write function will not know what the original format of the data was. That means your write functions have to be able to handle any conversion.

If you use any resources that did not come directly from the class, you must specify and cite them in your code file.  At the top of the starter code there will be a section to paste links to anywhere you got code or ideas from. This is to make sure that you are not plagiarising.

6   Specific Formatting Requirements

6.1   Order of Columns/Attributes

Although the order of the columns/attributes doesn’t matter in practice, for ease of tesing, order the columns/attributes of file in lexicographical order.

6.2   CSV Specific Instructions

You need to have a header line denoting the columns.

6.3   JSON Specific Instructions

The tabular JSON format that you will be using for input and output involves a file with a single array. For each record (e.g. line in a csv file), it becomes one object in the JSON array. A record object has a property (i.e. key) for each column, and a value which is the value of that record.

6.4   XML Specific Instructions

XML has a lot of freedom with how you structure your data. For this project, in each XML file, there should be a single "data" node, with as many "record" nodes within it as needed. In each record, there will be an column node corresponding to each column. In each column node, the text content should be the value for that record. There should be no attributes for any element.  Example:

 

Note: This output has been prettified, the correct output is all on one line (no extraneous whites- pace).

7   Testing Your Code

8   Testing Yourself

We’ve provided a sample dataset as well as a python script to test your code.

To do a conversion, you can run the following command using the provided testing files: python3 project1_cli.py crime.convert_to_json.csv

Where the third argument is the file to be converted.  When we test your code, we will use files with the same naming convention, with a ’.’between the dataset and the format to be con- verted to. For example, if you have an XML file of sports statistics that you want to convert to

JSON, the filename should be something similar to:

’baseballstats.convert_to_json.xml’.

We have provided the ground truth datasets in all file formats, so you can test if your code is correct by determining if there is any difference between the ground truth and the file created by your project.  There are many online resources that will check differences between files for you.

9   Instructor Testing

When we test your code, we will run a more comprehensive script to test your project with every possible combination of file conversions, and on files of different sizes, formats, number of columns, etc.

10   Assumptions

You can assume that every input file is valid, and that all CSV files have headers. You can assume that the filenames are correct, and do not have to do any error checking on the filenames.

11   A Few Notes

The instructor solution for this project is 94 lines of code, including comments.  The reason I mention this is to tell you to not overcomplicate things. If you have written 600 lines of code and spent 15 hours on this project, you are probably making things more complicated than they need to be. If you are struggling, please come to office hours or talk after class and we can get you on the right track.

12   Submission Instructions

You will submit a file named "project1.py" to the handin (handin.cse.msu.edu). If you used any external resources (something other than class material), make sure that you cite the resource in a comment at the top of your "project.py" submission file. The top of the starter code also has space to put your name, netid, and PID, make sure you fill this out fully before submitting your project.