CITS1401

Computational Thinking with Python

Project 2, Semester 2, 2023

(Individual project)

Submission deadline: Friday 20th October 2023, 6:00 PM

Value: 20% of CITS1401.

Project description:

You should construct a Python 3 program containing your solution to the following problem and submit it electronically on Moodle. The name of the file containing your code should be your student ID, e.g., 12345678.py. No other method of submission is allowed. Please note that this is an individual project. If you click the “check” link, Moodle will automatically run your program on the sample test cases provided in the project sheet. However, after the due date your submission will be tested thoroughly for grading purposes. Remember that you need to submit the program as a single file and copy-paste the same program into the provided text box. You have only one attempt to submit, so do not submit until you are satisfied with your attempt. All open submissions at the time of the deadline will be automatically submitted. Once your attempt is submitted, there is no way in the system to open, reverse, or modify it.

You are expected to have read and understood the University's guidelines on academic conduct. In accordance with this policy, you may discuss with other students the general principles required to understand this project, but the work you submit must be the result of your own effort. Plagiarism detection and other systems for detecting potential malpractice will therefore be used. Besides, if what you submit is not your own work, you will have learnt little and will therefore likely fail the final exam.

You must submit your project before the deadline listed above. Following UWA policy, a late penalty of 5% will be deducted for each day (or part day) after the deadline that the assignment is submitted. No submissions will be allowed more than 7 days after the deadline, except in approved special consideration cases.

Project Overview:

The ABC research institute collected information about different organisations from all over the world for future investment purposes. The collected dataset contains several parameters for each organisation, such as the name and ID of the organisation, country of registration, category of work, foundation year, number of employees, median salary, profit in 2020 and profit in 2021.

You are required to write a Python 3 program that reads a CSV file. After reading the file, your program is required to complete the following tasks:

1) Create a dictionary and store the following information in it:

a. t-test score of profits in 2020 and 2021 for each country.

b. Minkowski distance between the number of employees and the median salary for each country.

2) Create a nested dictionary that contains the following information for each category of organisations.

a) organization IDs, and a list of the following data corresponding to each organization ID:

i. Number of employees.

ii. Percentage of profit change from 2020 to 2021 (absolute value).

iii. Rank of the organisation within each category, with respect to the number of employees.
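Tasks 1a and 1b depend on the formulas given at the end of the project sheet, which are not reproduced in this section. The helpers below are a minimal sketch that assumes a Welch-style two-sample t statistic, t = (mean_a - mean_b) / sqrt(var_a/n_a + var_b/n_b) with sample variances (n - 1 denominator), and a Minkowski distance of order p, (sum of |x_i - y_i|^p)^(1/p), with p = 3 as an assumed default. The function names are hypothetical, and both formulas (including which year is passed as sample_a, which fixes the sign of t) should be checked against the sheet before use. Nothing is imported, in line with Requirement 1: x ** 0.5 stands in for math.sqrt.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance with an n - 1 denominator (needs at least two values)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def t_test(sample_a, sample_b):
    """Welch-style t statistic (assumed formula; confirm against the sheet).

    Returns None when the statistic is undefined: fewer than two values in
    either sample, or zero combined variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        return None
    denominator = (sample_variance(sample_a) / n_a
                   + sample_variance(sample_b) / n_b) ** 0.5
    if denominator == 0:
        return None
    return (mean(sample_a) - mean(sample_b)) / denominator


def minkowski(xs, ys, p=3):
    """Minkowski distance of order p between two equal-length lists (p=3 assumed)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(xs, ys)) ** (1 / p)
```

For one country, sample_a and sample_b would hold that country's 2020 and 2021 profits, and xs and ys the employee counts and median salaries of the same rows. Returning None for an undefined t value is a choice made for this sketch only.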

Requirements:

1) You are not allowed to import any external or internal module in Python. While using many of these modules, e.g., csv or math, is a perfectly sensible thing to do in a production setting, it takes away much of the point of the project, which is about getting practice opening text files, processing text file data, and using basic Python programming skills.

2) Ensure your program does NOT call the input() function at any time. Calling input() will cause your program to hang, waiting for input that the automated testing system will not provide. (In fact, if the marking program detects such a call, it will not test your code at all, which may result in a zero grade.)

3) Your program should also not call the print() function at any time, except for graceful termination (if needed). If your program encounters an error state and exits gracefully, it needs to return empty dictionaries and print an appropriate message. At no point should you print the program’s outputs instead of (or in addition to) returning them, or print the program’s progress in calculating those outputs.
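One way to meet Requirements 2 and 3 is sketched below, assuming that an unreadable file and a missing header row are among the error states to handle (a real submission will need to handle others). This main() never calls input(), prints only when it terminates gracefully, and returns two empty dictionaries in that case; the row processing is left as a placeholder comment.

```python
def main(csvfile):
    """Skeleton of main(): always returns two dictionaries."""
    try:
        handle = open(csvfile, "r")
    except (OSError, TypeError, ValueError):
        # Graceful termination: print one message and return empty dictionaries.
        print("Error: cannot open file:", csvfile)
        return {}, {}
    with handle:
        header = handle.readline()
        if not header.strip():
            print("Error: file is empty or has no header row")
            return {}, {}
        output1, output2 = {}, {}
        for line in handle:
            pass  # placeholder: validate each row and update output1/output2 here
    return output1, output2
```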

Input:

Your program must define the function main with the following syntax:

def main(csvfile):

The input argument for this function is:

• csvfile: the name of the CSV file (as a string) containing the records of organisations around the world. The first row of the CSV file contains the column headings. A sample CSV file, “Organisations.csv”, is provided with the project sheet on LMS and Moodle.
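Because the columns can appear in any order (see the Assumptions), a program usually locates each column by its heading rather than by position. The helper below is a minimal sketch of that lookup; the function name and the example headings are hypothetical, not the actual headings of Organisations.csv. It also assumes fields never contain quoted commas, so a plain split(",") is enough; if the real file quotes fields, splitting needs more care.

```python
def column_indices(header_line, required):
    """Map each required column name (given in lower case) to its header index.

    Returns None if any required column is missing from the header.
    """
    # Headings are compared case-insensitively and without surrounding spaces.
    names = [name.strip().lower() for name in header_line.strip().split(",")]
    positions = {}
    for index, name in enumerate(names):
        positions.setdefault(name, index)  # keep the first occurrence of a heading
    if any(column not in positions for column in required):
        return None
    return {column: positions[column] for column in required}
```

For example, column_indices("Country, ID ,Employees\n", ["id", "country"]) returns {"id": 1, "country": 0}, so later code can read fields[cols["country"]] regardless of column order.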

Output:

Two outputs are expected:

1) A dictionary with country names as keys, where the value for each country (key) is a list containing the t-test score and the Minkowski distance between the number of employees and the median salary of that country. The expected output is in the following format:

{‘country1’: [t-test score,minkowski distance],

‘country2’: [t-test score,minkowski distance],…, ‘countryn’: [t-test score,minkowski distance]}

2) A nested dictionary ‘D’ that stores the different categories of organizations (such as ‘transportation’, ‘apparel’, etc.) as keys, where each corresponding value is another dictionary ‘d’. Each dictionary ‘d’ stores the organization IDs within that category as keys and the information related to each organization ID as values. Each value of ‘d’ is a list containing the following data for the organisation:

a. number of employees,

b. absolute percentage of profit change from 2020 to 2021, and

c. rank of the organisation within its category with respect to the number of employees (sorted in descending order: the organisation with more employees holds the higher rank, and the highest rank is ‘1’). If two organizations have the same number of employees, order those tied organizations (and only those) in descending order of their profit change. Below is the format:

{‘category1’: {‘organisation ID1’: [number of employees, absolute percentage of profit change, rank],

‘organisation ID2’: [number of employees, absolute percentage of profit change, rank]},

‘category2’: {‘organisation ID1’: [number of employees, absolute percentage of profit change, rank], …,

‘organisation IDN’: [number of employees, absolute percentage of profit change, rank]}, …,

‘categoryK’: {‘organisation ID1’: [number of employees, absolute percentage of profit change, rank], …,

‘organisation IDN’: [number of employees, absolute percentage of profit change, rank]}}
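A sketch of how output 2 could be assembled follows. It assumes the absolute percentage change is |(profit_2021 - profit_2020) / profit_2020| * 100, treats a zero 2020 profit as invalid (the division is undefined), and rounds the change to 4 decimal places before it is used as a tie-breaker; all three are assumptions to check against the formulas at the end of the sheet. Ranking sorts each category once on the key (-employees, -change), so rank 1 goes to the largest workforce and tied organisations are ordered by larger profit change. The function names and the input tuple layout are hypothetical.

```python
def percent_change(profit_2020, profit_2021):
    """Absolute percentage change from 2020 to 2021, or None if 2020 profit is 0."""
    if profit_2020 == 0:
        return None
    return abs((profit_2021 - profit_2020) / profit_2020 * 100)


def rank_category(orgs):
    """Append a rank to each [employees, change] list in one category.

    Rank 1 = most employees; ties go to the larger profit change.
    """
    order = sorted(orgs, key=lambda org_id: (-orgs[org_id][0], -orgs[org_id][1]))
    return {org_id: orgs[org_id] + [rank]
            for rank, org_id in enumerate(order, start=1)}


def build_categories(rows):
    """rows: iterable of (category, org_id, employees, profit_2020, profit_2021)."""
    grouped = {}
    for category, org_id, employees, profit_2020, profit_2021 in rows:
        change = percent_change(profit_2020, profit_2021)
        if change is None:
            continue  # assumed: a zero 2020 profit makes the row invalid
        grouped.setdefault(category, {})[org_id] = [employees, round(change, 4)]
    return {category: rank_category(orgs) for category, orgs in grouped.items()}
```

Sorting once per category costs O(n log n) for that category, which avoids repeatedly scanning the category to find the next-largest organisation.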

Note: All float results should be rounded to 4 decimal places, and all strings should be converted to lower case. Also, keep in mind that dictionaries are compared by their key-value pairs, so the order in which keys appear does not matter.
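Both conversions are one-liners in Python, but rounding has a subtlety worth knowing when comparing against the expected outputs: round() works on the stored binary float and breaks exact ties towards the even digit, so it does not always round half up. The snippet below illustrates this with arbitrary values; rounding only the final results, not intermediate values, is the approach assumed here.

```python
# Strings: strip surrounding whitespace, then lower-case before use as keys.
category = "  BIOTECHNOLOGY ".strip().lower()   # "biotechnology"

# Floats: round each final result to 4 decimal places.
score = round(1.23456789, 4)                    # 1.2346

# round() is not "round half up":
not_half_up = round(2.675, 2)                   # 2.67: 2.675 is stored as slightly less
half_to_even = round(0.125, 2)                  # 0.12: an exact tie goes to the even digit
```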

Examples:

Download the Organisations.csv file from the Project 2 folder on LMS or Moodle. An example of how you can call your program from the Python shell (and examine the results it returns) is:

>>> output1, output2 = main('Organisations.csv')

The returned output variables are two dictionaries. The following are some examples of examining the returned dictionaries:

Example#1

>>> output1['brazil']

[-0.5175, 10174.3314]

>>> output2['biotechnology']

{'3c08339af3bb8c8': [8575, 36.4935, 1], 'eaf5ae0fcbcb4dd': [6603, 78.062, 3], '139ab569bdfce4f': [3493, 62.6008, 4], 'a483cd7f7b486b4': [3427, 179.344, 5], '7ade1d82d2ac863': [7205, 140.2845, 2], 'bf1cc30febed38c': [481, 8.9567, 6], 'bde405d2e490ebe': [92, 38.1616, 7]}

Example#2

>>> output1['afghanistan']

[0.0367, 4400.639]

>>> output2['accounting']

{'a5e8ce5cf97c2ac': [8128, 760.9484, 1], '5e2bb2dace9511e': [7007, 73.0692, 3], 'df66e70fae1aa5d': [7518, 0.5118, 2], '795195c9db5e1c0': [6977, 96.9351, 4], 'a6bc77d5ce07c7b': [6947, 90.4202, 5], 'b715731fa4a6cdb': [6429, 970.6279, 7], '8f55cd0ad6dcde2': [6202, 22.6138, 8], 'ca8e1dfba7b1d8d': [6628, 110.683, 6], 'bcaac3adb10bf1c': [6143, 801.5984, 9], 'c38cf79de2e6b6a': [5784, 125.3964, 10], 'ef56bdce48de5ff': [5523, 597.8386, 11], '0bcebfcd12bcb7e': [5282, 31.2454, 12], 'e0da4a69658eaca': [4491, 120.9667, 13], '27fbc78271f3aa2': [4288, 174.5934, 14], 'd457875b76d0ad8': [3784, 28.912, 15], 'ef7e820bc9f7e49': [2861, 40.1272, 16], 'a45e805db7feee1': [2658, 158.8379, 17], 'a3b8d27d51aae2f': [2135, 64.8933, 18], 'ba907c2acbc34ba': [2090, 13.0396, 19], 'f8a35a4b5d7b2c1': [871, 40.3551, 20]}

Assumptions:

Your program can assume the following:

• The order of columns can be different from the order in the sample file. Also, the testing input file can contain extra or fewer columns. Moreover, rows can be in random order, except for the first row containing the headings.

• All string data in the file is case-insensitive, which means “Biotechnology” is the same as “BIOTECHNOLOGY”. Your program needs to handle this situation and treat both as the same.

• There can be missing or invalid data in a row, and in such instances the entire row should be ignored. Some examples of invalid data are: a negative or zero number of employees or median salary; duplicate organisation IDs; null/empty values in the required columns. You need to think of other invalid cases yourself.

• The necessary formulas are provided at the end of this document.
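The row-level checks can be kept in one small function that returns None for any row to be ignored. The sketch below covers only the invalid cases named above (short rows, empty required values, non-numeric or non-positive employees and salary) plus duplicate IDs. The column names in cols are hypothetical, and the list above does not say whether a duplicated ID should drop every copy or keep the first, so dropping every copy is just one reading. Other invalid cases are left for you to identify.

```python
def parse_row(fields, cols):
    """Return (org_id, country, employees, salary) for a valid row, else None.

    fields: one data row already split on commas.
    cols: column name -> index (hypothetical names "id", "country",
          "employees", "salary").
    """
    try:
        org_id = fields[cols["id"]].strip().lower()
        country = fields[cols["country"]].strip().lower()
        employees = int(fields[cols["employees"]])
        salary = float(fields[cols["salary"]])
    except (IndexError, ValueError):
        return None  # row too short, or a value is empty/non-numeric
    if not org_id or not country or employees <= 0 or salary <= 0:
        return None
    return org_id, country, employees, salary


def drop_duplicate_ids(records):
    """Drop every record whose ID (first item) appears more than once."""
    counts = {}
    for record in records:
        counts[record[0]] = counts.get(record[0], 0) + 1
    return [record for record in records if counts[record[0]] == 1]
```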

Important grading instruction:

Note that you have not been asked to write specific functions; that task has been left to you. However, it is essential that your program defines the top-level function main(csvfile) (hereafter referred to as main() in the project documents to save space; whenever main() is written, it still implies that it is defined with its input argument). The idea is that within main(), the program calls the other functions. (Of course, these functions may then call further functions.) This is important because when your code is tested on Moodle, the testing program will call your main() function. If you fail to define main(), the testing program will not be able to test your code and your submission will be graded zero. Don’t forget the submission guidelines provided at the start of this document.

Marking rubric:

Your program will be marked out of 30.

22 out of 30 marks will be awarded automatically based on how well your program completes a number of tests reflecting normal use of the program, and on how it handles various states including, but not limited to, different numbers of rows in the input file and/or any error states. You need to think creatively about what your program may face. Your submission will be graded using data files other than the provided one, so you need to look carefully at corner cases and worst cases. I have provided a few guidelines from the ACS Accreditation manual at the end of the project sheet, which will help you understand the expectations.

8 out of 30 marks will be awarded for style (5/8), “the code is clear to read”, and efficiency (3/8), “your program is well constructed and runs efficiently”. For style, think about the use of comments, sensible variable names, your name at the top of the program, etc. (Please watch the lectures where this is discussed.)

Style Rubric:

0	Gibberish, impossible to understand or style is poor
1-2	Style is fair
3-4	Style is good or very good, with small lapses
5	Excellent style, really easy to read and follow

Your program will traverse text files of various sizes (possibly including large CSV files), so you need to minimise the number of times your program looks at the same data items.

Efficiency Rubric:

0	Code too complicated to judge efficiency or wrong problem tackled
1	Very poor efficiency, additional loops, inappropriate use of readline()
2	Acceptable or good efficiency with some lapses
3	Excellent efficiency, should have no problem on large files, etc
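For the efficiency criterion, a common pattern is to read the file once, iterating over the file object line by line, and to accumulate everything the later calculations need in dictionaries during that single pass, instead of re-scanning the rows once per country or category. A minimal sketch, assuming a hypothetical three-column layout of country, employees and median salary (the function name is also hypothetical):

```python
def collect_by_country(lines):
    """Single pass over data lines of the form 'country,employees,salary'.

    Returns country -> ([employees, ...], [salaries, ...]); each line is read once.
    """
    groups = {}
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 3:
            continue  # skip malformed rows
        country = fields[0].strip().lower()
        try:
            employees, salary = int(fields[1]), float(fields[2])
        except ValueError:
            continue  # skip rows with non-numeric values
        # setdefault creates the pair of lists the first time a country is seen.
        emp_list, sal_list = groups.setdefault(country, ([], []))
        emp_list.append(employees)
        sal_list.append(salary)
    return groups
```

Because iterating over an open file yields one line at a time, the same function accepts a file handle (after the header line has been consumed) as well as a list of strings, with no repeated readline() calls.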

Automated testing is used so that all submitted programs are tested in the same way. Sometimes a single mistake in a program means that no tests are passed. If the marker is able to spot the cause and fix it readily, they are allowed to do so, and your (now fixed) program will score whatever it scores from the tests, minus 4 marks, because other students will not have had the benefit of marker intervention. Still, that is far better than getting zero. On the other hand, if the bug is hard to fix, the marker needs to move on to other submissions.