Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit


Midterm Test – Summer 2022

Name:

Note: Open Google and Handouts

 

1) Please refer to the CSV file “forestfires.csv” and answer the following questions along with the correct functions and results.

 

a. How did you import the data into Python?

b. How many observations are there with a fire (i.e., area>0)?

c. How many observations are there with a rain (i.e., rain>0)?

d. How many observations are there with both a fire and a rain?

e. Show the columns month, day, area of the all the observations.

f. Show the columns month, day, area of the observations with a fire.

g. Please sort all observations based on FFMC.

h. Please calculate the mean of area, what result you saw?

 

2) Please refer to the 2 CSV files “Data-Orders.csv” and “Data-Customers.csv”.

 

a. What function you need to perform to make related column name same?

b. How do you merge these 2 data and what the output will be after merging?

c. How will you display consumer who is not shown in order file?

d. What is the difference between the earliest order date and the latest order date?

 

 

3) You have the following data:

 

 

 

a. How do you add your own name and email? And, what is the output?

b. How do you add the age information for the output in the above updated output?

 

4) Assume we have the following data in “question4.xlsx”:

 

Name Age City Working

John 19 Chicago Yes

Muse 27 San Jose No

Alex 22 Seattle No

Nina 30 San Jose Yes

Mary 20 Seattle No

Kate 38 New York Yes

Susan           32 San Jose Yes

 

a. Read in this Excel dataset into Spyder

b. How do you display only the data for person and working status

c. How to display the column names?

d. How to display the persons whose age is older than 29?

e. How to display the persons whose working status is not working?

f. How to display the persons who lives in San Jose, age is older than 26 and working?

g. How to re-display a list based on ascending age?

h. How to re-display a list based on descending age?

5) What is the difference between performing “inner_join” and “full_join” for the data in question (2) based on consumer ID?

 

6) Assume the age in question (4) is lunar year age, that is the number is 1 year older than the actual age.

Please create a new column which contain the actual age for all people.

 

7) From the data in question (4), how do you calculate the average of ages based on their living city.  

 

8)  From the data in question (4), how do you replace “Yes” to “1”, and “No” to “0” in Column “Working”?

 

9) You have the following data:

 

23 90 35 12 47 4 15

 

Please program a Python code to display the numbers which can be divisible by 3.

Please also attach your algorithm and pseudo-code.

 

10) Use dataset “sampledatafoodsales.xlsx”:

a. Change the date format to “Year-Month-Date”

b. Group and use City and Region as Index to list the total numbers of “Quantity”,  “UnitPrice” and “TotalPrice”.

c. How about the Average values for “Quantity”, “UnitPrice” and “TotalPrice” in previous question?

 

11)

If you have a bunch of Excel Data files named like:

abc-01232021.xlsx

fbe-09151988.xlsx

dhc-07292001.xlsx

zkj-03132012.xlsx

jou-03232012.xlsx

kou-09301988.xlsx

 

Please use Python to perfom:

a. Create directories whose names are based on file’s year part

b. Evaluate if the directory exists or not, if not, create new directory

c. Relocate the file name containing same “year” part into that “year” directory.

d. Rename each file name under each “year” directory with “month-date-year”.xlsx


 

12)

From the 2 CSV files “Merge-Data-Customer.csv” and “Merge-Data-Orders.csv”, please use Python to find the common column (with different column names) and merge these 2 data set into a single data-frame with the following 4 methods.  Then, please translate this data-frame into an Excel data set file.

a. use merge method – left

b. use merge method – right

c. use merge method – inner

d. use merge method - outer

  

13)

Please refer to the CVS data set – “SalesData.csv” and use Python to perform the following tasks:

a. Please sort the data based on Temperature

b. Please select entries whose temperature is higher than 20 Celius and generate a new dataframe.

c. Please plot a graph to describe the relationship between sales and the temperature.

 

14)

Please use the CSV data set – “EmployeeInfo.csv” and Python to perform:

a. group by sex

b. group by both sex and age

c. Can you analyze the salary based on sex and age, respectively, and then make a conclusion on if sex or age has bigger influence on their salary?

d. Use Python plot to confirm your conclusion in c.

 

15)

Please refer to Excel data set – “sales_data.xlsx” and Python to perform:

a. how many occurrences that sales > 1000 ?

b. how many occurrences that sales > 1000 and Category is “Technology” ?

c. Sum of the total sales of those occurrences having sales > 1000.

d. Sum of the total profits of those occurrences having sales > 1000 and Category is “Technology”.