
Python For Data Analysis - Fall 2023

Published: 2023-12-09


Take-Home Final Exam


Due Date: Thursday Dec 14, 2023

Instructions

You should turn in a Jupyter notebook.

You should complete this assignment using Pandas and no other packages.

●    In fact, you should answer all questions using Pandas


Scenario

You recently started working as a Data Analyst for a healthcare company that provides telehealth visits. The company has been in operation for a few years and operates in a few different markets. It's an exciting time: revenue has been growing, and the executive committee is thinking about purchasing another telehealth provider. Your team lead gave you access to a dataset that stores basic information about the visit history of the rival telehealth provider. The executive committee has a few basic questions, and you are tasked with answering them!

Points

●    Each question is worth 5 points.

There are 23 questions so the total is 115 points.

Questions

1.   Load the data. How many rows are in the dataset?

2.   Note that the first row is a header and contains the column names. How many columns are there in the dataset and what are their names?

3.   How many distinct patients are there in the dataset?

4.   How many markets are there in the dataset and what are the distinct values?

5.   Write a function named count_distinct_values(col) that takes a column name (string) as input and returns the number of distinct values in that column. For example, in this "toy" dataset (which has a similar structure to your dataset),

[
    ["x", "y", "z"],
    [1, 2, 8],
    [1, 5, 7],
    [1, 5, 2],
]

the function would return the following values:

count_distinct_values("x") == 1

count_distinct_values("y") == 2

count_distinct_values("z") == 3

Test your function on the count of patients and markets. Hint: use assert statements to verify that count_distinct_values("patient_id") and count_distinct_values("market") match the answers to questions 3 and 4 above.
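One possible shape for the function in question 5, sketched on the toy dataset only. The module-level name `df` is an assumption: the required signature takes just a column name, so the function has to read the DataFrame from the enclosing scope. Note that `Series.nunique()` ignores missing values by default, which may or may not be what the real data calls for.

```python
import pandas as pd

# Toy dataset from the question; the real data would be loaded into `df` instead.
df = pd.DataFrame(
    [[1, 2, 8], [1, 5, 7], [1, 5, 2]],
    columns=["x", "y", "z"],
)


def count_distinct_values(col):
    """Return the number of distinct (non-missing) values in column `col` of `df`."""
    return df[col].nunique()


# The expected values stated in the question.
assert count_distinct_values("x") == 1
assert count_distinct_values("y") == 2
assert count_distinct_values("z") == 3
```

On the real dataset the same pattern applies: compute the distinct patient and market counts independently, then assert that the function returns the same numbers.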

6.   Now use your function count_distinct_values(col="zip_code") to determine the number of distinct zip codes in the dataset.

7.  What are the distinct values in the visit_type column in the dataset?

8.   What are the visit counts corresponding to each visit_type? To solve this problem, iterate over the dataset and count the visits of each visit_type. Store the counts in a dictionary named visit_type_counts, which should have the format

visit_type_counts = {"A": visit_count1, "B": visit_count2, ...}

Can you think of an assertion that verifies that your result is consistent with the data?
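The counting loop in question 8 could look roughly like the sketch below. The rows are made up for illustration and are not taken from the real dataset, and the sketch assumes the visit_type column has no missing values. Two candidate assertions are shown: the counts should sum to the number of rows, and they should agree with pandas' own `value_counts()`.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({"visit_type": ["A", "B", "A", "C", "A", "B"]})

# Iterate over the column and tally each visit type by hand.
visit_type_counts = {}
for visit_type in df["visit_type"]:
    visit_type_counts[visit_type] = visit_type_counts.get(visit_type, 0) + 1

# Every visit is counted exactly once, so the counts must sum to the row count.
assert sum(visit_type_counts.values()) == len(df)

# Cross-check the manual tally against pandas' built-in count.
assert visit_type_counts == df["visit_type"].value_counts().to_dict()
```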

9.   What are the min, max, and average of the values in the charge column?

10. If you filter the dataset for the Boston market, what are the min, max, and average of the values in the charge column in the resulting subset?
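For questions 9 and 10, a minimal sketch of summarizing the charge column with and without a market filter. The rows are invented, and the exact label "Boston" is an assumption about how that market is spelled in the market column.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "market": ["Boston", "NYC", "Boston", "LA"],
    "charge": [100.0, 250.0, 50.0, 175.0],
})

# Summary statistics over the whole column.
overall = df["charge"].agg(["min", "max", "mean"])

# Same statistics after keeping only the Boston rows (boolean mask filter).
boston = df.loc[df["market"] == "Boston", "charge"].agg(["min", "max", "mean"])

print(overall)
print(boston)
```

Both results are Series indexed by the statistic name, so individual values can be read as `overall["mean"]` or `boston["max"]`.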

11. If you filter the dataset for the NYC market, how many distinct patients are there?

12. If you filter the dataset for the LA market, how many distinct patients are there?

13. Iterate over the dataset and count the visits in each market. Report the counts using a dictionary, which should have the format {market1: visit_count1, market2: visit_count2, …}

14. How many visits occurred on Dec 1, 2022? Note that you do not need a sophisticated date-time package to answer this; basic string methods are enough.

15. How many visits occurred during the month of December in 2022? (Again you can answer this without importing any date-time packages)

16. How many visits occurred during the month of December in 2022 in the NYC market?

17. How many visits of type "A" occurred during the month of December in 2022 in the NYC market?
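Questions 14 to 17 build on each other, and the string-based approach they suggest can be sketched as follows. Two things here are assumptions, since the source does not state them: the date column is called `visit_date` (a hypothetical name), and dates are stored as ISO-style strings beginning with `YYYY-MM-DD`. If the real format differs, the prefixes need to change accordingly.

```python
import pandas as pd

# Invented rows; `visit_date` is a hypothetical column name.
df = pd.DataFrame({
    "visit_date": ["2022-12-01", "2022-12-01", "2022-12-15", "2022-11-30", "2023-12-01"],
    "market": ["NYC", "LA", "NYC", "NYC", "NYC"],
    "visit_type": ["A", "A", "B", "A", "A"],
})

# Treat dates as plain strings; no date-time package is needed.
dates = df["visit_date"].astype(str)

# Prefix matching also works if the strings carry a trailing time component.
on_dec_1 = dates.str.startswith("2022-12-01")
in_dec_2022 = dates.str.startswith("2022-12")
in_nyc = df["market"] == "NYC"
is_type_a = df["visit_type"] == "A"

# Boolean masks combine with `&`; summing a mask counts its True entries.
visits_dec_1 = int(on_dec_1.sum())
visits_dec = int(in_dec_2022.sum())
visits_dec_nyc = int((in_dec_2022 & in_nyc).sum())
visits_dec_nyc_a = int((in_dec_2022 & in_nyc & is_type_a).sum())
```

Each later question simply adds one more mask to the conjunction, so the counts can only stay the same or shrink, which is itself a cheap sanity check.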

18. What is the average number of visits per patient in the entire dataset?

19. If you take a closer look at the data, you will find that there are some patients who have multiple visits. We will refer to these patients as repeat patients. For example, how many visits does the patient with this patient_id have? What are the visit dates and market?

1234567-abcd-1234-a1b2-abc123abc123

20. How many repeat patients are there in the dataset? What is the percentage of repeat patients to total patients?

21. What is the average number of visits that a repeat patient has had?

22. How many repeat patients had visits in more than one market?
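The repeat-patient questions (18 and 20 to 22) all follow from one per-patient visit count. The sketch below uses invented patient IDs and markets, and it takes "repeat patient" to mean a patient with more than one row in the dataset, as the text defines it.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "patient_id": ["p1", "p1", "p2", "p3", "p3", "p3"],
    "market": ["NYC", "LA", "NYC", "Boston", "Boston", "Boston"],
})

# Number of visits (rows) per patient.
visits_per_patient = df.groupby("patient_id").size()
avg_visits = visits_per_patient.mean()

# Repeat patients are those with more than one visit.
is_repeat = visits_per_patient > 1
n_repeat = int(is_repeat.sum())
pct_repeat = 100 * n_repeat / len(visits_per_patient)
avg_visits_repeat = visits_per_patient[is_repeat].mean()

# Number of distinct markets each patient appears in.
markets_per_patient = df.groupby("patient_id")["market"].nunique()
n_multi_market = int((is_repeat & (markets_per_patient > 1)).sum())
```

Both grouped Series are indexed by patient_id, so the `&` between them aligns patient by patient. A single patient's history (question 19) can be inspected with a plain filter such as `df[df["patient_id"] == some_id]`.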

23. We want to compute the total number of visits whose charges were above the visit-type average charge. To answer this question, first determine the average charge for each visit type. Then determine whether the charge for each visit was above or below the average for its visit type. Finally, count the number of visits that had an above-average charge for their visit type.
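The three steps in question 23 map naturally onto `groupby(...).transform("mean")`, which returns the group average aligned to every original row. This is one way to do it, sketched on invented charges; a visit exactly equal to its group average is not counted as above average here.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "visit_type": ["A", "A", "A", "B", "B"],
    "charge": [100.0, 200.0, 300.0, 50.0, 150.0],
})

# Step 1: average charge per visit type, broadcast back to each row.
type_avg = df.groupby("visit_type")["charge"].transform("mean")

# Step 2: flag visits charged strictly above their own type's average.
above = df["charge"] > type_avg

# Step 3: count the flagged visits.
n_above = int(above.sum())
```

An equivalent route is to compute the per-type means with `groupby(...).mean()` and map them onto the visit_type column; `transform` just skips the explicit mapping step.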