Module 05 — Data Wrangling

Assignment 05

Instructions

Create a Python Notebook with your response to the following questions and ensure that you explain the results from each question. Your work must be presented neatly; include your name in the notebook and indicate which question you are answering. You are welcome (and encouraged) to use this assignment template to help with the presentation of your solution. Download the DS3000 Assignment Template here: (https://northeastern.instructure.com/courses/166788/files/25545331?wrap=1)

Question 1: Office of Addiction Services and Support chemical dependence treatment programs (50 pts)

The Office of Addiction Services and Support publishes a dataset on reported admissions of people in certified chemical dependence treatment programs throughout New York State (NYS). This dataset includes the number of admissions to certified treatment programs aggregated by the program category, county of the program location, age group of client at admission, and the primary substance of abuse group.

For more information on the dataset, visit the following website (https://data.ny.gov/Human-Services/Chemical-Dependence-Treatment-Program-Admissions-B/ngbt-9rwf).

Dataset: source file (https://data.ny.gov/api/views/ngbt-9rwf/rows.csv) | Description of each field (https://data.ny.gov/Human-Services/Chemical-Dependence-Treatment-Program-Admissions-B/ngbt-9rwf)

You are given the task of performing a comprehensive analysis of the admission statistics from 2007 to 2021 and summarizing your findings.

1. (0 pts) Load the data directly from the URL into a dataframe. Here is the direct link to the data: https://data.ny.gov/api/views/ngbt-9rwf/rows.csv

2. (5 pts) Evaluate the dataset to determine if ALL variables are represented in their expected type. Convert variables to suitable data types (if needed) and perform at least one additional data preparation step.

3. (10 pts) Visualize the distribution of Age Groups, Program Category, Primary Substance Group, and Admissions. Ensure that you choose an appropriate graph based on the type of data. Explain each chart.

4. (5 pts) Create a function called annualAdmissions() that calculates the total number of reported admissions that transpired each year, for the entire state of NY and display the results using a line chart. Annotate the chart to show the year with the highest number of admissions. Execute the function in a new cell. Explain the chart and discuss any patterns or trends that you have observed over time.

5. (10 pts) Create a function called annualAdmissionsByCounty(year). The function should take the year as input, filter the data to find all admissions for that year and calculate the proportion of admissions grouped by county. For example, if the year is 2007, the function should calculate the admissions as follows: county A 75%, county B 20% and county C 2.5%, etc. Display a bar chart with the top 10 counties. Using a new cell, visualize the annualAdmissionsByCounty() for the last 10 years. What are the patterns that you have observed?

Note: Ensure that you visualize the results.

6. (10 pts) Filter the data, and extract all admissions to the various “Rehab” facilities; i.e. you should perform a case-insensitive match for all facilities that include the word rehab, rehabilitation, etc. Using the filtered data, identify which substance is the most prominent among each age group. Visualize and explain the results.

7. (10 pts) Using the filtered “rehab” data from question 6 above, identify any patterns in the admission to rehab facilities in any 5 counties and substance groups. Explain your observations.

8. (5 pts) [optional/bonus] Create any one (1) visualization of your choice to demonstrate something interesting about the data. Ensure that you explain what you will demonstrate and the results.
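
For illustration only, the loading, per-county proportion, and case-insensitive "rehab" steps described in the questions above could be sketched roughly as follows. This is a minimal sketch, not a model solution. The column names ("Year", "County of Program Location", "Service Type", "Admissions"), the choice of "Service Type" as the column holding the facility description, the toy values, and the helper names load_admissions, county_proportions, and rehab_rows are all assumptions; check them against the field descriptions of the real dataset.

```python
import pandas as pd

DATA_URL = "https://data.ny.gov/api/views/ngbt-9rwf/rows.csv"

def load_admissions(url=DATA_URL):
    """Read the CSV straight from the URL (not called here, to avoid network access)."""
    return pd.read_csv(url)

# Toy stand-in for the real data. Column names and values are assumptions.
toy = pd.DataFrame({
    "Year": [2007, 2007, 2007, 2008],
    "County of Program Location": ["Albany", "Albany", "Erie", "Erie"],
    "Service Type": [
        "Inpatient Rehabilitation",
        "Outpatient Clinic",
        "Res. Rehab for Youth",
        "Outpatient Clinic",
    ],
    "Admissions": [30, 50, 20, 40],
})

def county_proportions(df, year):
    """Percentage share of admissions per county for one year, rounded to 2 decimals."""
    one_year = df[df["Year"] == year]
    totals = one_year.groupby("County of Program Location")["Admissions"].sum()
    shares = totals / totals.sum() * 100
    return shares.round(2).sort_values(ascending=False)

def rehab_rows(df, column="Service Type"):
    """Keep rows whose description contains 'rehab', ignoring case (matches 'Rehabilitation' too)."""
    mask = df[column].str.contains("rehab", case=False, na=False)
    return df[mask]

shares_2007 = county_proportions(toy, 2007)
rehab = rehab_rows(toy)
```

On the real data, calling .head(10) on the sorted result of county_proportions would give the top 10 counties for a bar chart.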

Note: Ensure that all visualizations have a title and label both the x and y axes; all numeric calculations should be rounded to 2 decimal places.
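
As a rough illustration of this note, the sketch below draws a line chart with a title, labels on both axes, values rounded to 2 decimal places, and an annotation on the highest point. The yearly totals are made-up toy numbers and the variable names are hypothetical. The Agg backend line is only there so the sketch runs without a display; it is not needed inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly totals, rounded to 2 decimal places as the note requires.
yearly = pd.Series({2019: 1200.456, 2020: 950.111, 2021: 1010.0}).round(2)

# Label (year) of the largest value.
peak_year = yearly.idxmax()

fig, ax = plt.subplots()
ax.plot(yearly.index, yearly.values, marker="o")

# Every chart needs a title and labels on both axes.
ax.set_title("Total admissions per year (toy data)")
ax.set_xlabel("Year")
ax.set_ylabel("Admissions")

# Mark the year with the highest total, offset slightly above the point.
ax.annotate(
    f"Peak: {peak_year}",
    xy=(peak_year, yearly.loc[peak_year]),
    xytext=(0, 10),
    textcoords="offset points",
    ha="center",
)
```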

Deliverable

Include your name at the top of your notebook (in a markdown cell) along with the assignment number.

Ensure that your Python code is in a code cell and that it executes without errors.

Submit a .ipynb file named DS3000.A5.FirstName.LastName.ipynb, where FirstName and LastName are your first and last name, e.g., DS3000.A5.Jane.Smith.ipynb.

Submission Details

Submit the assignment by the due date by clicking the "Submit Assignment" button and attaching the required file(s). The submission link stays available for two days after the due date: late submissions are accepted until Tuesday, with a 10% penalty for each late day. No late submissions are accepted after Tuesday.

Early Submission: Submit this assignment by Friday 11:59PM ET to receive 5 bonus points.