AcF 351b Python Stream Final Exam Part I
This is the first part of the final exam for AcF 351b: Python Stream.
Students are expected to act according to the highest ethical standards. All students enrolled at Lancaster University are to perform their academic work according to standards set by faculty members, departments, schools and colleges of the university; cheating and plagiarism constitute fraudulent misrepresentation for which appropriate sanctions are warranted and will be applied. Please note that any violation of the following rules will be treated as plagiarism:
1. Answer the questions yourself without asking others for assistance. This is a test of your ability in data science and computer programming.
2. Do not share the questions or your answers with anyone. This includes posting the questions or your solutions publicly on services like Quora, Stack Overflow, or GitHub.
We will run a system to detect any kind of plagiarism, e.g., coding scripts with high similarities.
Do NOT erase the #export at the top of any cells as it is used by notebook2script.py to extract cells for submission.
Import modules.
Do NOT change the following cell!
In [1]:
If you need extra modules, use the following cell to import them.
In [2]:
Section 1: Basic Data Science
In this section, you will be asked to answer questions regarding WiFi hotspot locations in NYC. Please make sure that the dataset entitled "NYC_Wi-Fi_Hotspot_Locations.csv" is in the same folder as the Jupyter Notebook.
Each row in the data represents one reported WiFi hotspot.
A data dictionary is also provided.
The following script reads the CSV file into memory and stores it in a DataFrame called df.
Do not change it.
In [3]:
Please answer the following 8 questions. For each question,
1. Please write down the script used to compute your response in the Code Cell, and conclude your script with a final print() call that prints the final numeric answer. Important: make sure your scripts are executable!
2. Please fill out the final numeric answers in the cells at the end of the section. See below.
Question 1.1: How many unique providers are there? (10 pts)
In [4]:
Question 1.2: What fraction of WiFi hotspots are in parks? For simplicity, you can consider a hotspot to be in a park if the name of its location contains the word "park" or one of its variants. (10 pts)
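The name-matching rule above can be illustrated on toy data. This is only a sketch: the column name "Location" and the rows are hypothetical, not the real schema, and a plain substring match will also catch words such as "parking".

```python
import pandas as pd

# Hypothetical toy data; "Location" is an assumed column name.
toy = pd.DataFrame(
    {"Location": ["Central Park", "Main St Library", "PARKS Dept Kiosk", "Broadway & 5th", None]}
)

# Case-insensitive substring match; na=False counts missing names as non-matches.
is_park = toy["Location"].str.contains("park", case=False, na=False)

# The mean of a boolean column is the fraction of True values.
park_fraction = is_park.mean()
print(park_fraction)
```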
In [5]:
Question 1.3: How are WiFi hotspots distributed across neighborhoods? For this question, calculate the number of WiFi hotspots per capita for each Neighborhood Tabulation Area (NTA). Exclude NTAs with fewer than 30 reported WiFi hotspots. Report the interquartile range (https://en.wikipedia.org/wiki/Interquartile_range) of the averages.
For population data for each NTA, use this dataset (https://data.cityofnewyork.us/api/views/rnsn-acs2/rows.csv); information on the dataset is found here (https://data.cityofnewyork.us/City-Government/Census-Demographics-at-the-Neighborhood-Tabulation/rnsn-acs2). Use the population data for the column corresponding to 2010. (10 pts)
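One way to organise a per capita calculation followed by an interquartile range is sketched below on toy data. The column names "NTACode" and "Population", the toy values, and the lowered threshold are all assumptions made for illustration; the real datasets may need different keys.

```python
import pandas as pd

# Hypothetical toy data standing in for the hotspot and population tables.
hotspots = pd.DataFrame({"NTACode": ["A"] * 4 + ["B"] * 2 + ["C"] * 1 + ["D"] * 3})
population = pd.DataFrame(
    {"NTACode": ["A", "B", "C", "D"], "Population": [1000, 1000, 500, 600]}
)
min_hotspots = 2  # toy threshold; the question uses 30

# Count hotspots per area and drop areas below the threshold.
counts = hotspots.groupby("NTACode").size().rename("n_hotspots")
counts = counts[counts >= min_hotspots].reset_index()

# Join counts to population and compute hotspots per capita.
merged = population.merge(counts, on="NTACode", how="inner")
merged["per_capita"] = merged["n_hotspots"] / merged["Population"]

# Interquartile range = 75th percentile minus 25th percentile.
q1, q3 = merged["per_capita"].quantile([0.25, 0.75])
iqr = q3 - q1
print(iqr)
```

Note that quantile() uses linear interpolation by default; other interpolation rules give slightly different quartiles on small samples.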
In [6]:
Question 1.4: The dataset contains information on the date the hotspot was activated. What fraction of all activations occurred on the day of week that had the most activations? In other words, if Monday had the most activations, what fraction of activations occurred on Monday? Note: there are some dates that don't make sense. Ignore them for the analysis. (10 pts)
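A sketch of the day-of-week tally on toy data follows. The column name "Activated", the date format, and the rule used to discard implausible dates (year before 2000) are assumptions; the real data may call for a different filter.

```python
import pandas as pd

# Hypothetical toy data; "Activated" is an assumed column name.
toy = pd.DataFrame(
    {"Activated": ["01/01/2018", "01/08/2018", "01/02/2018", "01/01/1900", "not a date"]}
)

# Unparseable strings become NaT instead of raising an error.
dates = pd.to_datetime(toy["Activated"], format="%m/%d/%Y", errors="coerce")

# Assumed plausibility rule: keep years from 2000 onward (NaT rows drop out too).
valid = dates[dates.dt.year >= 2000]

# Count activations per weekday; value_counts sorts in descending order.
by_day = valid.dt.day_name().value_counts()
top_fraction = by_day.iloc[0] / by_day.sum()
print(by_day.index[0], top_fraction)
```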
In [7]:
Question 1.5: How many WiFi hotspots are there by the second most common provider in the Bronx? (10 pts)
In [8]:
Question 1.6: What is the probability that a WiFi hotspot is free (without any limitations) given that it's not in a library? For this question, pull the location data based on the "Location_T" field. (10 pts)
In [9]:
Question 1.7: How far must one travel from one hotspot to another? For this question, compute for each hotspot the average distance to its 3 nearest hotspots, and report the median of these averages in feet. For your distance calculation, calculate the distance "as the crow flies" (https://en.wikipedia.org/wiki/As_the_crow_flies). For simplicity, please use the spherical Earth projected to a plane equation (https://en.wikipedia.org/wiki/Geographical_distance#Spherical_Earth_projected_to_a_plane) for calculating distances. Use the radius of the Earth as 6371 km. (10 pts)
Remember, report your answer in feet.
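The spherical Earth projected to a plane formula is D = R * sqrt((Δφ)² + (cos(φm) Δλ)²), with latitudes φ and longitudes λ in radians and φm the mean latitude. A minimal sketch on hypothetical coordinates is given below; the function name and the toy points are illustrative only, and co-located hotspots in real data would contribute zero distances.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0       # radius given in the question
FEET_PER_KM = 1000.0 / 0.3048  # 1 ft = 0.3048 m exactly

def planar_distance_ft(lat1, lon1, lat2, lon2):
    """Spherical-Earth-projected-to-a-plane distance in feet; inputs in degrees."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlam = np.radians(lon2) - np.radians(lon1)
    phi_m = (phi1 + phi2) / 2.0
    d_km = EARTH_RADIUS_KM * np.sqrt(dphi**2 + (np.cos(phi_m) * dlam) ** 2)
    return d_km * FEET_PER_KM

# Hypothetical coordinates (degrees), not taken from the dataset.
lat = np.array([40.700, 40.701, 40.703, 40.710, 40.720])
lon = np.array([-74.000, -74.001, -73.999, -74.005, -73.990])

# Pairwise distance matrix via broadcasting; mask each point's distance to itself.
dist = planar_distance_ft(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
np.fill_diagonal(dist, np.inf)

# Average distance to the 3 nearest neighbours, then the median over all points.
nearest3_mean = np.sort(dist, axis=1)[:, :3].mean(axis=1)
median_ft = np.median(nearest3_mean)
print(round(median_ft, 1))
```

The full pairwise matrix is convenient for a few thousand points; a spatial index would be a reasonable alternative for much larger inputs.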
In [10]:
Question 1.8: If you plot the number of hotspot activations for each month, you'll notice a general increase but then a precipitous drop after June 2018. Using a linear estimate for the number of monthly activations, what is the rate of increase in monthly activations? Only consider data before July 1, 2018 and set the start date as the earliest date of the data. If you need to, use 30.5 days in a month. (10 pts)
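One possible reading of "linear estimate" is a least-squares line through the monthly counts, with time measured in months of 30.5 days since the earliest date. The sketch below uses hypothetical dates and is only one interpretation; note that months with no activations do not appear in the counts unless added explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical activation dates, not taken from the dataset.
dates = pd.Series(
    pd.to_datetime(
        ["2018-01-05", "2018-02-10", "2018-02-20", "2018-03-01",
         "2018-03-15", "2018-03-30", "2018-08-01"]
    )
)

# Keep only activations before July 1, 2018.
before = dates[dates < pd.Timestamp("2018-07-01")]

# Activations per calendar month, in chronological order.
monthly = before.dt.to_period("M").value_counts().sort_index()

# Months elapsed since the earliest date, using 30.5 days per month.
month_starts = monthly.index.to_timestamp()
x = np.asarray((month_starts - before.min()).days, dtype=float) / 30.5
y = monthly.to_numpy(dtype=float)

# Degree-1 least-squares fit; the first coefficient is the slope.
slope, intercept = np.polyfit(x, y, 1)
print(slope)
```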
In [11]:
Now, please fill out all the numeric answers to Question 1.1-Question 1.8 in the following code cells. DO NOT UNCOMMENT.
In [12]:
In [13]:
In [14]:
In [15]:
In [16]:
In [17]:
In [18]:
In [19]:
Section 2: Simple Programming
Please answer the following 5 questions. For each question,
1. Please write down the script used to compute your response in the Code Cell, and conclude your script with a final print() call that prints the final numeric answer. Important: make sure your scripts are executable!
2. Please fill out the final numeric answers in the cells at the end of the section.
Question 2.1: Compute 1-2+3-4+5-6+...-1000 (10 pts)
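The alternating pattern adds odd terms and subtracts even ones, which can be sketched with a generator expression. Pairing the terms as (1-2)+(3-4)+... gives 500 pairs of -1, which is a useful sanity check.

```python
# Add odd k, subtract even k, for k = 1..1000.
total = sum(k if k % 2 else -k for k in range(1, 1001))
print(total)  # -500
```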
In [20]:
Question 2.2: How many prime numbers (https://en.wikipedia.org/wiki/Prime_number) are there between 1 and 1000? (10 pts)
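One common way to count primes up to a limit is the sieve of Eratosthenes, sketched here. The function name count_primes is illustrative; trial division would work equally well at this scale.

```python
def count_primes(limit):
    """Count primes p with 2 <= p <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False  # 0 and 1 are not prime
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return sum(is_prime)

print(count_primes(1000))  # 168
```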
In [21]:
Question 2.3: The following code snippet generates an array named arr of 1000 numbers between 0 and 100. Find the SECOND largest number. (10 pts)
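"Second largest" can mean the second element in sorted order or the second largest distinct value; the two differ when the maximum is repeated. The sketch below shows both on a hypothetical toy array, since the exam's generating snippet is not reproduced here.

```python
import numpy as np

# Hypothetical toy array with a repeated maximum.
toy = np.array([3.0, 7.0, 7.0, 5.0])

# Second element from the top after sorting (ties count separately).
second_by_position = np.sort(toy)[-2]

# Second largest distinct value (np.unique returns sorted unique values).
second_distinct = np.unique(toy)[-2]

print(second_by_position, second_distinct)  # 7.0 5.0
```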
In [22]:
Question 2.4: The following code snippet generates an array named arr of 1000 numbers between 0 and 100. Compute the sum of all numbers less than or equal to 50, and round the result to three decimal places. (10 pts)
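A conditional sum of this kind is usually written with a boolean mask. The sketch below uses a hypothetical toy array in place of the exam's arr.

```python
import numpy as np

# Hypothetical toy array, not the exam's arr.
toy = np.array([10.1234, 50.0, 50.0001, 99.0])

# The boolean mask keeps only the values <= 50; round to three decimal places.
masked_sum = round(float(toy[toy <= 50].sum()), 3)
print(masked_sum)
```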
In [23]:
Question 2.5: The following code snippet generates a 30 × 30 array (matrix) named arr with numbers between 0 and 10. Compute the sum of the maxima of each row, and round the result to three decimal places. (10 pts)
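Row-wise maxima come from reducing along axis=1. A small sketch on a hypothetical 2 × 2 matrix, standing in for the exam's 30 × 30 arr:

```python
import numpy as np

# Hypothetical toy matrix, not the exam's arr.
toy = np.array([[1.0, 2.0],
                [3.0, 0.5]])

# max(axis=1) gives one maximum per row; sum them and round to three decimals.
row_max_sum = round(float(toy.max(axis=1).sum()), 3)
print(row_max_sum)  # 5.0
```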
In [24]:
Now, please fill out all the numeric answers to Question 2.1-Question 2.5 in the following code cells. DO NOT UNCOMMENT.
In [25]:
In [26]:
In [27]:
In [28]:
In [29]:
Congratulations on finishing the first part of the final exam!
Remember to save your Jupyter Notebook.
Now is a good time to submit for grading.
Please uncomment and run the cell below. Your code will be generated in the folder named first_part. Please upload the submission.py file AND the Jupyter Notebook.
In [30]:
%run helpers/notebook2script1 first_part
Converted AcF_351b_Python_Stream_Final_Exam_Part_I.ipynb to first_part\submission_part_I.py
In [ ]:
2023-08-15