AcF_351b_Python_Stream_Final_Exam_Part_I - Jupyter Notebook

This is the first part of the final exam for AcF 351b: Python Stream.

Students are expected to act according to the highest ethical standards. All students enrolled at Lancaster University are to perform their academic work according to the standards set by faculty members, departments, schools and colleges of the university. Cheating and plagiarism constitute fraudulent misrepresentation, for which appropriate sanctions are warranted and will be applied. Please note that any violation of the following rules will be treated as plagiarism:

1. Answer the questions yourself without asking others for assistance. This is a test of your ability in data science and computer programming.

2. Do not share the questions or your answers with anyone. This includes posting the questions or your solutions publicly on services like Quora, Stack Overflow, or GitHub.

We will run a system to detect any kind of plagiarism, e.g., coding scripts with high similarities.

Do NOT erase the #export at the top of any cells as it is used by notebook2script.py to extract cells for submission.

Import modules.

Do NOT change the following cell!

In [1]:

#export
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import string

If you need extra modules, use the following cell to import them.

In [2]:

#export
# imported extra modules:

Section 1: Basic Data Science

In this section, you will be asked to answer questions regarding WiFi hotspot locations in NYC. Please make sure that the dataset entitled "NYC_Wi-Fi_Hotspot_Locations.csv" is in the same folder as the Jupyter Notebook.

Each row in the data represents one reported WiFi hotspot.

A data dictionary is also provided.

The following script reads the CSV file into memory and stores it in a DataFrame called df.

Do not change it.

In [3]:

#export
# if connected to the internet, import the dataset from the internet address
try:
    df = pd.read_csv("https://frankxu1987.weebly.com/uploads/6/2/5/8/62583677/nyc_wi-fi_hotspot_l
# otherwise, import the dataset from the local .csv file
except:
    df = pd.read_csv("NYC_Wi-Fi_Hotspot_Locations.csv")

Please answer the following 8 questions. For each question,

1. Write the script used to compute your response in the code cell, and conclude it with a final print() call that prints the final numeric answer. Important: make sure your scripts are executable!

2. Please fill out the final numeric answers in the cells at the end of the section. See below.

Question 1.1: How many unique providers are there? (10 pts)

In [4]:

#export
# Code script for Q 1.1
# Write your code script below

Question 1.2: What fraction of WiFi hotspots are in parks? For simplicity, consider a hotspot to be in a park if the name of the location where it is installed contains the word "park" or one of its variants. (10 pts)

In [5]:

#export
# Code script for Q 1.2
# Write your code script below

Question 1.3: How are WiFi hotspots distributed across neighborhoods? For this question, calculate the number of WiFi hotspots per capita for each Neighborhood Tabulation Area (NTA). Exclude NTAs with fewer than 30 reported WiFi hotspots. Report the interquartile range (https://en.wikipedia.org/wiki/Interquartile_range) of these per-capita averages.

For population data for each NTA, use this dataset (https://data.cityofnewyork.us/api/views/rnsn-acs2/rows.csv); information on the dataset is found here (https://data.cityofnewyork.us/City-Government/Census-Demographics-at-the-Neighborhood-Tabulation/rnsn-acs2). Use the population data for the column corresponding to 2010. (10 pts)
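As a reminder of what the interquartile range measures, the following minimal sketch computes it for a small made-up array. The values are illustrative assumptions, not taken from the exam data, and the sketch relies on NumPy's default linear interpolation between order statistics; other quantile conventions can give slightly different results.

```python
import numpy as np

# Hypothetical toy values, not taken from the exam dataset
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# The interquartile range is the 75th percentile minus the 25th percentile
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
print(iqr)
```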

In [6]:

#export
# Code script for Q 1.3
# Hint: you will probably find the following line useful
# df_nta= pd.read_csv("https://data.cityofnewyork.us/api/views/rnsn-acs2/rows.csv")
# Write your code script below

Question 1.4: The dataset contains information on the date the hotspot was activated. What fraction of all activations occurred on the day of week that had the most activations? In other words, if Monday had the most activations, what fraction of activations occurred on Monday? Note: there are some dates that don't make sense. Ignore them for the analysis. (10 pts)

In [7]:

#export
# Code script for Q 1.4
# Write your code script below

Question 1.5: How many WiFi hotspots are there by the second most common provider in the Bronx? (10 pts)

In [8]:

#export
# Code script for Q 1.5
# Write your code script below

Question 1.6: What is the probability that a WiFi hotspot is free (without any limitations) given that it's not in a library? For this question, pull the location data based on the "Location_T" field. (10 pts)

In [9]:

#export
# Code script for Q 1.6
# Write your code script below

Question 1.7: How far must one travel from one hotspot to another? For this question, compute the average distance from each hotspot to its 3 nearest hotspots, and report the median of these averages in feet. Calculate the distance "as the crow flies" (https://en.wikipedia.org/wiki/As_the_crow_flies). For simplicity, please use the spherical Earth projected to a plane equation (https://en.wikipedia.org/wiki/Geographical_distance#Spherical_Earth_projected_to_a_plane) for calculating distances. Use 6371 km as the radius of the Earth. (10 pts)

Remember, report your answer in feet.
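For reference, the spherical Earth projected to a plane equation gives the distance as D = R * sqrt((Δφ)^2 + (cos(φ_m) * Δλ)^2), where Δφ and Δλ are the latitude and longitude differences in radians and φ_m is the mean latitude. The sketch below applies it to a single pair of points. The function name `planar_distance_feet` and the sample coordinates are hypothetical, and the conversion assumes 1 foot = 0.3048 m exactly.

```python
import math

EARTH_RADIUS_KM = 6371.0       # radius stated in the question
FEET_PER_KM = 1000.0 / 0.3048  # 1 foot is exactly 0.3048 m

def planar_distance_feet(lat1, lon1, lat2, lon2):
    """Distance in feet between two points given in decimal degrees,
    using the spherical-Earth-projected-to-a-plane approximation."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1                   # latitude difference in radians
    d_lambda = math.radians(lon2 - lon1)  # longitude difference in radians
    phi_m = (phi1 + phi2) / 2.0           # mean latitude in radians
    d_km = EARTH_RADIUS_KM * math.sqrt(d_phi**2 + (math.cos(phi_m) * d_lambda)**2)
    return d_km * FEET_PER_KM

# Hypothetical coordinates: two points 0.001 degrees of latitude apart
print(planar_distance_feet(40.0, -74.0, 40.001, -74.0))
```

The approximation is only reasonable over short distances, which is the case for points within a single city.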

In [10]:

#export
# Code script for Q 1.7
# Write your code script below

Question 1.8: If you plot the number of hotspot activations for each month, you'll notice a general increase but then a precipitous drop after June 2018. Using a linear estimate for the number of monthly activations, what is the rate of increase in monthly activations? Only consider data before July 1, 2018 and set the start date as the earliest date of the data. If you need to, use 30.5 days in a month. (10 pts)
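One possible reading of "linear estimate" is an ordinary least-squares line, whose slope is the rate of increase. The sketch below shows that idea on made-up numbers; the arrays `months` and `counts` are hypothetical and unrelated to the exam data, and `np.polyfit` is only one of several ways to fit such a line.

```python
import numpy as np

# Hypothetical data: months elapsed since a start date and made-up monthly counts
months = np.array([0, 1, 2, 3, 4, 5], dtype=float)
counts = np.array([3, 5, 7, 9, 11, 13], dtype=float)

# Fit counts ~ slope * months + intercept by least squares (degree-1 polynomial)
slope, intercept = np.polyfit(months, counts, 1)
print(slope)
```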

In [11]:

#export
# Code script for Q 1.8
# Write your code script below

Now, please fill out all the numeric answers to Question 1.1-Question 1.8 in the following code cells. DO NOT UNCOMMENT.

In [12]:

#export
# Your answer to ***Question 1.1***:

In [13]:

#export
# Your answer to ***Question 1.2***:

In [14]:

#export
# Your answer to ***Question 1.3***:

In [15]:

#export
# Your answer to ***Question 1.4***:

In [16]:

#export
# Your answer to ***Question 1.5***:

In [17]:

#export
# Your answer to ***Question 1.6***:

In [18]:

#export
# Your answer to ***Question 1.7***:

In [19]:

#export
# Your answer to ***Question 1.8***:

Section 2: Simple Programming

Please answer the following 5 questions. For each question,

1. Write the script used to compute your response in the code cell, and conclude it with a final print() call that prints the final numeric answer. Important: make sure your scripts are executable!

2. Please fill out the final numeric answers in the cells at the end of the section.

Question 2.1: Compute 1-2+3-4+5-6+...-1000 (10 pts)

In [20]:

#export
# Code script for Q 2.1
# Write your code script below

Question 2.2: How many prime numbers (https://en.wikipedia.org/wiki/Prime_number) are there between 1 and 1000? (10 pts)

In [21]:

#export
# Code script for Q 2.2
# Write your code script below

Question 2.3: The following code snippet generates an array named arr of 1000 numbers between 0 and 100. Find the SECOND largest number. (10 pts)

In [22]:

#export
####### DO NOT change the code script below ##############
np.random.seed(10);
arr = np.random.random(1000)*100
####### DO NOT change the code script above ##############
# Code script for Q 2.3
# Write your code script below

Question 2.4: The following code snippet generates an array named arr of 1000 numbers between 0 and 100. Compute the sum of all numbers less than or equal to 50, and round the result to three decimal places. (10 pts)

In [23]:

#export
####### DO NOT change the code script below ##############
np.random.seed(179);
arr = np.random.random(1000)*100
####### DO NOT change the code script above ##############
# Code script for Q 2.4
# Write your code script below

Question 2.5: The following code snippet generates a 30 × 30 array (matrix) named arr with numbers between 0 and 10. Compute the sum of the maxima of each row, and round the result to three decimal places. (10 pts)

In [24]:

#export
####### DO NOT change the code script below ##############
np.random.seed(81);
arr = (np.random.random(900)*10).reshape((30,30));
####### DO NOT change the code script above ##############
# Code script for Q 2.5
# Write your code script below

Now, please fill out all the numeric answers to Question 2.1-Question 2.5 in the following code cells. DO NOT UNCOMMENT.

In [25]:

#export
# Your answer to ***Question 2.1***:

In [26]:

#export
# Your answer to ***Question 2.2***:

In [27]:

#export
# Your answer to ***Question 2.3***:

In [28]:

#export
# Your answer to ***Question 2.4***:

In [29]:

#export
# Your answer to ***Question 2.5***:

Congratulations on finishing the first part of the final exam!

Remember to save your Jupyter Notebook.

Now is a good time to submit for grading.

Please uncomment and run the cell below. Your code will be generated in the folder named first_part. Please upload the submission.py file AND the Jupyter Notebook.

In [30]:

%run helpers/notebook2script1 first_part

Converted AcF_351b_Python_Stream_Final_Exam_Part_I.ipynb to first_part\submission_part_I.py
