DS1000B – Assignment #1

Published: 2024-05-24

Notes:

•     Submissions must be done via Gradescope. You must carefully assign pages to their corresponding questions. You will receive a grade of zero in each case below:

a.   Submission is not in PDF format.

b.   Questions with no pages assigned to them.

•     Please submit a single PDF file. Here is a recommended way to achieve this:

a.   If you write your derivations on paper, scan them into a PDF file (if they are images, paste the images into a Word document, then save it as a PDF file).

b.   Write your Python code (e.g. in a Jupyter notebook), then save it as a PDF file.

c.   Combine all the PDF files above into one PDF file.

•     If you have difficulty formatting your submission, please see the “Lab1-preparation” file, or attend TA office hours as soon as possible.

•      Each student must submit their own work. Scholastic offences are taken seriously, and students are directed to read the appropriate policy, specifically the definition of what constitutes a Scholastic Offence, at the following website:

http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf

Grade Breakdown:

Part 1: Written Answer

Question 1          12

Question 2          11

Question 3         12

Question 4         10

Total Points =    45

Part 2: Python

Question 5           5

Question 6           8

Question 7           9

Question 8          13

Total Points =     35

Total Points: 80

Part 1 - Written Answer

Question 1 [12 Points]

The following histogram shows the results of asking 40 families how many pets they own.

 

a.   [3 Points] State the individual(s) and variable(s) for this data.  Is each variable categorical or quantitative?

b.   [1 Point] What is the shape of the distribution?

c.    [1 Point] What is the mean of the distribution?

d.   [2.5 Points] What is the five-number summary for this distribution?

e.   [2 Points] Use the IQR rule to determine if there are any outliers.

f.    [2.5 Points] Draw a box plot for this dataset.

Question 2 [11 Points]

The following data represent the birth weights (in lbs) of a group of grade 1 students.

7.9            5.9            6.9            8.3            6.7            4.2            7.4            11.2
6.0            5.3            7.5            5.8            7.2            8.4            6.6            8.6
6.5            5.6            5.8            10.2           10.5           4.6            6.1            5.4

a.   [3 Points] Draw a stemplot.

b.   [1 Point] What is the mode of this dataset?

c.    [3 Points] Calculate the mean and median of this dataset.  Compare these values and state what this tells you about the shape of the distribution.

d.   [4 Points] Does this dataset contain any outliers?  How do you know?
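
As a way to check a hand-drawn stemplot for part (a), the following is a minimal Python sketch. It assumes the stem is the whole number of pounds and the leaf is the tenths digit; the function name `stemplot` is illustrative and does not come from the labs.

```python
from collections import defaultdict

# Birth weights (lbs) copied from Question 2.
weights = [7.9, 5.9, 6.9, 8.3, 6.7, 4.2, 7.4, 11.2, 6.0, 5.3, 7.5, 5.8,
           7.2, 8.4, 6.6, 8.6, 6.5, 5.6, 5.8, 10.2, 10.5, 4.6, 6.1, 5.4]


def stemplot(values):
    """Return stemplot lines: stem = whole pounds, leaf = tenths digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        # Work in tenths to avoid floating-point surprises, e.g. 7.9 -> 79.
        stem, leaf = divmod(round(v * 10), 10)
        leaves[stem].append(str(leaf))
    # Include empty stems so gaps in the distribution stay visible.
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{stem:>2} | {' '.join(leaves[stem])}".rstrip() for stem in stems]


for line in stemplot(weights):
    print(line)
```

Keeping the empty stems in the output is a deliberate choice: a gap between the bulk of the data and the largest values is easier to see, which is relevant when thinking about shape and possible outliers.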

Question 3 [12 Points]

These are the highest marks in course ABC for the past 6 terms:

83             86             84             86             87             84

a.   [2 Points] Calculate the mean and standard deviation.

b.   [2.5 Points] Calculate the five-number summary.

c.    [1.5 Points] In order to describe this dataset, would you use the measurements from part (a) or part (b)?  Briefly explain why.

d.    [6 Points] If the grade of 87 was recorded incorrectly and should be 97, what would your answers be to parts (a), (b), and (c)?
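
Hand calculations for parts (a) and (d) can be cross-checked with Python's standard `statistics` module. The sketch below assumes the sample standard deviation (dividing by n - 1) is the one intended; the helper name `describe` is illustrative.

```python
import statistics

marks = [83, 86, 84, 86, 87, 84]  # highest marks from Question 3
# Part (d): the 87 is treated as a recording error and replaced by 97.
corrected = [97 if m == 87 else m for m in marks]


def describe(data):
    """Mean, sample standard deviation, and median of a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),  # sample SD (divides by n - 1)
        "median": statistics.median(data),
    }


for label, data in (("original", marks), ("corrected", corrected)):
    print(label, describe(data))
```

Printing both versions side by side makes it easy to see which measures react to the single changed value and which do not.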

Question 4 [10 Points]

Suppose the mean height for adult females in Canada is 66 inches and the standard deviation is 3 inches.  Assume these heights follow a normal distribution.

a.   [2 Points] What percentage of adult females are above 75 inches tall? Use the 68-95-99.7 rule and draw a picture that illustrates your rationale.

b.   [2 Points] What percentage of adult females are between 60 and 72 inches tall? Use the 68-95-99.7 rule and draw a picture that illustrates your rationale.

c.    [3 Points] What percentage of adult females are between 63 and 70.5 inches tall?  Use Table A from the textbook.  Show all your work.

d.   [3 Points] How tall must an adult female be to be placed in the top 10% of all adult females in Canada?  Use Table A from the textbook.  Draw a picture and show all your work.
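
Answers obtained from the 68-95-99.7 rule or Table A can be sanity-checked in Python. This is a sketch only, using `statistics.NormalDist` from the standard library (Python 3.8 or later); the helper names are illustrative. Exact values will differ slightly from the rule of thumb and from rounded table entries, and the question still asks for the table-based work.

```python
from statistics import NormalDist

# Model stated in Question 4: mean 66 inches, standard deviation 3 inches.
heights = NormalDist(mu=66, sigma=3)


def prob_between(dist, low, high):
    """P(low < X < high) for a normal distribution."""
    return dist.cdf(high) - dist.cdf(low)


def prob_above(dist, x):
    """P(X > x) for a normal distribution."""
    return 1 - dist.cdf(x)


def top_cutoff(dist, fraction):
    """Value that separates the top `fraction` of the distribution."""
    return dist.inv_cdf(1 - fraction)


print(prob_above(heights, 75))          # part (a)
print(prob_between(heights, 60, 72))    # part (b)
print(prob_between(heights, 63, 70.5))  # part (c)
print(top_cutoff(heights, 0.10))        # part (d)
```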

Part 2 - Python (Be sure to show all your code and results)

*Note: You do not have to use the exact code taught in the labs to earn full marks.

Question 5 [5 Points]

A soft drink machine has 5 options to choose from: Coca-Cola, Dr. Pepper, Sprite, Diet Coke, and Pepsi. A sample of 50 soft drink purchases is selected, and the percentage of each selection is shown below.

Soft Drink          Percentage (%)
Coca-Cola           25
Diet Coke           25
Dr. Pepper          15
Pepsi               30
Sprite              5

a.   [2 points] Can we use a pie chart for this dataset? Why?

b.   [3 points] Create a pie chart to display the distribution of soft drink purchases.   Use these colours: Coca-Cola is brown, Diet Coke is white, Dr. Pepper is red, Pepsi is blue, and Sprite is green. Note: If your answer in part a. is NO, you may improve the dataset in order to create a pie chart.
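
One possible starting point for part b, sketched with matplotlib (assumed to be installed). The output file name, the title, and the black slice outline are illustrative choices rather than requirements; the outline is only there so the white slice remains visible on a white background.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

# Percentages and colours copied from Question 5.
drinks = ["Coca-Cola", "Diet Coke", "Dr. Pepper", "Pepsi", "Sprite"]
percentages = [25, 25, 15, 30, 5]
colours = ["brown", "white", "red", "blue", "green"]

# A pie chart shows parts of one whole, so check what the slices add up to.
total = sum(percentages)
print("Total percentage:", total)

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    percentages,
    labels=drinks,
    colors=colours,
    autopct="%1.0f%%",                  # print the percentage on each slice
    wedgeprops={"edgecolor": "black"},  # outline keeps the white slice visible
)
ax.set_title("Soft drink purchases (n = 50)")
fig.savefig("soft_drinks_pie.png")  # hypothetical output file name
```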

Question 6 [8 Points]

Smartphones are advanced mobile phones with internet, photo, music and video capability. The following survey results show smartphone ownership by age.

 

Age          Smartphone     Other Cell Phone     No Cell Phone
18-24        59             36                   7
25-34        69             28                   4
35-44        54             41                   6
45-54        38             52                   10
55-64        35             55                   12
65+          21             33                   48

a.   [3 Points] The second row gives percentages of cellphone ownership for ages 25-34. Using this information, draw a bar chart.

b.   [3 points] Draw a bidimensional bar chart that illustrates the smartphone ownership for every age group.

c.    [2 points] Using the bar charts in part a. and b., give a simple description of the changes in cellphone ownership for the different age groups.
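
For part b, one reading of "bidimensional bar chart" is a grouped bar chart with one cluster of bars per age group. The sketch below takes that interpretation, which is an assumption, and uses matplotlib with the values from the table; the bar width and output file name are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

# Survey percentages copied from Question 6.
ages = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]
ownership = {
    "Smartphone": [59, 69, 54, 38, 35, 21],
    "Other cell phone": [36, 28, 41, 52, 55, 33],
    "No cell phone": [7, 4, 6, 10, 12, 48],
}

fig, ax = plt.subplots()
width = 0.25  # three bars per age group fit within one unit on the x-axis
for i, (category, values) in enumerate(ownership.items()):
    # Shift each category left/right of the group's centre position.
    positions = [x + (i - 1) * width for x in range(len(ages))]
    ax.bar(positions, values, width=width, label=category)

ax.set_xticks(list(range(len(ages))))
ax.set_xticklabels(ages)
ax.set_xlabel("Age group")
ax.set_ylabel("Percentage of respondents")
ax.legend()
fig.savefig("cellphone_ownership.png")  # hypothetical output file name
```

A single-group chart for part a follows the same pattern with one `ax.bar` call on the 25-34 values.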

Question 7 [9 Points]

The (cars.csv) dataset provides information about the prices (in US$) and brands of 50 randomly selected cars for sale in the USA.

a.   [3 Points] Make a histogram of this data with the number of bins equal to 15.

b.   [2 Points] What is the shape of the distribution? What is the count of cars with prices exceeding $45,000?

c.    [4 Points] What are the minimum and maximum prices of cars? What are the mean and median?
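
A possible structure for Question 7 is sketched below. The layout of cars.csv is not shown in this handout, so the column name `price` in the usage comment is a guess, and the helper names are illustrative. The functions accept any list of prices.

```python
import statistics

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt


def price_summary(prices, threshold=45_000):
    """Basic numerical summary plus the count of prices above `threshold`."""
    return {
        "min": min(prices),
        "max": max(prices),
        "mean": statistics.mean(prices),
        "median": statistics.median(prices),
        "count_above": sum(p > threshold for p in prices),
    }


def plot_price_histogram(prices, bins=15):
    """Draw a histogram of prices and return the per-bin counts."""
    fig, ax = plt.subplots()
    counts, edges, _ = ax.hist(prices, bins=bins, edgecolor="black")
    ax.set_xlabel("Price (US$)")
    ax.set_ylabel("Number of cars")
    return counts


# Usage, assuming pandas is installed and the price column is named "price"
# (the column name is a guess; check the header of cars.csv):
#   import pandas as pd
#   prices = pd.read_csv("cars.csv")["price"].tolist()
#   print(price_summary(prices))
#   plot_price_histogram(prices)
```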

Question 8 [13 Points]

(Chol.csv) A study examining the health risks of smoking measured the cholesterol levels of people who had smoked for at least 25 years and people of similar ages who had smoked for no more than 5 years and then stopped.

a.   [3 Points] Give a graphical comparison of the cholesterol distributions for the two groups using side-by-side boxplots.

b.   [4 Points] Provide appropriate numerical summaries (including five-number summary, mean, standard deviation, count) for the two distributions.

c.    [5 Points] Follow the steps below to identify outliers:

1.   From the box plots, can you identify any outliers in either group?

2.   If there exists an outlier in either group, please identify the row index and value of outlier(s) using the IQR rule.

d.   [1 Point] What can you say about the effect of smoking on cholesterol levels?
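
The IQR rule in part c can be sketched as a small helper that reports both the row index and the value of each flagged observation. The quartile convention is an assumption: `method="inclusive"` in `statistics.quantiles` interpolates linearly between data points, which to my understanding matches the default in NumPy and pandas, but the textbook's hand method (median of each half) can give slightly different quartiles. The column names in the usage comment are hypothetical.

```python
import statistics


def iqr_outliers(values, index=None):
    """Return (row index, value) pairs flagged by the 1.5 x IQR rule."""
    if index is None:
        index = range(len(values))  # fall back to positional row numbers
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(i, v) for i, v in zip(index, values) if v < low or v > high]


# Usage, assuming pandas is installed and Chol.csv has columns named
# "group" and "cholesterol" (both names are guesses; check the file header):
#   import pandas as pd
#   df = pd.read_csv("Chol.csv")
#   for name, grp in df.groupby("group"):
#       print(name, iqr_outliers(grp["cholesterol"].tolist(), grp.index.tolist()))
```

Passing the original DataFrame index keeps the reported row numbers consistent with the file, even after the data are split by group.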