PSTAT 105: Assignment 3

2022

All four of these questions require R to draw plots and do the calculations. Please include the R code and your answers to the questions together in your report.

1. The Probability Department decrees that all courses are to be graded in such a way that the grades (as percentages) follow a beta distribution with α = 6 and β = 2. In Prof. Smirnov’s course, he gives the following grades:

40%    71.7%    81.7%    83.3%    93.3%    97.5%

(a) Draw a plot comparing the CDF of the beta distribution to the Empirical CDF of this data.

(b) Draw a plot of the function Gn(t) = √n (Fn(t) − t), where Fn is the empirical CDF of the data.

(c) Use the ks.test function to test the null hypothesis that the data is from a beta(6, 2) distribution and calculate an exact P-value.

(d) What do you conclude?

(e) Why would it be difficult to apply a χ² test to this data?
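
As a starting point for parts (a) to (c), the following is a minimal sketch rather than a model answer. It uses only base R and the six grades listed above, converted to proportions. For part (b) it assumes that the grades are first mapped to the uniform scale with pbeta, so that Gn(t) = sqrt(n) (Fn(t) − t) is defined for t in [0, 1]; that reading is an assumption and should be checked against the lecture notes.

```r
# Grades from the problem statement, converted from percentages to proportions.
grades <- c(40, 71.7, 81.7, 83.3, 93.3, 97.5) / 100
n <- length(grades)

# (a) Empirical CDF of the grades with the beta(6, 2) CDF overlaid (dashed).
Fn <- ecdf(grades)
plot(Fn, xlim = c(0, 1), main = "Empirical CDF vs. beta(6, 2) CDF",
     xlab = "grade (proportion)", ylab = "cumulative probability")
curve(pbeta(x, shape1 = 6, shape2 = 2), from = 0, to = 1, add = TRUE, lty = 2)

# (b) ASSUMPTION: transform the grades to the uniform scale first, then
# evaluate Gn(t) = sqrt(n) * (Un(t) - t) on a fine grid of t values.
u <- pbeta(grades, shape1 = 6, shape2 = 2)
Un <- ecdf(u)
t_grid <- seq(0, 1, length.out = 1001)
Gn <- sqrt(n) * (Un(t_grid) - t_grid)
plot(t_grid, Gn, type = "l", xlab = "t", ylab = "Gn(t)")
abline(h = 0, lty = 3)

# (c) Kolmogorov-Smirnov test against beta(6, 2), requesting an exact P-value.
ks_result <- ks.test(grades, "pbeta", shape1 = 6, shape2 = 2, exact = TRUE)
print(ks_result)
```

The extra arguments shape1 and shape2 are passed through by ks.test to pbeta, so no wrapper function is needed.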

2. I am interested in testing whether major earthquakes are more likely during certain times of the day. The USGS NEIC database provides a file EarthquakeData.htm which contains information about nearly all earthquakes of more than 6.0 on the Richter scale for dates from 2001 to 2012. You will need to do some data coding and interpretation to read in the times properly. I urge you to be careful in inputting the data.

(a) Plot a histogram of the times with bars 15 minutes wide.

(b) Use a χ² test to determine whether the hours of the day at which the earthquakes originate are equally distributed among the 24 hours.

(c) Write some R code which will calculate the Kolmogorov–Smirnov statistics Dn+ and Dn− for testing that the times are uniformly distributed throughout the day. Find D.

(d) Calculate an approximate P-value for this hypothesis test using the Brownian Bridge approximation we discussed in lecture.
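
The steps in this question can be sketched as follows. This is an illustration under stated assumptions, not the required solution: the vector day_frac is simulated stand-in data (the real origin times still have to be parsed from EarthquakeData.htm and expressed as fractions of a day), and the helper names ks_uniform_stats and bridge_pvalue are hypothetical. The approximation in (d) is taken to be the standard Kolmogorov series for the supremum of the absolute value of a Brownian bridge, which may differ in form from the version given in lecture.

```r
# HYPOTHETICAL stand-in data: simulated times of day as fractions in [0, 1).
# Replace with the origin times parsed from EarthquakeData.htm.
set.seed(105)
day_frac <- runif(500)

# (a) Histogram with 15-minute bars: 24 * 4 = 96 bins across the day.
hist(day_frac * 24, breaks = seq(0, 24, by = 0.25),
     xlab = "hour of day", main = "Origin times (simulated stand-in data)")

# (b) Chi-squared test of equal frequencies across the 24 hours.
hour <- factor(floor(day_frac * 24), levels = 0:23)
chisq_result <- chisq.test(table(hour))  # default null: equal probabilities
print(chisq_result)

# (c) One-sided and two-sided KS statistics against Uniform(0, 1).
ks_uniform_stats <- function(u) {
  u <- sort(u)
  n <- length(u)
  i <- seq_len(n)
  d_plus  <- max(i / n - u)        # largest amount the ECDF sits above t
  d_minus <- max(u - (i - 1) / n)  # largest amount the ECDF sits below t
  c(d_plus = d_plus, d_minus = d_minus, d = max(d_plus, d_minus))
}

# (d) Brownian bridge tail approximation:
# P(sup |B(t)| > x) = 2 * sum_{k >= 1} (-1)^(k - 1) * exp(-2 * k^2 * x^2),
# evaluated at x = sqrt(n) * D. The truncated series is clamped to [0, 1]
# because it converges slowly when x is very close to zero.
bridge_pvalue <- function(d, n, terms = 100) {
  x <- sqrt(n) * d
  k <- seq_len(terms)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * x^2))
  min(max(p, 0), 1)
}

stats <- ks_uniform_stats(day_frac)
p_approx <- bridge_pvalue(unname(stats["d"]), length(day_frac))
print(stats)
print(p_approx)
```

A quick sanity check is to compare the result with ks.test(day_frac, "punif", exact = FALSE), which should report the same statistic and a very similar asymptotic P-value.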

3. Grete Heinz and Louis J. Peterson, at San Jose State University and at the U.S. Naval Postgraduate School in Monterey, California, took measurements from 507 subjects. Part of their data set is in the file shoulder.txt, which includes the width of each subject's shoulders. We want to test if this data is normally distributed. We will be using the library nortest to perform these tests.

(a) Plot a histogram of the shoulder data using a number of breaks that you think is appropriate. Draw the density of a normal distribution with the same mean and variance over the histogram for comparison.

(b) Use lillie.test, cvm.test, and ad.test to test whether or not this data is from a normal distribution. What do you conclude?

(c) I’m concerned that the fact that men and women have different general body sizes may be causing a problem with our test. Re-run the normality tests on just the men and just the women separately. What do you conclude?

(d) I hate to divide a data set into two parts. I really want to perform one test using all of the data. We could calculate the mean for the male subjects and the mean for the female subjects. Then, subtract these means from the data points so that we move the two populations on top of each other. (This is like taking the residuals from an ANOVA model.)

Try the normality tests on this new set of data. What do you conclude?

(e) Do you think it would be reasonable to assume a normal distribution and perform a two-sample t-test using this data? Why?
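
The mechanics of parts (a) to (d) might look like the sketch below. Everything about the data is an assumption: the data frame body, its column names sex and shoulder, the group sizes, and the simulated values are hypothetical placeholders, since the layout of shoulder.txt is not described here. The nortest calls are run only if that package is installed.

```r
# HYPOTHETICAL stand-in data: simulated widths with arbitrary group sizes and
# means. Replace with the values read from shoulder.txt.
set.seed(105)
n_f <- 250
n_m <- 257
body <- data.frame(
  sex = rep(c("F", "M"), times = c(n_f, n_m)),
  shoulder = c(rnorm(n_f, mean = 36, sd = 2), rnorm(n_m, mean = 41, sd = 2))
)

# (a) Density-scaled histogram with a normal curve of matching mean and sd.
hist(body$shoulder, breaks = 30, freq = FALSE,
     xlab = "shoulder width", main = "Shoulder widths (simulated stand-in)")
curve(dnorm(x, mean = mean(body$shoulder), sd = sd(body$shoulder)),
      add = TRUE, lty = 2)

# (d) Subtract each subject's group mean; ave() returns the group mean
# aligned with every row, so the result is the ANOVA-style residual.
body$centered <- body$shoulder - ave(body$shoulder, body$sex)

# (b)-(d) Collect the three nortest P-values for one numeric vector.
run_normality_tests <- function(x) {
  c(lillie = nortest::lillie.test(x)$p.value,
    cvm    = nortest::cvm.test(x)$p.value,
    ad     = nortest::ad.test(x)$p.value)
}

if (requireNamespace("nortest", quietly = TRUE)) {
  print(rbind(
    all      = run_normality_tests(body$shoulder),
    women    = run_normality_tests(body$shoulder[body$sex == "F"]),
    men      = run_normality_tests(body$shoulder[body$sex == "M"]),
    centered = run_normality_tests(body$centered)
  ))
}
```

Using ave() keeps the centered values in the original row order, which makes it easy to compare them with the raw measurements side by side.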

4. Use a Kolmogorov–Smirnov test to test whether the distribution of birthdays in the basketball data from last week is uniformly distributed across the year (you can focus on only the players from after 1955). Please describe your analysis and conclusion.
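
One possible way to put birthdays on a scale suitable for a uniformity test is sketched below. The dates are made-up placeholders, not values from the basketball data, and mapping each birthday to the midpoint of its day as a fraction of the year is an assumption about how to handle the discreteness, not a prescribed method.

```r
# HYPOTHETICAL placeholder birthdays; replace with the basketball data.
birthdays <- as.Date(c("1960-02-14", "1971-07-04", "1984-12-30",
                       "1992-05-21", "1988-09-09"))

# Day of the year (1 to 366) and the length of each birth year.
day_of_year <- as.integer(format(birthdays, "%j"))
year <- as.integer(format(birthdays, "%Y"))
is_leap <- (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
days_in_year <- ifelse(is_leap, 366, 365)

# ASSUMPTION: place each birthday at the midpoint of its day, as a fraction
# of the year, so that every value lies strictly between 0 and 1.
year_frac <- (day_of_year - 0.5) / days_in_year

# KS test against Uniform(0, 1). With real data, shared birthdays create
# ties, and ks.test will warn that the P-value is then only approximate.
birthday_ks <- ks.test(year_frac, "punif")
print(birthday_ks)
```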