
STATS 782 Statistical Computing

Assignment 3(2022.1)

1.  [19 marks]        You want to explain the one-sample t-test to your friend, so you decide to create a plot of the t-distribution with 20 degrees of freedom for illustration purposes.  To keep things simple, you decide to show the one-sided test for p = 0.05 and come up with the following graphic:

[Figure: "t(df=20) distribution"; x-axis "test-statistic" running from −4 to 4; annotations "p = 0.05" and "P(X > 1.725) = p"]


(a) Re-create the plot using R as closely as possible.                                                  [7 marks]

(b) Write a function f with the following parameters: df (numeric), the number of degrees of freedom; p (numeric), the p-value; and onesided (logical), TRUE for a one-sided test and FALSE for a two-sided test.  The function should automatically compute the necessary quantiles based on the p-value and draw the plot.  Calling f(20,  0.05,  TRUE) should therefore draw the same plot as in the previous question (except for the range of x, which may vary).  For a two-sided test, colour p/2 in each (left and right) tail of the distribution.  Test your code with f(20,  0.05,  TRUE) and f(10,  0.03,  FALSE), where the latter should look something like this:    [7 marks]

 

[Figure: "t(df=10) distribution"; x-axis "test-statistic" running from −6 to 6; annotations "p = 0.03" and "P(X < −2.527) = p/2"]
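
The quantile step that f relies on can be sketched as follows. This is only a minimal illustration, assuming that onesided = TRUE denotes a one-sided test (which is what the two example figures imply); the helper name crit_values is hypothetical and not part of the assignment. qt() is base R's quantile function for the t distribution.

```r
# Hypothetical helper (not part of the assignment): cut-off values of the
# t distribution with df degrees of freedom for total tail probability p.
crit_values <- function(df, p, onesided = TRUE) {
  if (onesided) {
    qt(1 - p, df)                  # single upper-tail cut-off
  } else {
    qt(c(p / 2, 1 - p / 2), df)    # lower and upper cut-offs, p/2 in each tail
  }
}

crit_values(20, 0.05, TRUE)    # one-sided example
crit_values(10, 0.03, FALSE)   # two-sided example
```

The results can be compared with the cut-offs annotated in the two figures (1.725 and −2.527).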


(c) The t-test statistic for the hypothesis that the sample $X = \{x_1, \ldots, x_n\}$ follows a distribution with mean $\mu$ is computed as

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean and $s$ the sample standard deviation. This statistic follows the t distribution with $n - 1$ degrees of freedom.

Sample 30 values from the normal distribution with mean 3 and standard deviation 2. Compute the t-test statistic for that sample and the five hypotheses µ = {2, 2.5, 3, 3.5, 4}. Use the function f from above to draw the corresponding distribution for p = 0.05 (two-sided) and superimpose the five resulting test statistics as thick vertical lines, each labelled with the corresponding value of µ. Which of the values fall into the region indicating that the hypothesis is accepted?    [5 marks]
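
As a check on the formula in (c), the statistic can be computed directly and compared with R's built-in t.test(). This is a sketch only: the helper name t_stat is hypothetical, and the five data values are made up for illustration and are not the sample the question asks for.

```r
# Hypothetical helper: one-sample t statistic for a hypothesised mean mu,
# i.e. (sample mean - mu) / (sample sd / sqrt(n)).
t_stat <- function(x, mu) {
  (mean(x) - mu) / (sd(x) / sqrt(length(x)))
}

x <- c(2.1, 3.4, 1.9, 4.2, 3.0)   # made-up illustrative values

manual  <- t_stat(x, mu = 3)
builtin <- unname(t.test(x, mu = 3)$statistic)   # same statistic from base R
all.equal(manual, builtin)
```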

 


2.  [31 marks]        Stats NZ publishes detailed datasets on monthly imports and exports to/from Aotearoa New Zealand, which include the countries and categories of goods. In this question we will use the monthly import statistics for the years 2000 through 2021.  The original data is reasonably large (over 18 million records, 3.2 GB), so we will restrict ourselves to the value of imported goods aggregated by country and month. The resulting dataset can be found in the imports-by-country.csv file with the following columns:  "yearmonth", specifying the year and month in the form YYYYMM where YYYY is the year and MM is the month; "country", the name of the country the goods are imported from; and "value", the value of the goods (in NZD) imported that month. Answer the following questions based on this dataset.
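
The YYYYMM encoding described above can be split with integer arithmetic. The sketch below assumes the "yearmonth" column is read as an integer; the three values are illustrative and not taken from the file, and the variable names are hypothetical.

```r
# Illustrative values in YYYYMM form (assumed to be integers)
ym <- c(200001L, 200012L, 202112L)

year  <- ym %/% 100L   # integer division keeps YYYY: 2000 2000 2021
month <- ym %% 100L    # remainder keeps MM: 1 12 12

# First day of each month as a Date, convenient for a time axis
date <- as.Date(sprintf("%04d-%02d-01", year, month))
```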

 

(a) Compute the total value of imports by country over the entire period. List the top three countries from which New Zealand imports (by total value of imports).    [4 marks]

(b) Draw a pie chart of the total value of imports by country using all countries. Discuss two issues with such a visualisation (one sentence each).    [3 marks]

(c) Draw a bar chart showing the average annual import value in billions of NZD for the top 15 countries. Pick a suitable orientation and margins such that the names of all countries are fully visible.    [5 marks]

(d) We want to look at the development of the imports over time.  To make things more manageable, we want to focus on the top 11 countries and aggregate all other countries into one category "other". Draw a line plot with the x-axis being time and the y-axis the monthly import value in billions. Each of the top 11 countries and "other" should be represented by one line (hence 12 lines in total). Add a corresponding legend for countries. Discuss the trade evolution of the top three countries over time.    [7 marks]

(e) We want to look at the seasonal aspect of the imports for each country.  Write R code to replicate Figure 1. The colours for years are generated using the hcl() function with chroma 60 and luminance 70.                                                                                 [8 marks]

(f) Based on Figure 1, are any countries showing a seasonal effect (a cyclical pattern that is repeated each year) and if so, which? Does this plot enable us to reveal steadily increasing imports? If so, which countries and how can you tell from the plot?    [4 marks]
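
For the colour specification in (e), hcl() from base R takes hue, chroma and luminance. The sketch below fixes chroma and luminance as stated in the question; spacing the hues evenly around the colour wheel is an assumption, since the question does not say how the hues in Figure 1 were chosen.

```r
years <- 2000:2021

# Chroma 60 and luminance 70 come from the question.
# Evenly spaced hues are an assumption, not taken from Figure 1.
hues <- seq(0, 360, length.out = length(years) + 1)[-1]
cols <- hcl(h = hues, c = 60, l = 70)
names(cols) <- years   # look up a colour by year, e.g. cols["2021"]
```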