
STATS 101/108 - Past exam SS24

Published: 2024-06-14



Question 1

Use the information below to answer the questions in this section.

Researchers were interested in how people felt about job security. In March 2023 they took a random sample of New Zealanders in employment and asked “Which of the following best describes your feeling about your job security?” with the options shown in the table below. Also shown is the proportion of respondents who chose each option.

Fully secure   Very secure   Reasonably secure   Worry sometimes   Worry always   Sample size
30.3%          33.8%         25.3%               8.7%              1.9%           1500

Researcher 1 had the research question: What proportion of New Zealanders in employment feel their job is fully secure?

The iNZight Lite output from constructing a bootstrap confidence interval is below.

Q1a

A sample-to-population inference is required when using this data to answer the research question.

The population of interest for this research is the people in the sample who chose Fully secure.

An estimate for the parameter of interest is 0.303.

Q1b

Previous research has led to the claim that only one quarter of all New Zealanders in employment feel their jobs are fully secure.

A proportion of 25% ____ one of the plausible values provided by the VIT bootstrap confidence interval for the proportion of New Zealanders in employment that feel their jobs are fully secure.

Therefore, the claim above ____ supported by the VIT bootstrap confidence interval.

Q1c

Researcher 2 had the research question: What proportion of New Zealanders are always worried about their job security?

Based on the data shown in the table in Q1a, the bootstrap confidence interval constructed to help answer Researcher 2’s research question would be ____ and centred ____ compared to the confidence interval constructed by Researcher 1.

Q1d

Researcher 3 wants to investigate the feeling of job security for employees of the five largest companies in New Zealand.

Write an interpretation of the bootstrap confidence interval shown in Q1a.

Write one sentence explaining why the confidence interval you interpreted is or isn't useful for Researcher 3's purpose.
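
The bootstrap interval referred to in Q1a can be sketched roughly as follows. This is only an illustration, assuming a percentile bootstrap; the exact method used by VIT or iNZight Lite is not shown in the source. The function name `bootstrap_proportion_ci` and the count of 455 (one count consistent with the reported 30.3% of 1500) are assumptions.

```python
import random

def bootstrap_proportion_ci(sample, reps=1000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for a sample proportion.

    `sample` is a list of 0/1 values (1 = chose the option of interest).
    """
    rng = random.Random(seed)
    n = len(sample)
    boot_props = []
    for _ in range(reps):
        resample = rng.choices(sample, k=n)  # resample with replacement
        boot_props.append(sum(resample) / n)
    boot_props.sort()
    alpha = (1 - level) / 2
    lower = boot_props[int(alpha * reps)]
    upper = boot_props[min(reps - 1, int((1 - alpha) * reps))]
    return lower, upper

# Assumed count: 455 of 1500 is one count consistent with the reported 30.3%.
sample = [1] * 455 + [0] * 1045
low, high = bootstrap_proportion_ci(sample)
print(f"Bootstrap interval: {low:.3f} to {high:.3f}")
```

Each resample has the same size as the original sample, so the spread of the resampled proportions reflects the sampling variability for a sample of 1500.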

Question 2

Q2a

Researchers were interested in whether people’s feeling of job security was related to their age. In March 2023 they took a random sample of New Zealanders in employment. The two questions in the survey to help them investigate this were:

A. Which of the following best describes your feeling about your job security?

Fully secure, Very secure, Reasonably secure, Worry sometimes, Worry always

B. What age did you turn on your last birthday?

For Question A: When you design the question using a product like Google Forms, you should ____.

For Question B: When you design the question using a product like Google Forms, you should ____.

The responses to Question B, when entered into a rectangular dataset, will be identified as a ____ variable by iNZight Lite.

Q2b

The responses from these survey questions were grouped by the researchers as follows:

Age_group: Gen Y (37 years old and under) or Gen X (38-52 years old).

Security_group: Secure (Fully, Very and Reasonably) or Worried (Sometimes and Always).

These two new variables are summarised in the two-way table of proportions below:

                Security_group
Age_group       Secure    Worried    Sample size
GenY            0.913     0.087      589
GenX            0.866     0.134      365

A 95% confidence interval for the difference in the proportion of all GenY who felt secure and the proportion of all GenX who felt secure was generated. The limits of this confidence interval are (0.005, 0.089).

The point estimate used for the confidence interval is a ____.

The sample proportion of GenX who felt secure was ____ the sample proportion of GenY who felt secure.

The value of the point estimate is ____%.

The sampling situation used to calculate this interval is ____.

The margin of error of the confidence interval is ____%.

If the total sample size was increased the confidence interval would ____.

____ of the plausible differences given in the confidence interval are positive.

Therefore we claim that the proportion of all GenY who felt secure was higher than the proportion of all GenX who felt secure.
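
The arithmetic linking the two-way table to the interval in Q2b can be checked with a short sketch. The proportions, sample sizes and limits come from the text above; the normal-approximation standard error is an assumption added for comparison only, since the source does not say how the interval was generated.

```python
import math

# Values taken from the two-way table and the stated interval limits.
p_gen_y, n_gen_y = 0.913, 589
p_gen_x, n_gen_x = 0.866, 365
ci_low, ci_high = 0.005, 0.089

# Difference between the two sample proportions (GenY minus GenX).
point_estimate = p_gen_y - p_gen_x

# For a symmetric interval the margin of error is half the width.
margin_of_error = (ci_high - ci_low) / 2

# Assumption: a normal-approximation interval for two independent samples.
standard_error = math.sqrt(
    p_gen_y * (1 - p_gen_y) / n_gen_y + p_gen_x * (1 - p_gen_x) / n_gen_x
)
approx_margin = 1.96 * standard_error

print(f"difference = {point_estimate:.3f}")
print(f"margin of error from limits = {margin_of_error:.3f}")
print(f"margin of error from normal approximation = {approx_margin:.3f}")
```

The midpoint of the stated limits equals the difference in sample proportions, which is what one would expect if the interval is centred on the point estimate.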

Question 3

Use the information below to answer the questions in this section.

Researchers were interested in whether people’s sense of job security changed between two time periods. Data was collected from a random sample of 1350 participants in April 2021 and another random sample of 1500 participants in March 2023.

The research question was: Are security and date independent?

Below is some of the iNZight Lite output from the analysis carried out to answer this question, including a Chi-square test for independence.

Q3a

Under the null hypothesis, what proportion of March 2023 respondents would be expected to select fully_secure? ____%

Q3b

A suitable null hypothesis for this test would be that security ____ Date.

There is evidence that the distribution of security depends on Date.

At ____ level of significance there is evidence that security is related to Date.

Q3c

According to this data, an estimate of the risk of feeling always worried about job security in April 2021 was ____ (1 dp) times the risk of feeling always worried about job security in March 2023.

Q3d

Suppose the P-value for the Chi-square test had been 0.96914.

Which one of the following plots is the most likely distribution of security by date that generated that p-value?

In no more than two sentences explain why you chose this plot. Your answer should compare visual features of the distributions of security in your chosen plot to that of the other plots. As part of your explanation you should consider how compatible the observed data in your chosen plot is with the null model.
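
The calculations behind Q3a to Q3c can be sketched as below. The iNZight Lite output is not reproduced in the source, so every count in `observed` is hypothetical and chosen only so that the row totals match the stated sample sizes (1350 and 1500); the function names are likewise illustrative.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def relative_risk(count_a, total_a, count_b, total_b):
    """Risk in group A divided by risk in group B."""
    return (count_a / total_a) / (count_b / total_b)

# HYPOTHETICAL counts, in the order: fully secure, very secure,
# reasonably secure, worry sometimes, worry always.
observed = [
    [380, 450, 360, 120, 40],  # April 2021, n = 1350
    [455, 507, 379, 130, 29],  # March 2023, n = 1500
]

expected = expected_counts(observed)
# Under the null, the expected proportion is the same for both dates:
# the pooled proportion across the two samples.
expected_prop_2023 = expected[1][0] / sum(observed[1])
print(f"expected proportion fully secure (2023): {expected_prop_2023:.3f}")
print(f"chi-square statistic: {chi_square_statistic(observed):.2f}")
print(f"relative risk, always worried: {relative_risk(40, 1350, 29, 1500):.1f}")
```

Because the expected counts are built from the pooled column totals, each date gets the same expected distribution of security, which is what independence of security and Date means.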

Question 4

The NZBN website stores key information for New Zealand businesses, such as the website address and when the business was first registered.

Website addresses, or domain names, can have different extensions such as .co.nz or .com. In Oct 2013 domain names with the extension .nz were introduced.

Q4a

Data from a random sample of businesses, registered after October 2013, from the NZBN website was collected. The mean age, in months_since_registered, for the sample of businesses was 66.52 months.

A Normal distribution was used to model months_since_registered for NZ businesses registered after October 2013. Below is a visual representation of this model.

A business that had been registered for 123 months would be ____ for this distribution, as it would have a tail proportion of approximately ____%.

If the model was changed to account for all businesses, including those registered before Oct 2013, the standard deviation would ____ and the middle 95% of business ages under this model would become a ____ interval.
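
A tail proportion and a middle-95% interval under a Normal model can be computed as in the sketch below. The mean of 66.52 months is the sample mean quoted above; the standard deviation is not given in the text (it would have come from the missing plot), so `ASSUMED_SD` is a purely hypothetical value.

```python
from statistics import NormalDist

MEAN_MONTHS = 66.52   # sample mean stated in Q4a
ASSUMED_SD = 30.0     # HYPOTHETICAL: the source does not state the standard deviation

model = NormalDist(mu=MEAN_MONTHS, sigma=ASSUMED_SD)

# Upper tail proportion: share of the model at or beyond 123 months.
tail_proportion = 1 - model.cdf(123)

# Middle 95% of the model: from the 2.5th to the 97.5th percentile.
lower = model.inv_cdf(0.025)
upper = model.inv_cdf(0.975)

print(f"tail proportion beyond 123 months: {tail_proportion:.1%}")
print(f"middle 95%: {lower:.1f} to {upper:.1f} months")
```

The width of the middle-95% interval is proportional to the standard deviation, so any change to the model's standard deviation changes the interval width in the same direction.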

Q4b

Some commentators on internet trends believe that new businesses will prefer to choose the .nz extension and the .co.nz extension will eventually be phased out. To investigate this researchers conducted a two sample t-test, using the following hypotheses:

H0: The underlying mean months_since_registered for businesses with the .co.nz extension is the same as the underlying mean months_since_registered for businesses with the .nz extension.

H1: The underlying mean months_since_registered for businesses with the .co.nz extension is not the same as the underlying mean months_since_registered for businesses with the .nz extension.

This two sample t-test will have a null/hypothesised value of ____, and a two-sided alternative hypothesis.

Output from the iNZight inference tab for this test is shown below:

The observed difference for this sample data is ____ (round to 2 d.p.) months, with .co.nz businesses having the ____ sample mean.

We have evidence that businesses with the .nz domain name extension have an underlying mean months_since_registered that is ____ businesses with a .co.nz domain name extension.

Based on the size of the p-value, the 95% confidence interval for the difference between the underlying mean months_since_registered will contain ____.

Question 5

An e-commerce business was interested in which domain name extension attracted higher spending customers. To explore this, the business bought both the .co.nz and .nz versions of their domain name and duplicated their site at both addresses. For the week after they launched their website they recorded the web traffic and noted the amount of each purchase made by visitors to each of the two domain name versions (site_visited).

A summary of the amount spent for the two domain extensions is shown below:

Q5a

Would the explanatory variable site_visited, with the levels .co.nz and .nz, be best described as a “treatment variable” or a “factor of interest” in this study?

Justify your choice by referring to specific features of the study design.

Q5b

A two sample t-test was carried out using iNZight Lite. The statistical modelling approach used by the software to conduct the hypothesis test and calculate the p-value involved the T distribution.

Below is a visual representation of the relevant T distribution, with the tail proportions shaded and annotated with probabilities related to the t-test statistic.

Use the visualisation shown above to answer the following questions.

The visualisation shows the results of a ____ test.

The alternative hypothesis for the test is that the underlying mean amount spent at the .co.nz version of the site is ____ the underlying mean amount spent at the .nz version of the site.

The p-value for the hypothesis test is ____, which means the observed difference in means ____ statistically significant at the 5% level.

Q5c

Based on the design of the study and the p-value reported (using a 5% level of significance), could a claim be made for these visitors to the website that “domain extension causes a change in amount spent”?

Question 6

Researchers were interested in how attractive men of varying levels of baldness were perceived to be. To investigate this 31 men were photographed and three versions of each photo were generated using Photoshop - a shaved head version, a version showing male-pattern-baldness (MPB) and a version with a full head of hair.

Participants in this study were randomly allocated to view one version of each of the 31 men. For each photo they viewed, they rated the attractiveness of the man in the photo. Participants were not told the purpose of the study, nor were they made aware that the photos had been altered in any way or that there were different versions of the photos. Initially, the researchers who analysed the data were given the data with the photos labelled as A, B, and C so they were not aware which photo matched with which level of baldness.

Ratings were given on a 7-point scale in response to the question “Compared to the average man, how attractive is this man?”

(1 = Much less, 4 = Average, 7 = Much more)

A suitable research question for this study is “Does attractiveness rating change with different levels of baldness?”

Q6a

Use the information about the study to decide if each of the following three statements is TRUE or FALSE.

Blinding was used in this study

A placebo was used in this study

A control group was used in this study

Q6b

The dot plot below shows the attractiveness ratings for the three photo types (baldness levels), with comparison intervals added.

The ____ group has the highest sample mean for attractiveness_rating.

Because the comparison interval based on the mean for the photos with MPB ____ the other groups, we claim that the underlying mean attractiveness rating is the lowest for photos showing MPB when using this modelling approach.

These comparison intervals ____ a hunch that full hair has the highest underlying mean attractiveness rating.

Q6c

Based on the dot plot and summary table above, the variance (spread) of attractiveness_rating for the three groups appears to be ____.

An ANOVA (F-test) was carried out to explore the relationship between baldness and attractiveness_rating. Some of the iNZight Lite output from this test is shown below:

Which one of the following probability statements represents the p-value calculation for this F-test?

pr(F < 4.6583)

pr(F > 4.6583)

pr(F = 4.6583)

Based on the p-value, there is evidence that the underlying mean attractiveness_rating depends on baldness.

The difference between the underlying mean attractiveness_rating for the Full_hair photos and the underlying mean attractiveness_rating for the ____ is significant at the 5% level.

Question 7

Use the information below to answer the questions in this section.

Data on a random sample of 50 pictures of otters was taken from the website Unsplash.com to answer the research question: What is the relationship between number_likes (the number of likes the photo has) and number_downloads (the number of times the photo has been downloaded) for pictures of otters on Unsplash.com?

This sample data was then used to carry out the analysis shown below.

Q7a

Noticing that the relationship between number_downloads and number_likes is positive is the only thing we need to do to determine that fitting a linear model is appropriate.

There appears to be less variation in number_downloads when number_likes is over 50.

A group of photos of otters that all have 150 likes would be predicted to have a mean number of downloads of approximately 6140.

Q7b

If the photo of the otter with over 150 likes actually had 1000 downloads then the slope of the fitted line would be 40.77 and the linear correlation coefficient would become 0.89.

Q7c

The analysis provided above includes output from iNZight Lite for a test for no association (linear relationship) between number_downloads and number_likes.

Write one sentence answering the research question from above by interpreting the p-value.

Add to this description of the linear relationship by writing another sentence interpreting the relevant 95% confidence interval from above.

Q7d

Below is the dot plot and summary information for the prediction errors for number_downloads.

A prediction model was developed to generate prediction intervals for number_of_downloads, using the fitted linear model and a fixed error amount of 2 x RMSE.

The fixed error amount for this prediction model would be approximately ____.
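
The idea of a fixed error amount of 2 x RMSE can be sketched as follows. The summary information for the prediction errors is not reproduced in the source, so the error values and the predicted value below are hypothetical, and the RMSE here divides by the number of errors, which may differ from the divisor iNZight Lite uses.

```python
import math

def rmse(errors):
    """Root mean squared error: square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def prediction_interval(predicted, fixed_error):
    """Interval of predicted value plus or minus a fixed error amount."""
    return predicted - fixed_error, predicted + fixed_error

# HYPOTHETICAL prediction errors (observed minus predicted number_downloads).
errors = [300.0, -400.0, 150.0, -250.0, 500.0]

fixed_error = 2 * rmse(errors)
low, high = prediction_interval(5000.0, fixed_error)  # 5000 is a hypothetical prediction
print(f"fixed error amount: {fixed_error:.0f}")
print(f"prediction interval: {low:.0f} to {high:.0f}")
```

The same fixed error amount is applied to every prediction, so the interval has the same width regardless of the value of number_likes.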