STATS 101/108 - Past exam S223

Published: 2024-06-13

Question 1

Use the information below to answer the questions in this section.

The NZBN (New Zealand Business Number) is a globally unique identifier for NZ businesses and is used by various individuals and agencies to access core business information.

The NZBN register is available on the website nzbn.govt.nz and provides a search tool to look up key information about New Zealand businesses, including:

their NZBN,

official business name,

register status (registered, in liquidation, inactive, removed),

the year the business was registered.

To access information about a business requires first searching for a business using words in their name (or a specific NZBN), then clicking the link for each business given in the search results to open up a separate page about that business.

A researcher searched for businesses with the word “data” in their name, and the website returned a total of 2088 different businesses. The researcher then took a random sample of 150 of these “data businesses” to analyse.

Similarly, the researcher searched for businesses with the word “science” in their name, and the website returned a total of 848 different businesses. After removing businesses that also had the word “data” in their name, the researcher then took a random sample of 150 of these “science businesses” to analyse.

The researcher wanted to estimate the mean number of characters used in the names of NZ “data businesses” and the mean number of characters used in the names of NZ “science businesses”.

They created the variable num_chars_business_name by using the function =len() within Google Sheets to count the number of characters used in each business name. The researcher then produced the plots below as part of their investigation.
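The character count described above can also be reproduced outside a spreadsheet. The following is a minimal sketch in Python, assuming three made-up business names (they are hypothetical and not taken from the NZBN register). Python's built-in len() plays the role of the spreadsheet function and counts spaces as characters too.

```python
from statistics import mean, stdev

# Hypothetical business names, for illustration only (not real NZBN records).
business_names = [
    "EXAMPLE DATA LIMITED",
    "SAMPLE DATA SERVICES LIMITED",
    "DATA DEMO LTD",
]

# len() counts every character, including spaces.
num_chars_business_name = [len(name) for name in business_names]

sample_mean = mean(num_chars_business_name)
sample_sd = stdev(num_chars_business_name)  # sample SD (n - 1 denominator)

print(num_chars_business_name, round(sample_mean, 1), round(sample_sd, 1))
```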

Q1a

A research question that asks - What is the mean number of characters used in the names of NZ “data businesses” in the researcher's sample? - would require a sample-to-population inference.

Q1b

The mean of num_chars_business_name for the sample of the NZ “data businesses” is approximately ____.

Q1c

The standard deviation of num_chars_business_name for the sample of NZ “science businesses” is approximately ____.

Q1d

If the researcher constructed VIT bootstrap confidence intervals for the mean num_chars_business_name, the confidence interval based on the sample of NZ “data businesses” will be ____ the confidence interval based on the sample of NZ “science businesses”.

Q1e

The researcher also wanted to estimate the proportion of NZ “data businesses” that had a removed register status and the proportion of NZ “science businesses” that had a removed register status.

They found that 95 of the 150 NZ “data businesses” had a removed register status and 75 of the 150 NZ “science businesses” had a removed register status.

If the researcher constructed VIT bootstrap confidence intervals for the proportion of businesses with a removed register status (one for “data businesses” and one for “science businesses”), the confidence interval based on the sample of NZ “data businesses” will be ____.
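To see how a bootstrap interval for a proportion behaves, the resampling can be sketched in a few lines of Python. This is an illustrative percentile bootstrap using the counts given above (95 of 150 and 75 of 150). It is not VIT's own implementation, and the function name, number of resamples and seed are assumptions made for the sketch.

```python
import random


def bootstrap_proportion_ci(successes, n, resamples=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for a proportion (illustrative sketch)."""
    rng = random.Random(seed)
    p_hat = successes / n
    # Resampling n binary outcomes with replacement is equivalent to drawing
    # each resampled outcome as a "success" with probability p_hat.
    props = sorted(
        sum(rng.random() < p_hat for _ in range(n)) / n
        for _ in range(resamples)
    )
    lo_index = int((1 - level) / 2 * resamples)
    hi_index = int((1 + level) / 2 * resamples) - 1
    return props[lo_index], props[hi_index]


data_ci = bootstrap_proportion_ci(95, 150)
science_ci = bootstrap_proportion_ci(75, 150)
for label, (lo, hi) in [("data", data_ci), ("science", science_ci)]:
    print(f"{label}: ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```

Because the samples are the same size, any difference in interval width comes from how close each sample proportion is to 0.5.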

Question 2

Use the information below to answer the questions in this section.

For this question, we will focus on Tatauranga umanga Maori – Statistics on Maori businesses. Maori businesses are those that are owned by a person or people who have Maori whakapapa, and a representative of that business identifies the business as Maori.

Tatauranga Aotearoa (StatsNZ) carries out annual business operations surveys to collect information about business practices for the last financial year. For the 2022 survey, a sample of 1290 Maori authorities and a sample of 2640 other Maori enterprises were selected from the larger population of Maori businesses.

Q2a

One of the questions used in the business operations survey was: Does this business have a social media presence?

If you were to design this survey question using a product like Google Forms, you should:

provide a list of options, of which more than one can be selected by participants.

provide a text box that will only accept a number.

provide a list of options, of which only one can be selected by participants.

provide a text box for participants to complete their answers however they want.

provide a slider for participants to select their answer on a scale from 0 to 100.

Q2b

Based on the responses to this survey question by Maori authorities, a 95% confidence interval was constructed to estimate the proportion of all Maori authorities that have a social media presence. The limits of this confidence interval are (0.624, 0.676).

____% (round your answer to 1 d.p.) of the Maori authorities surveyed said that they have a social media presence.
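A confidence interval of the form estimate ± margin of error is centred on the sample estimate, so the estimate can be recovered as the midpoint of the limits. A small sketch, assuming the interval above was built that way (the function name is hypothetical):

```python
def point_estimate_from_ci(lower, upper):
    """Midpoint of a symmetric interval (estimate +/- margin of error)."""
    return (lower + upper) / 2


estimate = point_estimate_from_ci(0.624, 0.676)
print(f"{estimate * 100:.1f}%")
```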

Q2c

To compare the proportion of Maori authorities that have a social media presence with the proportion of other Maori enterprises that have a social media presence, a 95% confidence interval was constructed for the difference of two proportions using the order Maori authorities - other Maori enterprises. The limits of this confidence interval are (-0.141, -0.079).

Based on this confidence interval, we ____ claim that the proportion of all Maori authorities that have a social media presence is lower than the proportion of all other Maori enterprises that have a social media presence.

Q2d

Another question in the business operations survey asked: For which of the following reasons does this business have a social media presence? Select all that apply.

51% of the Maori authorities surveyed selected “Gathering feedback or resolving customer concerns” and 46% of the Maori authorities surveyed selected “Monitoring social media activity of other businesses or customers”.

The confidence interval calculator was used by two different students (Student A, Student B) to calculate a 95% confidence interval for the difference between the proportion of all Maori authorities who use social media to gather feedback or resolve customer concerns and the proportion of all Maori authorities who use social media to monitor social media activity of other businesses or customers.

Clearly identify which student has chosen the correct sampling situation by stating Student A or Student B.

Then use the confidence interval calculated for this sampling situation chosen by this student to answer the following research question: How does the proportion of all Maori authorities who use social media to gather feedback or resolve customer concerns compare to the proportion of all Maori authorities who use social media to monitor social media activity of other businesses or customers?

Toi Hangarau is a report on Maori-owned technology companies that was initiated and led by Maori in 2023. For the report, 72 Maori-owned technology companies were selected from public sources including the New Zealand Companies Office.

From the data collected, a 95% confidence interval was constructed to estimate the mean age of Maori-owned technology companies. The limits of this confidence interval are (7.4, 11.2).

Use this information to answer the following questions.

Q2e

The margin of error of the confidence interval is ____ (round your answer to 1 d.p.).

Q2f

If the sample size was increased, but the sample mean and standard deviation stayed the same, the confidence interval would ____.
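Both ideas in Q2e and Q2f come down to simple arithmetic: the margin of error is half the width of the interval, and the standard error of a mean is s / √n. The sketch below uses the limits (7.4, 11.2) from the text. The standard deviation of 8.0 is a hypothetical value chosen only to show the effect of n, since the report's actual standard deviation is not given here.

```python
from math import sqrt


def margin_of_error(lower, upper):
    """Half the width of a symmetric confidence interval."""
    return (upper - lower) / 2


def standard_error_of_mean(sample_sd, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return sample_sd / sqrt(n)


print(round(margin_of_error(7.4, 11.2), 1))

# Hypothetical standard deviation, used only to show the effect of n.
assumed_sd = 8.0
for n in (72, 144, 288):
    print(n, round(standard_error_of_mean(assumed_sd, n), 3))
```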

Question 3

Use the information below to answer the questions in this section.

Recall that in Section 1, a researcher obtained a random sample of 150 “data businesses” and 150 “science businesses” from the NZBN (New Zealand Business Number) registry.

For each of the businesses in both samples that were currently registered (i.e., not removed, in liquidation or inactive), the researcher looked up the date of the registration and recorded how many years since the business had been registered.

For example, if the business was registered in 2023, the value for years_since_registered was recorded as 0, and if the business was registered in 2020, the value for years_since_registered was recorded as 3.

A researcher was interested in confirming that, for all currently registered businesses, the mean years since a “data business” was registered is either higher or lower than the mean years since a “science business” was registered.

They decided to carry out a two-sample t-test using iNZight Lite. Below is part of the output produced.

Q3a

Which of the following is a suitable two-sided alternative hypothesis for this test?

The underlying mean years since a “data business” was registered is not the same as the underlying mean years since a “science business” was registered.

The underlying mean years since a “data business” was registered is the same as the underlying mean years since a “science business” was registered.

The underlying mean years since a “data business” was registered is higher than the underlying mean years since a “science business” was registered.

Q3b

This test has a null/hypothesised value of ____.

Q3c

The observed difference for the sample data is ____ (round to 3 d.p.).
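The quantities in Q3b and Q3c (null value, observed difference) feed into the test statistic in the usual way: t = (observed difference − null value) / standard error. A minimal sketch using the unpooled (Welch) standard error follows. The summary statistics are hypothetical and are not the values from the iNZight Lite output, and the exact method iNZight Lite uses is not confirmed here.

```python
from math import sqrt


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, null_value=0.0):
    """Return (observed difference, standard error, t statistic)."""
    diff = mean1 - mean2
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)  # unpooled standard error
    return diff, se, (diff - null_value) / se


# Hypothetical summary statistics, for illustration only.
diff, se, t_stat = two_sample_t(9.0, 6.0, 36, 7.0, 8.0, 64)
print(round(diff, 3), round(se, 3), round(t_stat, 3))
```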

Q3d

There is evidence that for all currently registered businesses, the mean years since a “data business” was registered is ____ the mean years since a “science business” was registered.

Q3e

The researcher then carried out a similar sampling and inference approach but focused on comparing “human businesses” (businesses with the word “human” in their name) and “psychology businesses” (businesses with the word “psychology” in their name).

Compare the two different statements shown below, in terms of the strength of evidence and the provided confidence interval. One of these statements is incorrect and one is correct.

Clearly identify which one of the two statements is incorrect, by stating Statement A or Statement B.

Then, in no more than two sentences, explain why this statement is incorrect when you compare it to the other statement which is correct.

Question 4

Use the information below to answer the questions in this section.

Within the first few seconds of meeting someone, we make judgements about their character traits. During COVID-19, many of these first impressions happened online during ZOOM meetings.

A study was carried out to investigate the potential influence of how people present themselves during ZOOM meetings, including the facial expressions and ZOOM backgrounds they use, on the first impressions made of trustworthiness and competence.

The first phase of the study involved 160 undergraduate students, who volunteered to be in the study in exchange for a Countdown supermarket voucher. Two groups were compared: a group that was shown an image of an AI-generated person with a happy facial expression, and a group that was shown an image of an AI-generated person with a more neutral facial expression.

These groups were determined by allocating participants to a group randomly at the beginning of the study. For both groups, the AI-generated person was framed within a ZOOM border to simulate meeting them online in ZOOM. Neither group was told that the person in the image was AI-generated.

After viewing the image, participants were asked to rate the person’s trustworthiness on a scale of 0 to 100 points.

The researchers conducted a two sample t-test, using the following hypotheses:

H0: The underlying mean trustworthiness rating for the happy face is the same as the underlying mean trustworthiness rating for the more neutral face

H1: The underlying mean trustworthiness rating for the happy face is not the same as the underlying mean trustworthiness rating for the neutral face

The researchers found a difference of 1.3 points between the mean trustworthiness rating for each group (p-value = 0.0243), with the group that was shown an image of an AI-generated person with a happy face having the higher mean.

Q4a

The explanatory variable for the study is best described as a ____.

Q4b

Which one of the following is the correct formal expression of the null hypothesis?

μ1 – μ2 ≠ 0

μ1 – μ2 = 0

x̄1 – x̄2 ≠ 0

x̄1 – x̄2 = 0

Q4c

Which one of the following is a visual representation of the relevant T distribution, with any tail proportions shaded and annotated with probabilities related to the t-test statistic?

Q4d

The observed difference of 1.3 points ____ statistically significant at the 10% level.
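Statistical significance at a given level is just a comparison of the p-value with that level. A short sketch using the reported p-value of 0.0243 (the set of levels checked is an illustrative choice):

```python
p_value = 0.0243  # reported in the study description

# A result is significant at level alpha when the p-value is below alpha.
significant_at = {alpha: p_value < alpha for alpha in (0.10, 0.05, 0.01)}

for alpha, significant in significant_at.items():
    verdict = "significant" if significant else "not significant"
    print(f"{alpha:.0%} level: {verdict}")
```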

Q4e

Based on the design of the study and the p-value reported (using a 5% level of significance), a claim ____ be made for these undergraduate students that 'the type of facial expression causes a change in trustworthiness ratings'.

Question 5

Use the information below to answer the questions in this section.

Recall that a study was carried out to investigate the potential influence of how people present themselves during ZOOM meetings, including the facial expressions and ZOOM backgrounds they use, on the first impressions made of trustworthiness and competence.

The second phase of the study involved 200 undergraduate students, who volunteered to be in the study in exchange for a Countdown supermarket voucher. In this phase of the study, four different images were used.

The same AI-generated person with a happy face was framed within a ZOOM border for each image, but the background changed and was one of four different versions: a blank background (no background), a background showing plants, a background showing a bookcase, or a novelty background showing a galaxy.

Participants were randomly allocated to be shown one of the versions of the image and asked to rate the person’s competency on a scale of 0 to 100 points. Initially, the researchers who analysed the data were not aware of the name of the ZOOM background used and were given the data with the backgrounds labelled as A, B, C, and D.

An ANOVA (F-test) was carried out to explore the relationship between background_treatment and competence_rating. Some of the iNZight Lite output from this test is shown below:

Q5a

Which one of the following is NOT a feature of the study design used?

Random allocation was used.

Blinding was used.

Blocking was used.

A control group was used.

Q5b

The ratio of the largest sample standard deviation to the smallest sample standard deviation for the groups is ____ (round your answer to 1 d.p.).
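The ratio asked for in Q5b is a single division. The sketch below uses hypothetical group standard deviations, not the values from the iNZight Lite output. A common rule of thumb in introductory courses is that a ratio below about 2 is acceptable for an F-test, but check the course notes for the exact criterion.

```python
def sd_ratio(sample_sds):
    """Largest sample standard deviation divided by the smallest."""
    sds = list(sample_sds)
    return max(sds) / min(sds)


# Hypothetical standard deviations for backgrounds A to D.
group_sds = {"A": 10.0, "B": 12.5, "C": 8.0, "D": 11.0}
ratio = sd_ratio(group_sds.values())
print(round(ratio, 1))
```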

Q5c

Which one of the following probability statements represents the p-value calculation for this F-test?

pr(F < 17.895)

pr(F = 17.895)

pr(F ≠ 17.895)

pr(F > 17.895)

Q5d

How many of the pairwise differences are statistically significant at the 5% level?

Q5e

Which one of the following is an incorrect interpretation of the results?

There is evidence that competency ratings change with the ZOOM background used.

With 95% confidence, the underlying mean competency rating for the blank background is somewhere between 0.9 lower and 8.1 higher than the underlying mean competency rating for the novelty background.

We can claim that the underlying mean competency rating is the lowest for the novelty background.

With 95% confidence, the underlying mean competency rating for the novelty background is somewhere between 3.9 and 11.1 lower than the underlying mean competency rating for the plants background.

Question 6

Q6a

Data collected by Tatauranga Aotearoa (StatsNZ) was used to estimate the risks of NZ businesses from different industries failing (closing down) within the first four years of operation.

The two-way table of counts below provides data comparing business failure within four years by two types of industry, for NZ businesses that opened in 2017.

Use the information provided above to complete the following statement, rounding your answer to 1 d.p.

For NZ businesses that opened in 2017, information media and telecommunications businesses were about ____ times as likely to fail within four years as agriculture, forestry and fishing businesses.
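A statement of the form "about ... times as likely" is a relative risk: the risk of failure in one group divided by the risk in the other. A minimal sketch, using hypothetical counts rather than the values from the two-way table:

```python
def relative_risk(failed_a, total_a, failed_b, total_b):
    """Risk of failure in group A divided by risk of failure in group B."""
    risk_a = failed_a / total_a
    risk_b = failed_b / total_b
    return risk_a / risk_b


# Hypothetical counts: 60 of 100 failed in group A, 40 of 100 in group B.
rr = relative_risk(60, 100, 40, 100)
print(round(rr, 1))
```

The group named first in the statement goes in the numerator.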

Use the information below to answer the following questions.

Ipurangi Aotearoa (InternetNZ) is a company that manages the registrations of .nz domain names (websites with URLs that end in .nz, such as health.govt.nz or auckland.ac.nz).

In 2020 and 2022, Ipurangi Aotearoa conducted online surveys of NZ businesses and consumers to learn more about their thoughts around different types of domain names.

Businesses were selected for the surveys so that they were representative in terms of geographic region and business size, and consumers were selected for the survey so that they were representative of the New Zealand adult population in terms of age, gender, ethnicity and location.

Across both surveys, a total of 1076 businesses responded and a total of 1125 consumers responded.

One of the questions in the survey asked participants to select the extent of their agreement that the .nz domain name is more trustworthy than other domain names, using one option from: Strongly disagree, Disagree, Unsure, Agree, or Strongly agree. The answers to this question were used to create a variable called extent_agreement_more_trustworthy.

Using the responses from businesses for the 2020 and 2022 surveys, Researcher A carried out a Chi-square test for independence using the variables year (the survey year) and extent_agreement_more_trustworthy.

Below is some of the iNZight Lite output from their analysis:

Use the information provided about the surveys conducted by Ipurangi Aotearoa (InternetNZ), including the output from Researcher A’s analysis, to answer the following questions.

Q6b

The level of extent_agreement_more_trustworthy for which there is the biggest difference between the sample proportions for the two different survey years is ____.

Q6c

When Researcher A carried out the Chi-square test for independence, they would have used ____ as the response variable (first variable selected in iNZight Lite).

Q6d

Under the null hypothesis, we would expect ____% (round to 1 d.p.) of respondents from the 2020 survey to select Agree.
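Under the null hypothesis of independence, every survey year is expected to share the same distribution of responses, namely the overall (combined) proportions. The sketch below shows that calculation on a hypothetical table of counts. The numbers are invented for illustration and are not Researcher A's data.

```python
levels = ["Strongly disagree", "Disagree", "Unsure", "Agree", "Strongly agree"]

# Hypothetical counts per survey year, in the order of `levels`.
counts = {
    "2020": [10, 20, 50, 90, 30],
    "2022": [20, 40, 60, 110, 70],
}

grand_total = sum(sum(row) for row in counts.values())
column_totals = [
    sum(row[i] for row in counts.values()) for i in range(len(levels))
]

# Under independence, each year shares the overall column proportions.
expected_props = [total / grand_total for total in column_totals]
expected_counts = {
    year: [sum(row) * p for p in expected_props]
    for year, row in counts.items()
}

agree = levels.index("Agree")
print(f"{expected_props[agree]:.1%}", expected_counts["2020"][agree])
```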

Q6e

There is ____ evidence that extent_agreement_more_trustworthy is dependent on year.

Q6f

Using the responses from consumers for the 2020 and 2022 surveys, Researcher B carried out a Chi-square test for independence using the variables year (the survey year) and extent_agreement_more_trustworthy.

Below is some of the iNZight Lite output from their analysis:

In no more than two sentences, explain whether the p-value for Researcher B’s hypothesis test would be larger or smaller than the p-value for Researcher A’s hypothesis test. It might help you to consider as part of your explanation how compatible the observed data is with the null model.

Question 7

Please refer to the information below for the questions in this section.

Various studies have found that the slower the speed at which a web page loads, the more likely that someone will “bounce” and leave the page.

Google has also made it public that page speed is an important factor in determining where a page is ranked in Google’s search results (the faster the better).

A tool called the Google PageSpeed Insights API can be used programmatically to obtain page speed-related data for a given URL/website.

Some of the variables available from the API include:

performance_score: A score between 0 and 100, where high scores indicate: faster page load speeds, use of best practices for web performance, and smoother user experiences.

speed_score: A measure of how long it takes, in seconds, for the contents of a web page to be visually displayed to the user.

To explore the speed and performance of pages on the University of Auckland website, a random sample of 90 pages was taken from all 12747 pages available.

For each web page in the sample, the URL of the page was recorded and used to obtain the performance_score and speed_score from the Google PageSpeed Insights API.

This sample data was then used to carry out the analysis shown below.

Q7a

The variables performance_score and speed_score have a linear correlation coefficient of around ____.

Q7b

The equation of the fitted line is: performance_score = 97.46 ____ * speed_score

Q7c

Using the linear model fitted, an individual web page that has a speed_score of 3 and an actual performance_score of 60.2, would have a predicted performance_score of around ____ and a residual/prediction error of around ____ (round to 1 d.p.).
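A prediction from a fitted line is intercept + slope × x, and the residual is the actual value minus the predicted value. The sketch below assumes 97.46 is the intercept of the fitted line. The slope of -10.0 is a hypothetical placeholder, since the real slope has to be read from the analysis output.

```python
INTERCEPT = 97.46  # assumed to be the intercept of the fitted line


def predict(speed_score, slope):
    """Predicted performance_score for a given speed_score."""
    return INTERCEPT + slope * speed_score


def residual(actual, predicted):
    """Residual (prediction error): actual minus predicted."""
    return actual - predicted


assumed_slope = -10.0  # hypothetical; replace with the slope from the output
predicted = predict(3, assumed_slope)
error = residual(60.2, predicted)
print(round(predicted, 1), round(error, 1))
```

A negative residual means the page scored lower than the line predicts.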

Q7d

With 95% confidence, on average, each 2 second difference in speed_score is associated with a decrease in performance_score of ____.
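A confidence interval for the slope describes the change per 1 second of speed_score, so a statement about a 2 second difference multiplies both limits by 2. A small sketch, using a hypothetical slope interval of (-12.0, -8.0) rather than the interval from the output:

```python
def scale_interval(lower, upper, units):
    """Scale a slope interval to a change of `units` in the x variable."""
    a, b = lower * units, upper * units
    return min(a, b), max(a, b)


slope_ci = (-12.0, -8.0)  # hypothetical 95% interval for the slope
lo, hi = scale_interval(*slope_ci, units=2)

# Both limits are negative, so report the change as a decrease.
print(f"decrease of between {-hi:.1f} and {-lo:.1f} points")
```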

Q7e

The analysis provided includes output from iNZight Lite for a test for no association (linear relationship) between performance_score and speed_score.

In no more than two sentences, explain why it doesn’t really make sense to conduct this test, making sure you specifically refer to information provided about the data/variables and the null hypothesis for this test.