Stat 473/573, Spring 2021 Practice Questions for Final Exam
Posted: 2023-05-25
1. A community contains a total of 4000 adult men and 6000 adult women. Suppose we want to estimate the proportion of adults in the community who prefer butter to margarine. Independent samples of size 100 are selected from each group using simple random sampling. Each of the 200 respondents is asked whether they prefer butter over margarine. Of the men sampled, 40 prefer butter. Of the women sampled, 60 prefer butter. Data collection costs are the same for the two groups.
a. What is the design for this problem? Be specific.
b. Define the variable of interest for this problem, i.e., $y_i$.
c. Identify the quantities from the problem and fill in the following table.
| | Men (h = 1) | Women (h = 2) |
|---|---|---|
| Population size ($N_h$) | | |
| Sample size ($n_h$) | | |
| Sample proportion ($\hat{p}_h$) | | |
d. Estimate the proportion of adults in the community who prefer butter.
e. Record the variance formula for the estimator in (d). (DO NOT CALCULATE)
f. Calculate the weight for a female in the sample AND provide an interpretation of the weight (in words).
g. If you were to conduct the same survey next year, which allocation rule would you use to allocate the sample between the men stratum and the women stratum to obtain the most precise estimate of the overall proportion of adults who prefer butter? Why? Use the information from this problem to investigate the within-stratum variances.
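As a sketch of the arithmetic behind parts (d) to (g), the Python snippet below applies the standard stratified-SRS formulas for a proportion, $\hat{p}_{str} = \sum_h (N_h/N)\hat{p}_h$ and $\hat{V}(\hat{p}_{str}) = \sum_h (1 - n_h/N_h)(N_h/N)^2\,\hat{p}_h(1-\hat{p}_h)/(n_h - 1)$. It evaluates the variance for reference even though part (e) asks only for the formula. The next-year total `n_next = 200` is a hypothetical value chosen to match this year's sample size, and the Neyman rule uses the equal-cost assumption stated in the problem.

```python
import math

# Problem 1: stratified SRS with men (h = 1) and women (h = 2) as strata.
strata = {
    "men": {"N": 4000, "n": 100, "yes": 40},
    "women": {"N": 6000, "n": 100, "yes": 60},
}
N_total = sum(s["N"] for s in strata.values())

p_str = 0.0       # stratified estimate of the overall proportion
var_p_str = 0.0   # estimated variance of p_str
S = {}            # within-stratum standard deviation estimates S_h
for name, s in strata.items():
    p_h = s["yes"] / s["n"]              # sample proportion in stratum h
    share = s["N"] / N_total             # N_h / N
    fpc = 1 - s["n"] / s["N"]            # finite population correction
    p_str += share * p_h
    # (1 - n_h/N_h) * (N_h/N)^2 * p_h (1 - p_h) / (n_h - 1)
    var_p_str += fpc * share**2 * p_h * (1 - p_h) / (s["n"] - 1)
    # s_h^2 = n_h / (n_h - 1) * p_h (1 - p_h)
    S[name] = math.sqrt(s["n"] / (s["n"] - 1) * p_h * (1 - p_h))

# Sampling weight of a sampled woman: N_h / n_h.
w_woman = strata["women"]["N"] / strata["women"]["n"]

# Neyman allocation with equal costs: n_h proportional to N_h * S_h.
n_next = 200      # hypothetical total sample size for next year
denom = sum(s["N"] * S[name] for name, s in strata.items())
neyman = {name: n_next * s["N"] * S[name] / denom for name, s in strata.items()}

print(f"p_str = {p_str:.3f}, SE = {math.sqrt(var_p_str):.4f}, weight = {w_woman:.0f}")
print("Neyman allocation:", {k: round(v, 1) for k, v in neyman.items()})
```

Because $\hat{p}_1(1-\hat{p}_1) = \hat{p}_2(1-\hat{p}_2) = 0.24$, the estimated within-stratum variances are equal, so with equal costs the Neyman split coincides with proportional allocation (80 men and 120 women out of 200).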
2. A public health official is interested in estimating the mean number of cavities per patient for clients attending a free dental clinic. She selects an SRS of 10 patients from the 500 patients who have attended the clinic. Each patient's age (x) and number of cavities (y) are recorded. The mean age for all 500 clinic patients is 24. The official decides to use the regression estimator. Part of the regression analysis from R is provided below.
Estimated Regression Coefficients
Parameter    Estimate    Standard Error    t Value    Pr > |t|
Intercept    5.075       0.3570            14.19      <.0001
x            0.195       0.0146            13.38      <.0001
a. Identify the following quantities:
N =
n =
$\bar{x}_U$ =
b. Based on the R output above, find the estimated intercept and slope of the regression line.
$\hat{B}_0$ =
$\hat{B}_1$ =
c. Use the information from (a) and (b) to calculate the regression estimate of the mean number of cavities per patient.
d. Suppose R reports the residual mean square, $s_e^2 = 0.252$. Obtain the estimated variance of the regression estimator in part (c).
e. What is the residual under this model? Record only the symbol and the formula for patient $i$.
f. Based on the results in parts (c) and (d), calculate the 99% confidence interval for the mean number of cavities per patient.
g. Is the regression estimator biased or unbiased? Check one option.
________ Unbiased ; ________ Biased
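A minimal Python sketch of parts (c), (d), and (f), assuming the usual SRS regression-estimator formulas $\hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1\bar{x}_U$ and $\hat{V}(\hat{\bar{y}}_{reg}) = (1 - n/N)\,s_e^2/n$. The residual mean square is taken as printed ($s_e^2 = 0.252$), and the 99% interval uses the normal critical value 2.576; this is an assumption, and a course that uses a $t$ critical value would get a somewhat wider interval.

```python
import math

# Quantities from Problem 2 (regression estimation under SRS).
N, n = 500, 10                   # population and sample sizes
x_bar_U = 24                     # known population mean age
B0_hat, B1_hat = 5.075, 0.195    # intercept and slope from the regression output
s_e2 = 0.252                     # residual mean square, as printed in part (d)

# Regression estimator of the population mean: B0_hat + B1_hat * x_bar_U.
y_bar_reg = B0_hat + B1_hat * x_bar_U

# Estimated variance: (1 - n/N) * s_e^2 / n.
var_reg = (1 - n / N) * s_e2 / n
se_reg = math.sqrt(var_reg)

# 99% interval with the normal critical value (assumption).
z = 2.576
ci = (y_bar_reg - z * se_reg, y_bar_reg + z * se_reg)
print(f"estimate = {y_bar_reg:.3f}, SE = {se_reg:.4f}, 99% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```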
3. A company has 130 districts. An SRSWOR of 10 districts is selected from the 130 districts, and each sampled district provides data on last year's sales (Table 1 below; sales are in units of 100 dollars). Suppose you want to estimate the total sales for all districts that had sales greater than 10 last year (i.e., more than 1000 dollars).
a. Define the domain of interest, $U_d$, in words.
b. Define the domain population parameter that you want to estimate. Be specific about your mathematical notation, including the definition of $y_i$.
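c. For the domain estimation, define the following variables:
$u_i = \begin{cases} \_\_\_\_\_\_\_\_ \\ \_\_\_\_\_\_\_\_ \end{cases}$
$x_i = \begin{cases} \_\_\_\_\_\_\_\_ \\ \_\_\_\_\_\_\_\_ \end{cases}$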
d. Fill in the columns for the variables $u_i$ and $x_i$ in Table 1.

Table 1: Data set for the 10 selected districts

| Selected district | Last year's sales $y_i$ (hundreds of dollars) | $x_i$ | $u_i$ |
|---|---|---|---|
| 1 | 6 | | |
| 2 | 7 | | |
| 3 | 15 | | |
| 4 | 11 | | |
| 5 | 6 | | |
| 6 | 9 | | |
| 7 | 9 | | |
| 8 | 12 | | |
| 9 | 13 | | |
| 10 | 17 | | |
e. Estimate the total sales for all districts that had sales larger than 10 in the last year.
f. Assume R reports the sample variance of the variable $u$ in Table 1 as $s_u^2 = 54$. Calculate the estimated variance of your estimator in (e).
g. Calculate the 95% confidence interval for the total sales of all districts that had sales greater than 10 last year.
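The domain calculation can be sketched in Python as follows. It assumes the usual convention that $x_i$ is the domain indicator (1 if district $i$ had sales greater than 10, 0 otherwise) and $u_i = y_i x_i$, so the domain total is estimated by $\hat{t}_d = N\bar{u}$ with $\hat{V}(\hat{t}_d) = N^2(1 - n/N)\,s_u^2/n$. The 95% interval uses $z = 1.96$, which is also an assumption.

```python
import math

# Table 1: last year's sales (hundreds of dollars) for the 10 sampled districts.
y = [6, 7, 15, 11, 6, 9, 9, 12, 13, 17]
N, n = 130, len(y)

# Assumed convention: x_i = 1 if district i is in the domain (sales > 10), else 0,
# and u_i = y_i * x_i.
x = [1 if yi > 10 else 0 for yi in y]
u = [yi * xi for yi, xi in zip(y, x)]

u_bar = sum(u) / n
t_hat_d = N * u_bar                       # estimated domain total, N * u_bar

# Sample variance of u with divisor n - 1, as a check on the reported value.
s2_u_computed = sum((ui - u_bar) ** 2 for ui in u) / (n - 1)

s2_u = 54                                 # value given in part (f)
var_t_hat = N**2 * (1 - n / N) * s2_u / n
se = math.sqrt(var_t_hat)

z = 1.96                                  # normal critical value (assumption)
ci = (t_hat_d - z * se, t_hat_d + z * se)
print(f"t_hat_d = {t_hat_d:.1f}, s_u^2 = {s2_u_computed:.2f}, SE = {se:.2f}")
print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```

The recomputed sample variance of $u$ (about 53.96) agrees with the reported $s_u^2 = 54$.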
4. A population consists of 16 grocery stores located in 4 towns. These grocery stores employ high school students. The number of student employees ($y_{ij}$) for each store ($j$) in each town ($i$) is listed below.
| Town A: Store | Number of Student Employees | Town B: Store | Number of Student Employees | Town C: Store | Number of Student Employees | Town D: Store | Number of Student Employees |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 2 | 1 | 4 | 1 | 2 |
| 2 | 3 | 2 | 0 | 2 | 3 | 2 | 1 |
| 3 | 4 | | | 3 | 5 | 3 | 3 |
| 4 | 3 | | | 4 | 2 | 4 | 4 |
| | | | | 5 | 4 | | |
| | | | | 6 | 6 | | |
Suppose the population is sampled using the following design: 2 towns are selected by SRSWOR, and then within each sampled town, an SRSWOR of one-half of the grocery stores is selected. Use this information to answer the following questions.
a. What is the design? Be specific.
b. Define the following notation:
Cluster (in words):
Element (in words):
$n$ (in number) =
$N$ (in number) =
$M_i$ (in words):
$m_i$ (in words):
$K$ (in number) =
c. What is the probability of including Town B in the sample, $\pi_B$?
Given that Town B is selected, what is the probability of including Store 2 in the sample for Town B, i.e., what is $\pi_{2|B}$?
What is the joint probability of including Store 2 of Town B in the sample, i.e., what is $\pi_{B2}$?
d. Do all of the grocery stores in this population have the same probability of being included in the sample under this procedure? Put another way, is this a self-weighting design?
_______ Yes _______ No
Explain why.
e. Suppose that the sample selected for analysis consisted of Stores 2 and 4 in Town A and Stores 1, 2, and 4 in Town C.
(i). What are the estimated totals for Towns A and C, i.e., what are $\hat{t}_A$ and $\hat{t}_C$?
(ii). Estimate the total number of students employed part time by all grocery stores, using the unbiased estimator.
(iii). Calculate the variance of the unbiased estimator in (ii).
(iv). Estimate the total number of students employed part time by all grocery stores, using the ratio estimator.
(v). Record the variance formula of the ratio estimator in (iv). Do NOT calculate. Be specific about your mathematical notations.
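Below is a Python sketch of parts (c) to (e) for this two-stage design. It uses the standard two-stage cluster formulas $\hat{t}_i = M_i\bar{y}_i$, $\hat{t}_{unb} = (N/n)\sum_{i\in S}\hat{t}_i$, and $\hat{V}(\hat{t}_{unb}) = N^2(1 - n/N)\,s_t^2/n + (N/n)\sum_{i\in S}(1 - m_i/M_i)\,M_i^2 s_i^2/m_i$, plus the ratio estimator (total number of stores) $\times \sum_{i\in S}\hat{t}_i / \sum_{i\in S} M_i$. The variable names (`towns`, `pi_store`, `t_unb`, `t_ratio`) are illustrative only.

```python
import math

# Problem 4 population: student employees per store, listed by town.
towns = {
    "A": [2, 3, 4, 3],
    "B": [2, 0],
    "C": [4, 3, 5, 2, 4, 6],
    "D": [2, 1, 3, 4],
}
N, n = len(towns), 2                      # towns in the population, towns sampled

# Store inclusion probability: (n/N) * (m_i/M_i), with m_i = M_i / 2 in this design.
pi_store = {t: (n / N) * ((len(ys) // 2) / len(ys)) for t, ys in towns.items()}

# Sample from part (e); store numbers are 1-based.
sample = {"A": [2, 4], "C": [1, 2, 4]}

t_hat, s2 = {}, {}                        # estimated town totals, within-town variances
for t, stores in sample.items():
    ys = [towns[t][j - 1] for j in stores]
    ybar = sum(ys) / len(ys)
    t_hat[t] = len(towns[t]) * ybar       # M_i * ybar_i
    s2[t] = sum((v - ybar) ** 2 for v in ys) / (len(ys) - 1)

# Unbiased estimator: (N/n) * sum of the estimated town totals.
t_unb = N / n * sum(t_hat.values())

# Variance estimator: between-town term plus within-town term.
t_mean = sum(t_hat.values()) / n
s2_t = sum((ti - t_mean) ** 2 for ti in t_hat.values()) / (n - 1)
within = sum(
    (1 - len(sample[t]) / len(towns[t])) * len(towns[t]) ** 2 * s2[t] / len(sample[t])
    for t in sample
)
var_unb = N**2 * (1 - n / N) * s2_t / n + (N / n) * within

# Ratio estimator: total number of stores * sum(t_hat_i) / sum(M_i) over sampled towns.
total_stores = sum(len(ys) for ys in towns.values())
t_ratio = total_stores * sum(t_hat.values()) / sum(len(towns[t]) for t in sample)

print("inclusion probabilities:", pi_store)
print(f"t_unb = {t_unb:.1f}, SE = {math.sqrt(var_unb):.2f}, t_ratio = {t_ratio:.1f}")
```

Every store ends up with the same inclusion probability, $0.5 \times 0.5 = 0.25$, because each sampled town contributes exactly half of its stores.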
5. Another researcher used a different design to study the same population. His sampling plan was as follows: two towns are selected by PPSWR, with the number of grocery stores in each town as the size variable; then, within each sampled town, an SRSWOR of 2 grocery stores is selected. The table shown in Problem 4 is repeated below:
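| Town A: Store | Student Employees | Town B: Store | Student Employees | Town C: Store | Student Employees | Town D: Store | Student Employees |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 2 | 1 | 4 | 1 | 2 |
| 2 | 3 | 2 | 0 | 2 | 3 | 2 | 1 |
| 3 | 4 | | | 3 | 5 | 3 | 3 |
| 4 | 3 | | | 4 | 2 | 4 | 4 |
| | | | | 5 | 4 | | |
| | | | | 6 | 6 | | |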
a. What is the design? Be specific.
b. What is the weight for grocery store 2 in Town B, i.e., what is $w_{B2}$?
c. Is this a self-weighting design? Why or why not?
d. Suppose that the sample consisted of Stores 2 and 4 in Town A and Stores 1 and 2 in Town C.
(i) What are the selection probabilities for Towns A and C, i.e., what are $\psi_A$ and $\psi_C$?
(ii) What are the estimated cluster totals for Towns A and C, i.e., what are $\hat{t}_A$ and $\hat{t}_C$?
(iii) Estimate the total number of students employed part time by all grocery stores.
(iv) Calculate the variance of the estimator in (iii).
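A comparable Python sketch for the PPSWR design. It assumes the usual with-replacement notation: draw probability $\psi_i = M_i/16$, store weight $w_{ij} = \frac{1}{n\psi_i}\cdot\frac{M_i}{m_i}$, estimator $\hat{t}_\psi = \frac{1}{n}\sum \hat{t}_i/\psi_i$, and variance estimator $\hat{V}(\hat{t}_\psi) = \frac{1}{n(n-1)}\sum(\hat{t}_i/\psi_i - \hat{t}_\psi)^2$. The names in the code are illustrative, and the dictionary-based sample assumes the two draws are distinct towns, as they are in part (d).

```python
import math

# Problem 5: towns drawn by PPSWR (size = number of stores), then SRSWOR of 2 stores.
towns = {
    "A": [2, 3, 4, 3],
    "B": [2, 0],
    "C": [4, 3, 5, 2, 4, 6],
    "D": [2, 1, 3, 4],
}
n, m = 2, 2                                         # town draws, stores per sampled town
M0 = sum(len(ys) for ys in towns.values())          # total number of stores
psi = {t: len(ys) / M0 for t, ys in towns.items()}  # draw probability psi_i = M_i / M0

# Store weight w_ij = 1/(n * psi_i) * (M_i / m).
weights = {t: 1 / (n * psi[t]) * len(ys) / m for t, ys in towns.items()}

# Sample from part (d); store numbers are 1-based. Assumes two distinct towns were drawn.
sample = {"A": [2, 4], "C": [1, 2]}
t_hat = {}
for t, stores in sample.items():
    ys = [towns[t][j - 1] for j in stores]
    t_hat[t] = len(towns[t]) * sum(ys) / len(ys)    # M_i * ybar_i

# With-replacement estimator: average of t_hat_i / psi_i over the n draws.
ratios = [t_hat[t] / psi[t] for t in sample]
t_psi = sum(ratios) / n
var_t_psi = sum((r - t_psi) ** 2 for r in ratios) / (n * (n - 1))

print("weights:", weights)
print(f"t_psi = {t_psi:.1f}, SE = {math.sqrt(var_t_psi):.2f}")
```

Since $\psi_i$ is proportional to $M_i$ and every sampled town contributes the same number of stores, the weight works out to 4 for every store.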
6. Consider the population of residents living in a city. Households in the city are organized into square blocks with about 25 households per block. A market researcher wishes to estimate the total number of cars owned by households in the city. Two possible sampling approaches have been suggested to the researcher.
Stratified random sampling of households, where
▪ strata are defined to be regions of the city that vary by household income