Stat 473/573, Spring 2021 Practice Questions for Final Exam

Published: 2023-05-25

1. A community contains a total of 4000 adult men and 6000 adult women. Suppose we want to estimate the proportion of adults in the community who prefer butter to margarine. Independent samples of size 100 are selected from each group using simple random sampling. Each of the 200 respondents is asked whether they prefer butter over margarine. Of the men sampled, 40 prefer butter. Of the women sampled, 60 prefer butter. Data collection costs are the same for the two groups.

a. What is the design for this problem?  Be specific.

b. Define the variable of interest for this problem, i.e. $y_i$.

c. Identify the quantities from the problem and fill in the following table.

                               Men (h = 1)    Women (h = 2)
Population size (N_h)          ________       ________
Sample size (n_h)              ________       ________
Sample proportion (p_h)        ________       ________

d. Estimate the proportion of adults in the community who prefer butter.

e. Record the variance formula for the estimator in (d). (DO NOT CALCULATE)

f. Calculate the weight for a female in the sample AND provide an interpretation of the weight (in words).

g. If you were to run the same survey next year, which allocation rule would you use to divide the sample between the men stratum and the women stratum to obtain the most precise estimate of the overall proportion of adults who prefer butter? Why? Use the information from this problem to investigate the within-stratum variances.
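As a rough check on the arithmetic in problem 1, the sketch below computes the stratified estimate, the sampling weights, and an estimated variance in Python. It assumes the standard stratified-SRS formulas (estimate = sum of (N_h/N) p_h, weight = N_h/n_h); the variable names are illustrative and not part of the original exam.

```python
# Stratum inputs taken from problem 1 (h = 1: men, h = 2: women).
N_h = [4000, 6000]            # stratum population sizes
n_h = [100, 100]              # stratum sample sizes
p_h = [40 / 100, 60 / 100]    # sample proportions who prefer butter

N = sum(N_h)

# Stratified estimator of the overall proportion: sum of (N_h / N) * p_h.
p_str = sum(Nh / N * ph for Nh, ph in zip(N_h, p_h))

# Sampling weight of each sampled person in stratum h: N_h / n_h.
weights = [Nh / nh for Nh, nh in zip(N_h, n_h)]

# Estimated variance of p_str with the finite population correction
# (assumed standard formula for stratified simple random sampling).
var_p_str = sum(
    (Nh / N) ** 2 * (1 - nh / Nh) * ph * (1 - ph) / (nh - 1)
    for Nh, nh, ph in zip(N_h, n_h, p_h)
)

# Within-stratum variability p_h * (1 - p_h), useful when comparing allocations.
within_var = [ph * (1 - ph) for ph in p_h]

print(p_str, weights, var_p_str, within_var)
```

Under these assumptions, a weight of N_h/n_h is read as the number of people in the population that each sampled person represents.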

2. A public health official is interested in estimating the mean number of cavities per patient for clients attending a free dental clinic. She selects a simple random sample (SRS) of 10 patients from the 500 patients that have attended the clinic. Each patient's age (x) and number of cavities (y) are recorded. The mean age for all 500 clinic patients is 24. The official decides to use the regression estimator to do the estimation. Part of the regression analysis from R is provided below.

Estimated Regression Coefficients

Parameter    Estimate    Standard Error    t Value    Pr > |t|
Intercept    5.075       0.3570            14.19      <.0001
x            0.195       0.0146            13.38      <.0001

a.   Identify the following quantities:

N =

n =

$\bar{x}_U$ =

b. Based on the R output above, find the estimated intercept and slope of the regression line.

$\hat{B}_0$ =

$\hat{B}_1$ =

c.   Use information from (a - b) to calculate the regression estimate for the mean number of cavities per patient.

d. Suppose R calculates the mean square of the residuals as $s_e^2 = 0.252$. Obtain the variance estimator for the regression estimator in part (c).

e. What is the residual under this model? Only record the symbol and formula for patient i.

f.   Based on the results in part (c) and (d), calculate the 99% confidence interval for the mean number of cavities per patient.

g.   Is the regression estimator biased or unbiased?    Check one option.

________       Unbiased ;                  ________        Biased
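The regression estimate in problem 2 can be sketched as follows. This assumes the usual form of the regression estimator under SRS (the fitted line evaluated at the population mean of x) and the usual variance estimator (1 - n/N) s_e^2 / n. The names are illustrative, and the value 0.252 is read literally from part (d).

```python
# Quantities read from problem 2.
N, n = 500, 10                  # population size and sample size
x_bar_U = 24.0                  # population mean age
b0_hat, b1_hat = 5.075, 0.195   # estimated intercept and slope

# Regression estimator of the mean: the fitted line evaluated at x_bar_U.
y_bar_reg = b0_hat + b1_hat * x_bar_U


def var_reg(s_e_sq, n, N):
    """Estimated variance of the regression estimator under SRS (assumed form)."""
    return (1 - n / N) * s_e_sq / n


# s_e^2 as given in part (d), read literally as 0.252.
v_hat = var_reg(0.252, n, N)

print(y_bar_reg, v_hat)
```

A confidence interval would then take the form estimate plus or minus a t critical value times the square root of `v_hat`. The degrees of freedom depend on the convention used in the course.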

3. A company has 130 districts. An SRSWOR sample of 10 districts is selected from the 130 districts, and each sampled district provides data on last year's sales (see Table 1 below; the unit is $100). Now suppose you want to estimate the total sales for all districts that had sales larger than 10 in the last year (i.e. larger than $1000).

a.   Define the domain Ud of interest in words.

b. Define the domain population parameter that you want to estimate. Be specific about your mathematical notation, including the definition of y_i.

c. For the domain estimation, define the following variables:

u_i = { ________
      { ________

x_i = { ________
      { ________

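d. Fill in the columns for variables u_i and x_i in Table 1 below.

Table 1: Data set for the 10 selected districts

Selected district    Last year's sales y_i (unit: $100)    x_i         u_i
1                    6                                     ________    ________
2                    7                                     ________    ________
3                    15                                    ________    ________
4                    11                                    ________    ________
5                    6                                     ________    ________
6                    9                                     ________    ________
7                    9                                     ________    ________
8                    12                                    ________    ________
9                    13                                    ________    ________
10                   17                                    ________    ________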

e.   Estimate the total sales for all districts that had sales larger than 10 in the last year.

f. Assume R outputs the sample variance of variable u in Table 1 as $s_u^2 = 54$. Calculate the variance for your estimator in (e).

g. Calculate the 95% confidence interval for the total sales for all districts that had sales larger than 10 last year.
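One way to check the domain calculations in problem 3 is the sketch below. It assumes the common construction in which x_i is a 0/1 indicator for membership in the domain (sales larger than 10) and u_i = y_i * x_i. That construction and the variable names are assumptions rather than something stated in the exam.

```python
# Data from Table 1 (sales in units of $100).
N = 130
sales = [6, 7, 15, 11, 6, 9, 9, 12, 13, 17]
n = len(sales)

# Assumed construction: x_i indicates domain membership, u_i = y_i * x_i.
x = [1 if y > 10 else 0 for y in sales]
u = [y * xi for y, xi in zip(sales, x)]

# Estimated domain total: N times the sample mean of u.
u_bar = sum(u) / n
t_hat_d = N * u_bar

# Sample variance of u (the problem rounds this to 54).
s_u_sq = sum((ui - u_bar) ** 2 for ui in u) / (n - 1)

# Estimated variance of the domain total under SRSWOR.
var_t_hat_d = N ** 2 * (1 - n / N) * s_u_sq / n

print(u, t_hat_d, s_u_sq, var_t_hat_d)
```

Using the rounded value 54 from part (f) instead of the computed sample variance changes the variance estimate slightly.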

4. A population consists of 16 grocery stores located in 4 towns. These grocery stores employ high school students. The number of student employees ($y_{ij}$) for each store (j) in each town (i) is listed below.

Number of student employees, by town and store (a dash means the town has no such store)

Store    Town A    Town B    Town C    Town D
1        2         2         4         2
2        3         0         3         1
3        4         -         5         3
4        3         -         2         4
5        -         -         4         -
6        -         -         6         -

Suppose the population is sampled using the following design: 2 towns are selected using an SRSWOR design, and then within each sampled town, an SRSWOR sample of one-half of the grocery stores is selected. Use this information to answer the following questions.

a.   What is the design? Be specific.

b. Define the following notation:

Cluster (in words):

Element (in words):

n (in number) =

N (in number) =

$M_i$ (in words):

$m_i$ (in words):

K (in number) =

c. What is the probability of including Town B in the sample, $\pi_B$?

Given that Town B is selected, what is the probability of including Store 2 in the sample for Town B, i.e. what is $\pi_{2|B}$?

What is the joint probability of including Store 2 of Town B in the sample, i.e. what is $\pi_{B2}$?

d. Do all of the grocery stores in this population have the same probability of being included in the sample using this procedure? Put another way, is this a self-weighting design?

_______ Yes                 _______ No

Explain why.

e. Suppose that the sample selected for analysis consisted of Stores 2 and 4 in Town A and Stores 1, 2, and 4 in Town C.

(i). What are the estimated totals for Towns A and C, i.e. what are $\hat{t}_A$ and $\hat{t}_C$?

(ii). Estimate the total number of students employed part time by all grocery stores, using the unbiased estimator.

(iii). Calculate the variance of the unbiased estimator in (ii).

(iv). Estimate the total number of students employed part time by all grocery stores, using the ratio estimator.

(v). Record the variance formula of the ratio estimator in (iv). Do NOT calculate. Be specific about your mathematical notations.
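The two-stage calculations in problem 4(e) can be sketched as below. The sketch assumes the usual two-stage cluster formulas: estimated cluster total M_i times the sample mean, unbiased total (N/n) times the sum of estimated cluster totals, and ratio total M_0 times the ratio of summed totals to summed cluster sizes. The store values are read from the table under the assumption that Town A has 4 stores, Town B 2, Town C 6 and Town D 4. All names are illustrative.

```python
from statistics import mean, variance

N, n = 4, 2                              # towns in population, towns sampled
M = {"A": 4, "B": 2, "C": 6, "D": 4}     # stores per town
M0 = sum(M.values())                     # total number of stores (16)

# Sampled stores' student counts: Town A stores 2 and 4, Town C stores 1, 2, 4.
sample = {"A": [3, 3], "C": [4, 3, 2]}

# Estimated cluster totals: M_i times the sample mean within the town.
t_hat = {town: M[town] * mean(ys) for town, ys in sample.items()}

# Unbiased estimator of the population total.
t_unb = N / n * sum(t_hat.values())

# Ratio estimator of the population total.
t_ratio = M0 * sum(t_hat.values()) / sum(M[town] for town in sample)

# Estimated variance of the unbiased estimator: between-town plus within-town part.
between = N ** 2 * (1 - n / N) * variance(list(t_hat.values())) / n
within = N / n * sum(
    (1 - len(ys) / M[town]) * M[town] ** 2 * variance(ys) / len(ys)
    for town, ys in sample.items()
)
var_unb = between + within

print(t_hat, t_unb, t_ratio, var_unb)
```

Because the towns differ in size, the unbiased and ratio estimates differ here. The ratio form rescales by the number of stores actually covered by the sampled towns.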

5. Another researcher used a different design to study the same population. His sampling plan was the following: two towns are selected using a PPSWR design, where the number of grocery stores each town contains is the size variable; then, within each sampled town, an SRSWOR of 2 grocery stores is selected. The table shown in Problem 4 is repeated below:

Number of student employees, by town and store (a dash means the town has no such store)

Store    Town A    Town B    Town C    Town D
1        2         2         4         2
2        3         0         3         1
3        4         -         5         3
4        3         -         2         4
5        -         -         4         -
6        -         -         6         -

a.   What is the design? Be specific.

b. What is the weight for grocery store 2 in Town B, i.e. what is $w_{B2}$?

c. Is this a self-weighting design? Why or why not?

d. Suppose that the sample consisted of Stores 2 and 4 in Town A and Stores 1 and 2 in Town C.

(i) What are the selection probabilities for Towns A and C, i.e. what are $\psi_A$ and $\psi_C$?

(ii) What are the estimated cluster totals for Towns A and C, i.e. what are $\hat{t}_A$ and $\hat{t}_C$?

(iii) Estimate the total number of students employed part time by all grocery stores.

(iv) Calculate the variance of the estimator in (iii).
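For problem 5(d), a minimal sketch of the with-replacement PPS calculation follows. It assumes selection probabilities proportional to the number of stores (M_i / M_0), the estimator that averages the estimated cluster totals divided by their selection probabilities, and the usual with-replacement variance estimator. The notation and names are assumptions for illustration.

```python
from statistics import mean

M = {"A": 4, "B": 2, "C": 6, "D": 4}     # stores per town (size variable)
M0 = sum(M.values())                     # total number of stores (16)

# Assumed selection probabilities, proportional to town size.
psi = {town: size / M0 for town, size in M.items()}

# Sampled draws: Town A stores 2 and 4, Town C stores 1 and 2.
draws = [("A", [3, 3]), ("C", [4, 3])]
n = len(draws)

# Estimated cluster totals: M_i times the sample mean within the town.
t_hat = [M[town] * mean(ys) for town, ys in draws]

# Each draw's estimate of the population total: t_hat_i / psi_i.
scaled = [t / psi[town] for t, (town, _) in zip(t_hat, draws)]

# PPS with-replacement estimator and its estimated variance.
t_psi = sum(scaled) / n
var_t_psi = sum((z - t_psi) ** 2 for z in scaled) / (n * (n - 1))

# Weight of any sampled store: 1 / (n * psi_i) * (M_i / m_i), with m_i = 2.
w_B2 = 1 / (n * psi["B"]) * (M["B"] / 2)

print(psi["A"], psi["C"], t_hat, t_psi, var_t_psi, w_B2)
```

Under these assumptions every sampled store gets the same weight, M_0 / (n * m), since m = 2 stores are taken from each selected town.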

6. Consider the population of residents living in a city. Households in the city are organized into square blocks with about 25 households per block. A market researcher wishes to estimate the total number of cars owned by households in the city. Two possible sampling approaches have been suggested to the researcher.

Stratified random sampling of households, where

▪   strata are defined to be regions of the city that vary by household income