
LAB 2 ASSIGNMENT

SAMPLING DISTRIBUTIONS, CENTRAL LIMIT THEOREM

In this lab assignment, you will explore important properties of the sampling distribution of a sample mean in the context of a filling process. In particular, you will use some sampling procedures in R (or R commander) to demonstrate the validity of the Central Limit Theorem. You will see that the distribution of the sample mean for samples drawn from a highly skewed distribution becomes approximately normal as the sample size increases. Moreover, you will investigate how the spread of the sampling distribution of the sample mean is affected by sample size.
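The claim above can be previewed with a small simulation before touching the lab data. The following is a minimal sketch, not part of the lab procedure: the exponential population, the number of repetitions, and the helper names `skewness` and `sample_means` are all illustrative assumptions.

```r
# Sketch: means of samples from a right-skewed population become less skewed
# and less spread out as the sample size n grows.
# Assumption: an exponential population stands in for "a highly skewed distribution".
set.seed(1)

# Simple moment-based skewness (hypothetical helper, not a base R function)
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

# Draw `reps` samples of size n and return the mean of each sample
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_n2  <- sample_means(2)
means_n10 <- sample_means(10)
means_n35 <- sample_means(35)

# Skewness moves toward 0 and the spread shrinks roughly like 1 / sqrt(n)
skew_by_n <- c(n2 = skewness(means_n2), n10 = skewness(means_n10),
               n35 = skewness(means_n35))
sd_by_n   <- c(n2 = sd(means_n2), n10 = sd(means_n10), n35 = sd(means_n35))

print(round(skew_by_n, 2))
print(round(sd_by_n, 3))
```

The sample sizes 10 and 35 mirror the two package sizes used later in the lab; n = 2 is included only to make the contrast visible.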

Examining a Filling Process

These days, juice (such as apple juice) is dispensed into juice boxes by filling machines. These machines are set to deliver a certain amount of juice, which we will call the target amount, and the contents of the juice boxes will vary around this mean value. The amount of variation will depend on the efficiency of the machine itself as well as certain properties of the juice, such as its density. The manufacturer may be able to reduce this variation, but no amount of expertise or effort could remove it completely.

A company uses a filling machine to fill standard boxes with apple juice. The boxes are supposed to contain 130 milliliters (approximately 4.4 oz) of the drink. However, when you buy a box of juice bearing a stamp that claims it contains 130 milliliters (ml), will there be exactly 130 ml of juice?

Probably not: the box will contain some amount close to, but not exactly equal to, 130 ml.

If the amount of juice dispensed by the filling machine follows a symmetric distribution and the mean target value is set equal to the claimed amount of 130 ml, half of the juice boxes would be underfilled and half would be overfilled. This may seem perfectly reasonable to the manufacturer but consumers may feel differently, particularly if they happen to buy the underfilled juice boxes. To make the customer happy, the manufacturer may decide to overfill the juice boxes slightly so that the target fill of the machine is more than the claimed amount. However, even a small increase in the target fill represents a loss of profit to the manufacturer.

The juice boxes are shipped in packages containing either 10 or 35 juice boxes. How does the amount of juice vary from juice box to juice box? How does the average amount of juice vary from package to package containing the same number of juice boxes? How does the number of juice boxes in a package affect the distribution of the means? You will obtain the answers to all these questions in this lab.

Answer the following questions:

1. Suppose the amount of apple juice dispensed by a filling machine follows a normal distribution with a mean (μ) and a standard deviation (σ). Select the Distributions option in the R commander menu and then the Normal distribution among the continuous distributions options. This allows you to obtain a graph of the normal density function and to calculate normal probabilities when the parameters (μ and σ) are provided. Use R commander to answer the following questions:

(a) Assume that the mean amount dispensed by the machine is set at μ = 130 ml. Enter the value of σ as 5, then 10, then 15, and finally 20 ml. After each entry, carefully examine the shape of the corresponding density curve. You are not supposed to print the density curves. Describe briefly how the appearance of the curve changes, and what happens to the percentage of underfilled juice boxes (the juice boxes containing less than 130 ml), when σ decreases or increases. In general, how does the magnitude of the standard deviation affect the filling process?

(b) Now assume that the mean amount dispensed by the machine is set at μ = 135 ml. Enter the value of σ as 15 ml. Calculate the percentage of underfilled juice boxes (the juice boxes containing less than 130 ml) in this case. What is the percentage of underfilled juice boxes if σ were 10 ml and 5 ml? In general, what is the effect of decreasing σ on the percentage of underfilled juice boxes?

(c) Now set the standard deviation to 5 ml and change the mean. Enter the value of µ as 130, then 135, and eventually 140 ml. Calculate the percentage of underfilled juice boxes in each case. Describe briefly how the shape of the corresponding curve changes. How does changing the value of µ affect the filling process? Does the percentage of underfilled boxes increase or decrease? Do not print the density curves.
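For readers who prefer the R console to the R commander menus, the probabilities asked for in Question 1 can also be computed with `pnorm`. This is a minimal sketch under the normal model stated in the question; the helper name `underfilled` and the variable names are illustrative, and the peak-height line is just one way to quantify how the curve flattens.

```r
# Fraction of boxes below the claimed 130 ml when fills are Normal(mu, sigma)
claimed <- 130
underfilled <- function(mu, sigma) pnorm(claimed, mean = mu, sd = sigma)

# (a) mu = 130: by symmetry half the boxes fall below 130 ml for every sigma,
#     while the curve gets lower and wider as sigma grows
part_a      <- underfilled(130, c(5, 10, 15, 20))
peak_height <- dnorm(130, mean = 130, sd = c(5, 10, 15, 20))

# (b) mu = 135 with sigma = 15, 10, 5
part_b <- underfilled(135, c(15, 10, 5))

# (c) sigma = 5 with mu = 130, 135, 140
part_c <- underfilled(c(130, 135, 140), 5)

# Report as percentages
print(round(100 * part_a, 2))
print(round(100 * part_b, 2))
print(round(100 * part_c, 2))
```

Because `pnorm` is vectorized over `mean` and `sd`, each part is a single call rather than a loop.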

2. Consider a random sample of 500 juice boxes obtained from the population of all juice boxes filled by the machine over a specific short time period. The volume of apple juice in each juice box is determined. The 500 observations, recorded in the first column (volume), are available in the data file Lab2-Data.txt in eClass. Given the very large sample size, we may assume that the distribution of the volume of apple juice in the sample (data file) is close enough to the population distribution, and that its mean and standard deviation are close to the population parameters (μ and σ).

(a) Obtain a frequency histogram of the 500 observations with the bins starting at 125, ending at 155, and using a width of 5. (Hint: R assumes that the right endpoint of each interval is included. Your histogram should include the left endpoints.) Paste the histogram into your report. The format of the histogram should be the same as the format of the histogram in the Lab 1 Instructions (labels at the axes, title).

(b) Describe the shape of the histogram obtained in part (a). Does the histogram support the claim of the company that the juice boxes are slightly overfilled?

(c) Obtain a Q-Q plot and a boxplot for the 500 observations. Add a title to each plot. Paste both plots into your report. (TIP: Click “Options” and select Outliers “(Interactively) with mouse” when you make the boxplot in R commander to see to which observation the outlier corresponds.) Is (are) there any outlier(s)? Do the plots confirm your findings in part (b) about the shape of the distribution?

(d) Obtain the summary statistics (mean, standard deviation, IQR, min, Q1, median, Q3, max, and n) of the 500 observations. Paste the summaries into your report. Briefly describe the relationship between the mean and median, as well as the relationship between the three quartiles. Are the relationships consistent with the observed shape of the histogram in part (b)?
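The steps in Question 2 can be sketched in the R console as follows. The data here are a simulated stand-in, not the contents of Lab2-Data.txt; the commented `read.table` line assumes the file has a header row with a column named `volume`, which the lab text suggests but does not confirm. The point of interest is `right = FALSE`, which makes each bin include its left endpoint as the hint requires.

```r
set.seed(42)

# With the real file, something like this may work (assumption: header row,
# column named `volume`):
# lab2 <- read.table("Lab2-Data.txt", header = TRUE); volume <- lab2$volume

# Stand-in data: 500 right-skewed values, capped so they fit the 125-155 bins
volume <- pmin(128 + rgamma(500, shape = 4, scale = 2), 154.9)

bins <- seq(125, 155, by = 5)

# right = FALSE gives bins [125, 130), [130, 135), ... (left endpoint included)
h <- hist(volume, breaks = bins, right = FALSE,
          main = "Volume of juice in 500 boxes (stand-in data)",
          xlab = "Volume (ml)", ylab = "Frequency")

# A value of exactly 130 lands in a different bin depending on `right`
left_closed  <- hist(130, breaks = bins, right = FALSE, plot = FALSE)$counts
right_closed <- hist(130, breaks = bins, right = TRUE,  plot = FALSE)$counts

# Q-Q plot and boxplot, each with a title
qqnorm(volume, main = "Normal Q-Q plot of volume (stand-in data)")
qqline(volume)
boxplot(volume, main = "Boxplot of volume (stand-in data)", ylab = "Volume (ml)")

# Mean, sd, IQR, five-number summary, and n in one named vector
stats <- c(mean = mean(volume), sd = sd(volume), IQR = IQR(volume),
           quantile(volume, c(0, 0.25, 0.5, 0.75, 1)), n = length(volume))
print(round(stats, 2))
```

Printing `left_closed` and `right_closed` shows the single observation moving from the first bin to the second when `right = FALSE` is set.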

Suppose that 100 packages are randomly selected, each consisting of 10 juice boxes of apple juice obtained from the population of all juice boxes filled over a certain short time period. The amount of apple juice in each juice box is determined. The measurements are saved in a table consisting of 10 rows (sample size) and 100 columns (number of random samples) that occupies the columns Sample1 – Sample100 in the lab2-Q3.txt file.

3. Obtain the mean amount of apple juice for each sample consisting of 10 juice boxes. Make sure that all 100 columns are included in the panel of the “Numerical Summaries” dialog box.

(a) Obtain a frequency histogram of the 100 means with the bins starting at 132, ending at 141, and using a width of 1. (Hint: R assumes that the right endpoint of each interval is included. Your histogram should include the left endpoints.) Paste the histogram into your report. The format of the histogram should be the same as the format of the histogram in Lab 1 Instructions (labels at the axes, title).

(b) Refer to the histogram obtained in part (a). Does the data appear to be normally distributed? Compare the distribution of the means to the distribution of individual observations studied in Question 2 in terms of their spread and degree of skewness.

(c) Obtain a Q-Q plot and a boxplot for the 100 means. Add a title to each plot. Paste both plots into your report. Is (are) there any outlier(s)? Do the plots confirm your findings in part (b)? Compare the plots with the ones in part (c) of Question 2.

(d) Obtain the sample size, mean, and standard deviation of the 100 means. Paste the summaries into your report. Compare the values with the mean and the standard deviation of the sampling distribution of the sample mean predicted by the theory of sampling distributions. What does the standard deviation mean here?
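The comparison in part (d) rests on the standard result that the sample mean has mean μ and standard deviation σ/√n. A minimal sketch of that comparison follows, assuming a table laid out as described (10 rows, columns Sample1 to Sample100). The data are simulated, and `pop_mean` and `pop_sd` are known only because the stand-in population was chosen here; in the lab they would be approximated by the mean and standard deviation of the 500 observations from Question 2.

```r
set.seed(2)

n_boxes    <- 10    # boxes per package (sample size)
n_packages <- 100   # number of packages (number of samples)

# Stand-in population: 128 + Gamma(shape = 4, scale = 2),
# so the mean is 128 + 4 * 2 = 136 and the sd is sqrt(4) * 2 = 4
pop_mean <- 136
pop_sd   <- 4
draws <- 128 + rgamma(n_boxes * n_packages, shape = 4, scale = 2)

# 10 rows by 100 columns, named like the columns in the lab file
packages <- as.data.frame(matrix(draws, nrow = n_boxes, ncol = n_packages))
names(packages) <- paste0("Sample", seq_len(n_packages))

# One mean per column, i.e. per package
package_means <- colMeans(packages)

observed  <- c(n = length(package_means), mean = mean(package_means),
               sd = sd(package_means))
predicted <- c(mean = pop_mean, sd = pop_sd / sqrt(n_boxes))

print(round(observed, 3))
print(round(predicted, 3))
```

The observed values will not match the predicted ones exactly, since 100 means are themselves only a sample from the sampling distribution.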

Now suppose 100 packages are randomly selected, each consisting of 35 juice boxes of apple juice obtained from the population of all juice boxes filled over the same short time period. The amount of apple juice in each juice box is determined and the measurements are saved in the lab-Q4.txt file in the form of a table of 100 columns, each consisting of 35 rows.

4. Obtain the mean amount of apple juice for each sample consisting of 35 juice boxes. Make sure that all 100 columns are included in the panel of the “Numerical Summaries” dialog box.

(a) Obtain a frequency histogram of the 100 means with the bins starting at 134, ending at 139, and using a width of 0.5. Paste the histogram into your report. (Hint: R assumes that the right endpoint of each interval is included. Your histogram should include the left endpoints.) The format of the histogram should be the same as the format of the histogram in Lab 1 Instructions (labels at the axes, title).

(b) Describe the shape of the histogram in part (a). Does the data appear to be normally distributed? Compare the histogram with the histogram obtained in part (a) of Question 2 and the one in part (a) of Question 3. In particular, comment about differences in spread and degree of skewness between each pair of histograms.

(c) Obtain a Q-Q plot and a boxplot for the 100 means. Add a title to each plot. Paste both plots into your report. Is (are) there any outlier(s)? Do the plots confirm that the sample means indicate a normal distribution? Explain. Compare the Q-Q plot and boxplot with the Q-Q plots and boxplots obtained in part (c) of Questions 2 and 3. What do you conclude?

(d) Obtain the sample size, mean, and standard deviation of the 100 means. Paste the summaries into your report. Compare the value of the standard deviation of the sample mean for n = 35 with the standard deviation of the sample mean in part (d) of Question 3 (for n = 10). Compare the values with the mean and the standard deviation of the sampling distribution of the sample mean predicted by the theory of sampling distributions. Which sample mean tends to be a more accurate estimate of the population mean?
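The effect of package size on the spread of the sample mean can be stated without any data. The sketch below compares the theoretical standard deviations for n = 10 and n = 35; the value of `sigma` is a placeholder, not a number from the lab, and `standard_error` is an illustrative helper name.

```r
# Theoretical standard deviation of the sample mean
standard_error <- function(sigma, n) sigma / sqrt(n)

sigma <- 4   # placeholder population sd; substitute the estimate from Question 2

se_10 <- standard_error(sigma, 10)
se_35 <- standard_error(sigma, 35)

# The ratio equals sqrt(35 / 10) whatever sigma is
ratio <- se_10 / se_35

# Bin edges for the Question 4 histogram: 134 to 139 in steps of 0.5
breaks_q4 <- seq(134, 139, by = 0.5)

print(c(se_10 = se_10, se_35 = se_35, ratio = ratio))
```

Since the ratio does not depend on `sigma`, means of 35 boxes are expected to vary less than means of 10 boxes by the same factor for any population with finite variance.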

Proper Title Page (Using Lab Assignment Template on eClass): 5 points

Appearance: 5 points (1 bonus point for each question submitted properly on eClass)

Note: Lab assignments must be typed and submitted on eClass. Handwritten assignments are not accepted and will receive a mark of zero for the whole assignment.

Question 1 (20)

(a) Percentage of underfilled juice boxes when the standard deviation decreases or increases: 2 points

How the magnitude of the standard deviation affects the filling process: 2 points

(b) Percentage of underfilled juice boxes when μ = 135 and σ = 15 ml: 2 points

Percentage of underfilled juice boxes when μ = 135 and σ = 10 ml: 2 points

Percentage of underfilled juice boxes when μ = 135 and σ = 5 ml: 2 points

Effect of decreasing σ on the percentage of underfilled juice boxes: 2 points

(c) Percentage of underfilled juice boxes when μ = 130 and σ = 5 ml: 2 points

Percentage of underfilled juice boxes when μ = 135 and σ = 5 ml: 2 points

Percentage of underfilled juice boxes when μ = 140 and σ = 5 ml: 2 points

Effect of increasing μ on the percentage of underfilled juice boxes: 2 points

Question 2 (30)

(a) Properly formatted histogram of the 500 observations: 4 points

(b) Shape of the histogram in part (a): 2 points

Conclusion about histogram support of company’s claim: 2 points

Outliers: 4 points (2 points for each plot)

Consistency with the conclusions in part (b): 2 points

(d) Summary statistics output: 2 points

Relationship between mean and median: 2 points

Relationship among the three quartiles: 2 points

Consistency with the conclusions in part (b): 2 points

Question 3 (35)

(a) Properly formatted histogram of the 100 sample means (n = 10): 4 points

(b) Shape of the histogram in part (a), normality: 2 points

Comparison with parent distribution (spread, degree of skewness): 4 points (2 points each feature)

Outliers: 4 points (2 points for each plot)

Comparison with conclusions in part (b): 2 points

Comparison with plots in Question 2: 2 points

(d) Summary statistics output: 3 points

Comparison with the values predicted by theory: 4 points (2 points for mean and 2 points for sd)

Standard deviation: 2 points

Question 4 (40)

(a) Properly formatted histogram of the 100 sample means (n = 35): 4 points

(b) Shape of the histogram in part (a), normality: 2 points

Comparison with graph from Question 2 (spread, skewness): 4 points (2 points for each feature)

Comparison with graph from Question 3 (spread, skewness): 4 points (2 points for each feature)

Outliers: 4 points (2 points for each plot)

Comparison with conclusions in part (b): 2 points

Comparison with plots in Question 2 and conclusion: 2 points

Comparison with plots in Question 3 and conclusion: 2 points

(d) Summary statistics output: 3 points

Comparison with the values predicted by theory: 4 points (2 points for mean and 2 points for sd)

Sample mean which is more accurate estimate of the population mean: 1 point