Worksheet 03 - Producing Data
##ENTER YOUR NAME HERE
Directions: Please upload a PDF to Gradescope that includes both your written responses and corresponding R code inputs/outputs (if requested) for each problem. In code blocks, you can uncomment (delete #) the non-NOTE lines and fill in the blanks (FILL IN).
Problem 1 A health study is being conducted on a group of volunteers (487 smokers and 1513 non-smokers) to determine the effect of a new drug.
Problem 1 Part a) Estimate how many smokers are in a simple random sample of size 200.
NOTE: Enter your response as text or an image. See R Worksheet 02 for the code for inserting a picture.
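As a general illustration of the reasoning, using made-up numbers rather than this study's: in a simple random sample, the expected count from a subgroup is the sample size times the subgroup's share of the population. All names and values in this sketch are hypothetical.

```r
# Hypothetical population: 1000 people, 300 of whom belong to the group of interest
pop_size   = 1000
group_size = 300
n          = 50   # size of the simple random sample

# Expected number of group members in the sample: n * (group_size / pop_size)
expected_count = n * group_size / pop_size
expected_count
```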
Problem 1 Part b) Using R, determine 10 simple random samples, each of size 200, and record the number of smokers in each of the samples. Let’s agree to label the smokers with numbers 1 through 487 and the nonsmokers with numbers 488 through 2000.
NOTE: The code in the R chunk below walks through the calculation in R, where the variable tmp_sample denotes one SRS of size 200 taken from the population, while the variable smoker_count stores the number of smokers in the k-th SRS.
set.seed(2022) # NOTE: This makes the experiments in this worksheet reproducible.
# population = seq(from = 1, to = 2000, by = 1) # create a sequence of numbers 1 to 2000 by 1
# smoker_count = rep(0, 10) # create vector to ultimately tally the number of smokers in 10 SRS
# for (k in 1:10) # k is a loop variable that starts at 1, increments by 1, and ends at 10
# { # NOTE: { starts the loop and } ends the loop (THESE ARE VERY IMPORTANT)
# NOTE: The call sample() should include the vector to be sampled (population), along with the number to draw and whether or not to replace each subject after it is drawn
# tmp_sample = sample( FILL IN , FILL IN , replace = FILL IN )
# NOTE: The call to sum() should be an expression in terms of tmp_sample. For example, sum(tmp_sample < 3) counts the number of entries in tmp_sample that are strictly < 3
# smoker_count[k] = sum( FILL IN < FILL IN )
# }
# NOTE: Now, let's make a data frame to store the counts nicely, wrapping it in parentheses so that the output automatically displays. FILL IN the entries that will identify what to print in each column
# (smoker.df = data.frame(trial = FILL IN , nsmokers = FILL IN ))
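The two building blocks of this chunk, drawing a sample without replacement and counting entries that satisfy a condition, can be tried on a small toy population first. The sketch below is only an illustration: the population of 20 labels, the "marked" group of labels 1 through 5, and all variable names are assumptions that are not part of the worksheet.

```r
set.seed(1)  # arbitrary seed, only so this illustration is repeatable

toy_population = 1:20  # hypothetical labels; 1 through 5 form the "marked" group
toy_sample = sample(toy_population, 8, replace = FALSE)  # one SRS of size 8

# A comparison returns TRUE/FALSE for each entry; sum() counts the TRUEs
marked_count = sum(toy_sample <= 5)
marked_count
```

Because the draw is made without replacement, no label can appear twice in toy_sample, which is what a simple random sample of distinct subjects requires.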
Problem 1 Part c) Using R, determine the mean and the standard deviation of the number of smokers in the samples. How well does the sample mean agree with (i.e., is it far from or close to) your calculation for the predicted population mean?
Hint: When describing how far/close the sample mean is to the population mean, it is a good idea to reference the standard deviation in your description.
NOTE: For the first part, enter the necessary code in the R chunk below. For the last part (the question), enter your response as text outside/below the R chunk.
# ENTER CODE HERE
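For reference, a minimal sketch of how mean() and sd() behave on a short vector, and one possible way to express a gap in standard-deviation units as the hint suggests. The counts and the predicted value are made up for illustration and are not results from this worksheet.

```r
toy_counts = c(4, 6, 5, 7, 3)  # hypothetical counts from five samples
predicted  = 5.5               # hypothetical predicted value

toy_mean = mean(toy_counts)    # sample mean
toy_sd   = sd(toy_counts)      # sample standard deviation (n - 1 in the denominator)

# Distance between the sample mean and the prediction, in standard-deviation units
gap_in_sds = abs(toy_mean - predicted) / toy_sd
c(mean = toy_mean, sd = toy_sd, gap_in_sds = gap_in_sds)
```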
Problem 1 Part d) The head researcher would like to choose 20 subjects for comprehensive medical imaging. Using R, perform a single stratified random sample having 10 smokers and 10 nonsmokers, and display the labels for the selected subjects in ascending order.
# NOTE: The command sample(x, size) chooses 'size' items from x.
# NOTE: Make sure you sample 10 from the population of 487 smokers!
# tmp_smokers = sample( FILL IN , FILL IN )
# NOTE: Make sure you sample 10 from the population of 1513 non-smokers!
# tmp_nonsmokers = sample( FILL IN , FILL IN )
# NOTE: Let's make a data frame to display our results!
# NOTE: You will use sort() to display the labels for the selected subjects in ascending order
# NOTE: FILL IN the sorted simulated data below
# (smoker.df = data.frame(smokers = FILL IN , nonsmokers = FILL IN ))
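The idea of a stratified random sample, drawing separately within each stratum and then sorting the labels, can be sketched on a toy population. Everything below is hypothetical: the 20 labels, the split into strata A (labels 1 through 6) and B (labels 7 through 20), and the variable names are assumptions chosen only for illustration.

```r
set.seed(1)  # arbitrary seed, only so this illustration is repeatable

# Hypothetical population of 20 labels split into two strata
stratum_a = 1:6
stratum_b = 7:20

# Draw separately within each stratum; sample() draws without replacement by default
pick_a = sample(stratum_a, 2)
pick_b = sample(stratum_b, 2)

# sort() arranges each set of labels in ascending order;
# the outer parentheses print the data frame as it is assigned
(toy.df = data.frame(a = sort(pick_a), b = sort(pick_b)))
```

Sampling within each stratum fixes the number of subjects taken from each group, whereas a single simple random sample of the whole population leaves those numbers to chance.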
2023-02-02