STAT 231 Spring 2022 Assignment 1

You will submit answers/results from R in the Crowdmark pdf submission. The final results should be typed within the written responses, with images inserted. Your assignment submission must be typed and submitted as a pdf. There are no exceptions. Any submitted answer which is not typed will not be marked and will be given a mark of zero. Further, all written answers must be in full sentences. Written answers which are not in full sentences will receive a deduction of 50% of the marks in that question part. Additionally, all plots should have titles and appropriately labelled axes to receive full marks. The feedback you receive will be focused on your R output and interpretations, not the detailed code itself. Thus, no R code should be included in your pdf solution file.

Your assignment solutions are to be submitted as a pdf file to two places:

•   To Crowdmark for marking

•   To the Assignment 1 LEARN dropbox to facilitate the running of your assignment through plagiarism detection software

You can upload your assignment as one document or individually for each problem. If you upload one document then you must drag and drop the pages for each problem to the appropriate question as indicated in Crowdmark. You can resubmit your assignment any number of times before the deadline. Therefore, to ensure that there are no issues, we advise you to upload well in advance of the due time. Assignments which are left as a single document and not uploaded to the appropriate places in Crowdmark will receive a 10% penalty.

A penalty of 1% per minute late to either Crowdmark or the LEARN dropbox is applied for late assignments.

Checklist to complete for this assignment:

Download the “User Data All Rows 2022-04-27.csv” dataset, containing data from the Space Wizard Plus Mobile Game, from the LEARN page. This dataset will be used throughout the assignments in the course.

Work on the assignment throughout the week, answering both the questions that involve analysis of our mobile game dataset and those that do not.

For assistance with the dataset analysis questions, review the posted R Tutorial, which gives an overview of helpful R code.

 

Upload the PDF of your assignment solutions to Crowdmark by the deadline.

 

Upload the PDF of your assignment solutions to the appropriate LEARN dropbox by the deadline.


Assignment 1 Intended Learning Outcomes

Below we have outlined the list of intended learning outcomes for this assignment. Keep them in mind as you solve the various questions, and we recommend you try and keep track of how these learning outcomes are achieved by each of the questions to follow.

•   Understand the basics of empirical studies such as approaches to data collection and types of variates

•   Understand how the inherent variability in random samples influences the sample measures and identify the relationship between sample size and variation from expected values

•   Compute and interpret numerical measures of location, variability, and shape for a dataset

•   Compute and interpret graphical summaries of data such as histograms, empirical cdfs, box plots, run charts, scatterplots, and bar charts

•   Use numerical and graphical summaries to compare key similarities or differences between data

•   Use numerical and graphical summaries to assess the fit of a specified probability model for the data

[50 marks total]

 

1.   [8 marks] Go to the following R Shiny app:

https://shiny.math.uwaterloo.ca/sas/stat231/datasummaries/

*Note: To avoid crashing the R Shiny server, please do not enter large sample sizes (i.e., > 10,000).

In the questions below, you will be asked to include screenshots of the histograms corresponding to your randomly generated samples. Please only include an image of the histogram, not an entire print screen. You can use something like the “Snipping Tool” in Windows or crop the print screen to create the images.

Select the G(μ, σ) distribution and set μ to 2 and σ to 3.

a)   [2 marks] Change the sample size to 50. Insert a screenshot of the corresponding histogram and provide the values for the sample mean, variance, skewness, and kurtosis.

b)   [2 marks] Change the sample size to 100. Insert a screenshot of the corresponding histogram and provide the values for the sample mean, variance, skewness, and kurtosis.

c)   [2 marks] Change the sample size to 500. Insert a screenshot of the corresponding histogram and provide the values for the sample mean, variance, skewness, and kurtosis.

d)   [2 marks] Discuss how the numerical and graphical summaries for each sample size compare to their expected values. What do you notice as the sample size increases?
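As an offline complement to the Shiny app, the experiment in Question 1 can be sketched in base R. This is only an illustration under stated assumptions: it takes G(μ, σ) to mean a Gaussian distribution with mean μ = 2 and standard deviation σ = 3, it uses moment-based definitions of sample skewness and kurtosis, and the helper names `sample_skewness` and `sample_kurtosis` and the seed are hypothetical. The app draws its own random samples, so its values will differ.

```r
# Assumption: G(mu, sigma) is Gaussian with mean mu and standard deviation sigma.
# Hypothetical helpers: moment-based sample skewness and kurtosis.
sample_skewness <- function(y) {
  d <- y - mean(y)
  mean(d^3) / mean(d^2)^(3 / 2)
}
sample_kurtosis <- function(y) {
  d <- y - mean(y)
  mean(d^4) / mean(d^2)^2
}

set.seed(231)  # arbitrary seed so the sketch is reproducible
sizes <- c(50, 100, 500)

# One row of numerical summaries per sample size.
summaries <- do.call(rbind, lapply(sizes, function(n) {
  y <- rnorm(n, mean = 2, sd = 3)
  data.frame(
    n        = n,
    mean     = mean(y),
    variance = var(y),
    skewness = sample_skewness(y),
    kurtosis = sample_kurtosis(y)
  )
}))

# Expected values under G(2, 3): mean 2, variance 9, skewness 0, kurtosis 3.
print(round(summaries, 3))
```

Rerunning the sketch with different seeds gives a feel for how much each summary varies from sample to sample at a given sample size.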

 

In Questions 2 to 5, you will start analyzing the Space Wizard Plus Mobile Game dataset that will be used throughout the course. The metadata (data about data)/data dictionary for this dataset is posted on the LEARN page in the assignment module. This provides some useful background information that you might have to reference for each assignment.

2.   [5 marks] Review the metadata and data dictionary for this dataset to answer the following questions.

a)   [1 mark] The study population is not explicitly defined in the metadata. Provide an appropriate potential study population.

b)   [1 mark] Is this a sample survey, observational study, or experimental study? Justify your answer.

c)   [3 marks] Consider the following three variates in the dataset: skill_grade, time_overworld, device_age. For each of these variates, explain, with justification, the variate type it was assigned in the data dictionary.

 

3.   [14 marks] Analyze the time_combat variate by answering the following questions.

a)   [3 marks] Provide the five-number summary, identifying what each number represents, followed by the range and interquartile range for time_combat in the dataset.

b)   [4 marks] Provide the values of the sample mean, sample standard deviation, and sample skewness. Report to 3 decimal places. How do the sample mean and sample median compare? What does this say about the tails of the distribution and how does that compare to an exponential distribution?

c)   [2 marks] Create and insert a plot of the relative frequency histogram with a superimposed Exponential probability density function. Use the sample mean as the estimate for θ in your fitted model.

d)   [1 mark] For the Exponential(θ) distribution the mean and standard deviation are both equal to θ. Therefore, the sample mean and sample standard deviation are both estimates of θ based on the observed data. Are the sample mean and sample standard deviation close in value for your data?

e)   [4 marks] Using both the numerical and graphical summaries, describe how well the Exponential model fits these data. You should make at least four comparisons between what you observed for your data set and what you would expect to observe if the data were generated from an Exponential model.
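The summaries named in Question 3 can be sketched in base R as follows. The course dataset is not reproduced here, so the vector `time_combat_sim` is a hypothetical stand-in simulated from an Exponential distribution with an arbitrary mean of 4; with the real data you would use the time_combat column instead. The fitted density is f(y; θ) = (1/θ) exp(−y/θ) with θ estimated by the sample mean.

```r
# Simulated stand-in for the time_combat column (hypothetical values, not the course data).
set.seed(231)
time_combat_sim <- rexp(300, rate = 1 / 4)

# Five-number summary: minimum, lower quartile, median, upper quartile, maximum.
# quantile() uses R's default interpolation rule (type 7).
five_num <- quantile(time_combat_sim, probs = c(0, 0.25, 0.5, 0.75, 1))
data_range <- max(time_combat_sim) - min(time_combat_sim)
iqr <- IQR(time_combat_sim)

# The sample mean serves as the estimate of theta in the fitted model.
theta_hat <- mean(time_combat_sim)
sample_sd <- sd(time_combat_sim)

# Relative frequency histogram (total area 1) with the fitted density
# f(y; theta) = (1 / theta) * exp(-y / theta) superimposed.
hist(time_combat_sim, freq = FALSE,
     main = "Relative frequency histogram of simulated combat times",
     xlab = "Time in combat (simulated)", ylab = "Density")
curve(dexp(x, rate = 1 / theta_hat), add = TRUE, col = "red", lwd = 2)

print(round(c(five_num, range = data_range, IQR = iqr,
              mean = theta_hat, sd = sample_sd), 3))
```

Note that `dexp` is parameterized by the rate, so the fitted curve uses `rate = 1 / theta_hat` rather than `theta_hat` itself.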

 

4.   [13 marks] Analyze the pregame_skill variate by answering the following questions.

a)   [5 marks] Provide the sample mean, sample median, sample standard deviation, sample skewness, and sample kurtosis for the pregame_skill variate in your dataset. (Round to 3 decimal places.)

b)   [2 marks] Create and insert the plot of the relative frequency histogram with a superimposed Gaussian probability density function. Use the sample mean and sample standard deviation as the estimates for mu and sigma in your fitted model.

c)   [2 marks] Create and insert the plot of the empirical cdf with a superimposed Gaussian cumulative distribution function.

d)   [4 marks] How well does the Gaussian model fit these data? Use the graphical and numerical summaries to justify your answer. You should make at least four comparisons between what you observed for the data set and what you would expect to observe if the data were generated from a Gaussian model.
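The two plots requested in Question 4 can be sketched in base R along the following lines. The vector `pregame_skill_sim` is a hypothetical stand-in simulated from a Gaussian distribution with arbitrary parameters, not the course data, and the one-standard-deviation check at the end is just one possible numerical comparison.

```r
# Simulated stand-in for the pregame_skill column (hypothetical values, not the course data).
set.seed(231)
pregame_skill_sim <- rnorm(300, mean = 50, sd = 10)

# Estimates of mu and sigma for the fitted Gaussian model.
mu_hat <- mean(pregame_skill_sim)
sigma_hat <- sd(pregame_skill_sim)

# Relative frequency histogram with the fitted Gaussian pdf superimposed.
hist(pregame_skill_sim, freq = FALSE,
     main = "Relative frequency histogram of simulated pregame skill",
     xlab = "Pregame skill (simulated)", ylab = "Density")
curve(dnorm(x, mean = mu_hat, sd = sigma_hat), add = TRUE, col = "red", lwd = 2)

# Empirical cdf with the fitted Gaussian cdf superimposed.
plot(ecdf(pregame_skill_sim),
     main = "Empirical cdf of simulated pregame skill",
     xlab = "Pregame skill (simulated)", ylab = "Cumulative probability")
curve(pnorm(x, mean = mu_hat, sd = sigma_hat), add = TRUE, col = "red", lwd = 2)

# Under a Gaussian model, roughly 68% of observations fall within one
# standard deviation of the mean.
within_1sd <- mean(abs(pregame_skill_sim - mu_hat) <= sigma_hat)
print(round(c(mean = mu_hat, median = median(pregame_skill_sim),
              sd = sigma_hat, within_1sd = within_1sd), 3))
```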

 

5.   [10 marks] Analyze the play_style variate by answering the following questions.

a)    [3 marks] What proportion of players have each of the five play styles (fire, grass, water, psychic, and ghost) in your dataset?

b)    [2 marks] Compare the time_space between the different play styles by creating five side-by-side boxplots for the time spent travelling in space (hours) for each of fire, grass, water, psychic, and ghost. Insert this plot in your pdf.

c)     [5 marks] Suppose you were only given this plot. Describe the information about the differences and similarities between the five groups of data that you can obtain just from this plot. Things you might wish to comment on include:

•   a comparison of the symmetry of the data sets

•   a comparison of the tail regions of the data sets

•   a comparison of the ranges (variability) of the data sets

•   a comparison of the medians (location) of the data sets

•   a comparison of the number of outliers of the data sets
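The proportions and side-by-side boxplots in Question 5 can be sketched in base R as shown below. The data frame `game_sim` is a hypothetical stand-in with simulated play styles and space-travel times, not the course data; with the real dataset the same calls would be applied to its play_style and time_space columns.

```r
# Simulated stand-in data (hypothetical values, not the course data).
set.seed(231)
styles <- c("fire", "grass", "water", "psychic", "ghost")
game_sim <- data.frame(
  play_style = factor(sample(styles, 300, replace = TRUE), levels = styles),
  time_space = rexp(300, rate = 1 / 5)
)

# Proportion of players with each play style.
style_props <- prop.table(table(game_sim$play_style))
print(round(style_props, 3))

# Side-by-side boxplots of time_space, one box per play style.
boxplot(time_space ~ play_style, data = game_sim,
        main = "Simulated time in space by play style",
        xlab = "Play style", ylab = "Time travelling in space (hours)")
```

The formula interface `time_space ~ play_style` draws one box per factor level, which puts all five groups on a common vertical scale for comparison.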