STA 112 Project 2

Description

This project gives you the opportunity to apply what you’ve learned in this class to a research question that interests you personally. Project 2 focuses on best subset selection, a systematic model selection method that compares models built from different combinations of the predictors available in your data set and chooses the best set for a given response variable.

To complete the project, you will

1. Identify a data set you can use to build a multiple linear regression model.

2. Perform best subset selection to choose a model that fits your data.

3. Write a technical report explaining your analysis and conclusions.
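To make the idea of best subset selection concrete, here is a minimal sketch of the logic. It is not the course's official procedure (that will be posted on Canvas), and it is written in Python purely for illustration, while the project itself is done in R. The data are invented toy numbers, the column names (x1, x2, x3, y) are hypothetical, and ranking models by adjusted R-squared is an assumption; other criteria are equally common.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss_of_fit(columns, y):
    """Fit y on an intercept plus the given columns; return the residual sum of squares."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def best_subset(data, response):
    """Fit every non-empty subset of predictors; return (adjusted R^2, subset) of the best."""
    y = data[response]
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    predictors = [name for name in data if name != response]
    results = []
    for size in range(1, len(predictors) + 1):
        for subset in combinations(predictors, size):
            rss = rss_of_fit([data[name] for name in subset], y)
            adj_r2 = 1 - (rss / (n - size - 1)) / (tss / (n - 1))
            results.append((adj_r2, subset))
    return max(results)


# Invented toy data: y is roughly 2*x1 + 3*x2 plus small noise; x3 is unrelated.
toy = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
    "x3": [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0],
    "y": [11.1, 6.8, 18.05, 11.15, 24.9, 39.0, 20.2, 33.8],
}

score, chosen = best_subset(toy, "y")
print("Best subset:", chosen, "adjusted R^2:", round(score, 4))
```

With k predictors there are 2^k - 1 non-empty subsets to fit, so the search grows quickly. This is one reason a data set with 8 or more predictors makes the selection step substantial.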

Choosing a Good Data Set

For this project, a good data set should:

1. Have a large number of variables (at least 8 potential explanatory variables, 9 or more total variables).

2. Have a mix of numerical and categorical predictors (should have at least 3 numerical predictors).

3. Allow you to come to interesting conclusions about the response variable.

Note: In order for your project to be interesting, your data set should have a large number of variables. Choosing a data set with too few variables will make it difficult for you to complete the project in a satisfactory way.

Note 2: I am open to some deviation from the guidelines above if you have a compelling reason (an especially interesting research question, fewer variables but tricky analysis, etc.).
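The variable-count guidelines above can be checked mechanically before you commit to a data set. The following is a rough sketch under stated assumptions, written in Python for illustration only (the project itself uses R). The rows are assumed to be dictionaries of strings, as a CSV reader would produce, and the column names in the example are hypothetical. A column counts as numerical only if every value parses as a number, so numeric codes for categories (such as ZIP codes) and columns with missing-value markers will be misclassified and need a manual look.

```python
def is_number(value):
    """Return True if the value can be parsed as a floating-point number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def summarize_columns(rows, response):
    """Split the non-response columns into numerical and categorical predictors."""
    numerical, categorical = [], []
    for name in rows[0]:
        if name == response:
            continue
        if all(is_number(row[name]) for row in rows):
            numerical.append(name)
        else:
            categorical.append(name)
    return numerical, categorical


def check_guidelines(rows, response):
    """Report which of the handout's variable-count guidelines the data set meets."""
    numerical, categorical = summarize_columns(rows, response)
    return {
        "at least 8 explanatory variables": len(numerical) + len(categorical) >= 8,
        "at least 3 numerical predictors": len(numerical) >= 3,
        "at least 1 categorical predictor": len(categorical) >= 1,
    }


# Hypothetical two-row example with one response (price) and eight predictors.
example_rows = [
    {"price": "210.5", "sqft": "900", "beds": "2", "baths": "1", "age": "30",
     "lot": "0.2", "city": "Salem", "style": "ranch", "garage": "yes"},
    {"price": "340.0", "sqft": "1500", "beds": "3", "baths": "2", "age": "12",
     "lot": "0.4", "city": "Dover", "style": "colonial", "garage": "no"},
]

for guideline, met in check_guidelines(example_rows, "price").items():
    print(guideline, "->", "met" if met else "not met")
```

Whether the data set supports interesting conclusions about the response variable is a judgment call that no check like this can make for you.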

Deliverables

1. Data set selection:  Before you start working on your project, you will send me some information about the data set you plan to use, including:

(a) The name of the data set you plan to use.

(b) A link to the data set that I can use to access it (you can submit a workspace image instead if that is easier).

(c) Whether or not you have successfully loaded the data set into R. If not, describe the problems you’re having so we can troubleshoot.

You should have a data set selected by the end of the day on Friday, April 14.

2. Progress update: By April 21, you will submit an update about the progress you’ve made. There is no minimum amount of progress required by this date, but ideally you will have started your analysis. The main purpose of the progress update is to keep me in the loop about your project so I can provide assistance if you need it.

3. Final project report: At the end of the project, you will submit a technical report about your problem and how you solved it. Details of what needs to be included in your project report will be posted to Canvas in a separate document, but roughly, it should contain:

(a) A description of your data set.

(b) A summary of the best subset selection process (a clear procedure for BSS will be made available on Canvas).

(c) A discussion of your final model and any interesting conclusions that can be drawn from it.

Officially, your report is due by the last day of classes, April 26. However, you will not be penalized as long as you submit your final report by noon on Friday, May 5.

Caution:  Be aware that I cannot accept work after Friday, May 5.  This is a rule imposed by the University and cannot be changed.

Where to Find Data

Although you can use data from any source that you can ethically access and analyze, I recommend you start your search with the CORGIS data sets located here. You may also consider other publicly available data sets, such as:

1. Data from the Centers for Disease Control and Prevention, https://data.cdc.gov/

2. Data from a variety of government agencies at https://data.gov/

Note:  For this project, you can’t use data sets from the Stat2Data package or from OpenIntro.