STA 108 Spring 2023

Project I

Due by 11:59 PM on Friday, May 5th, on Gradescope

Read the following instructions carefully:

● You may work by yourself, or in a team of two people total.

● You are not allowed to discuss the questions with anyone other than the instructor or TA.

● Do not share answers, or specific values for calculations, particularly on Piazza.

● Any outside help beyond that from the instructor or TA is considered plagiarism. This includes asking a tutor, asking your classmates (no comparing answers), posting the questions to homework help sites, etc.

● You are allowed to use or modify your previous code, or the teaching staff’s code that is posted online.

● The maximum length of the report is 2 pages, excluding title, header, tables, plots, and R appendix. Tables, plots, and the R appendix should be attached at the end of your report.

● Formatting will be a significant portion of your grade for this project (take-home exam). There should be an appendix of code, and no code or raw R results (results that are directly copied and pasted from R with no additional formatting) in the body of the report.

● Your report should be in full paragraph form. You are allowed to have tables, and/or use R Markdown, but the report should have clearly labeled sections. You can also use Word or Google Docs (or LaTeX) if you are more comfortable with those.

● If you work with a partner, one of you should submit the report to Gradescope and indicate your partner there. In this case, your project will be graded as a group effort. This means that you are responsible for your own work and your partner’s work. I will not assign two different grades to one project.

Each group will pick one of the following datasets to explore the Normal Regression Model.

The Dataset

1. CDI Data

The dataset we will be working with is: CDI.csv. It has the following columns:

Column 1: income Total personal income (dollars)

Column 2: degree Percent of adult population (persons 25 years old or older) with a bachelor’s degree.

Column 3: region Geographical region, with categories NE (North East), NC (North Central), S (South), W (West).

For each geographic region, regress per capita income (Y) against the percentage of individuals in a county having at least a bachelor’s degree (X). Assume that the Simple Normal Linear Regression Model is appropriate for each region.
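As a rough illustration of fitting one simple linear regression per region, the sketch below uses simulated data in place of CDI.csv. The data frame `cdi_sim`, its generated values, and the object names `fits` and `coef_table` are assumptions made for illustration only; only the column names `income`, `degree`, and `region` come from the description above.

```r
# Simulated stand-in for CDI.csv (values are made up, not the real data)
set.seed(108)
n <- 200
cdi_sim <- data.frame(
  degree = runif(n, min = 8, max = 50),
  region = sample(c("NE", "NC", "S", "W"), n, replace = TRUE)
)
cdi_sim$income <- 10000 + 400 * cdi_sim$degree + rnorm(n, sd = 2500)

# Fit income ~ degree separately within each region
fits <- lapply(split(cdi_sim, cdi_sim$region),
               function(d) lm(income ~ degree, data = d))

# One row per region: estimated intercept and slope
coef_table <- t(sapply(fits, coef))
print(coef_table)
```

Keeping the fitted models in a named list makes it easy to pull out residuals or summaries for a single region later, for example `summary(fits[["W"]])`.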

2. SENIC Data

The dataset we will be working with is: SENIC.csv. It has the following columns:

Column 1: length Average length of stay of all patients in hospital (in days)

Column 2: infection Average estimated probability of acquiring infection in hospital (in percent)

Column 3: facility Percent of 35 potential facilities and services that are provided by the hospital

Column 4: Xray Ratio of number of X-rays performed to number of patients without signs or symptoms of pneumonia, times 100.

The average length of stay in a hospital (Y) is anticipated to be related to infection risk, available facilities and services, and routine chest X-ray ratio. Assume that the Simple Normal Linear Model is appropriate for each of the three predictor variables.

The Report Format

The Goal: The goal is to compare three or four simple linear models and report the statistically best model. You should write up a full, paragraph form report on your findings, which should include the following sections:

I: Introduction: A small introduction about the goal, what data you are using, and what model you are using.

II: Summary: This should include summary plots describing the relationship between your explanatory and response variables, and any numerical summaries you find interesting.

III: Data Preparation. This section should include finding and removing any outliers in preparation for a model. How many outliers you found (if any) should be noted, and the rows with those outliers in them should be shown.
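The assignment does not say which outlier rule to use, so the following is only a sketch under an assumed rule: flag rows whose externally studentized residual exceeds 3 in absolute value. The cutoff, the simulated data, and the names `dat`, `flagged`, and `dat_clean` are all illustrative assumptions, not part of the project specification.

```r
# Simulated data with one planted unusual response (illustrative only)
set.seed(1)
x <- seq(1, 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
y[25] <- y[25] + 15
dat <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = dat)

# Assumed rule: |externally studentized residual| > 3
t_res <- rstudent(fit)
flagged <- which(abs(t_res) > 3)

# Rows that would be reported as outliers
print(dat[flagged, ])

# Keep every row that was not flagged (safe even if nothing is flagged)
dat_clean <- dat[setdiff(seq_len(nrow(dat)), flagged), ]
```

Using `setdiff` on the row indices avoids the pitfall of `dat[-flagged, ]`, which returns zero rows when `flagged` is empty.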

IV: Model fitting. Regress your response variable on each explanatory variable. Plot each estimated regression line and its data on separate graphs. Estimate σ² for each model and compare the values across models. State the best predictor/region using the R² criterion. Then use the best model for model diagnostics and interpretation.
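The comparison described in Part IV can be sketched as follows, assuming the usual estimate of the error variance, MSE = SSE / (n − 2), and the coefficient of determination reported by `summary()`. The simulated data frame `sim` and the predictor names `x1`, `x2`, `x3` are hypothetical stand-ins, not the real SENIC or CDI columns.

```r
# Simulated stand-in for three candidate predictors (not real project data)
set.seed(2023)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 10 + 1.5 * sim$x1 + 0.5 * sim$x2 + rnorm(n)

predictors <- c("x1", "x2", "x3")

# Fit one simple linear regression per predictor and collect summaries
comparison <- do.call(rbind, lapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "y"), data = sim)
  sse <- sum(resid(fit)^2)
  data.frame(
    predictor  = p,
    sigma2_hat = sse / df.residual(fit),  # MSE = SSE / (n - 2)
    r_squared  = summary(fit)$r.squared
  )
}))
print(comparison)

# Predictor with the largest coefficient of determination
best <- as.character(comparison$predictor[which.max(comparison$r_squared)])
```

When every model is fitted to the same response values, the model with the largest coefficient of determination is also the one with the smallest SSE. That no longer holds automatically when the models use different subsets of rows, as in a by-region comparison.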

V: Model Diagnostics: Perform diagnostics to see if the assumptions of Normal linear regression hold. If you do not think they do, state this, but continue with the report.
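One possible set of checks for Part V is sketched below on simulated data: a residuals-versus-fitted plot for linearity and constant variance, a normal Q-Q plot, and a Shapiro-Wilk test for normality of the errors. The choice of these particular diagnostics is an assumption; the assignment does not prescribe specific ones.

```r
# Simulated data that satisfies the model assumptions (illustrative only)
set.seed(42)
x <- runif(80, min = 0, max = 10)
y <- 3 + 2 * x + rnorm(80, sd = 1.5)
fit <- lm(y ~ x)

e <- resid(fit)

# Residuals vs fitted values: look for curvature or a funnel shape
plot(fitted(fit), e, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points should fall close to the reference line
qqnorm(e)
qqline(e)

# Shapiro-Wilk test as one formal check of normality
sw <- shapiro.test(e)
print(sw$p.value)
```

A small p-value from the Shapiro-Wilk test would cast doubt on the normality assumption; the plots help show how the assumption fails, which a single test statistic cannot.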

VI: Interpretation: Interpret the coefficients, any confidence intervals for parameters, and the R² that you calculated.
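For the confidence intervals mentioned in Part VI, a minimal sketch on simulated data is shown below. The 95% level is an assumption (the assignment does not fix a level), and the manual calculation is included only to show where the `confint()` values for the slope come from.

```r
# Simulated data (illustrative only)
set.seed(7)
x <- runif(60, min = 10, max = 40)
y <- 5 + 0.8 * x + rnorm(60, sd = 2)
fit <- lm(y ~ x)

# Intervals for the intercept and slope at an assumed 95% level
ci <- confint(fit, level = 0.95)
print(ci)

# Same slope interval by hand: b1 +/- t(0.975; n - 2) * s{b1}
b1     <- coef(fit)["x"]
se_b1  <- summary(fit)$coefficients["x", "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))
manual <- c(b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```

The slope interval is read as a range of plausible values for the change in the mean response per one-unit increase in the predictor.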

VII: Conclusion: One or two sentences on which variables you found most important/interesting to your model, and one limitation of the final model.

Details

Your report should be in the following format:

i. Typed.

ii. A title page including your name/s, the name of the class, and the name of your instructor (me). Or a header works.

iii. An appendix of your R code used to produce the results. Do not include R code in the body of your report. For example, your project should be put together in the following order:

Cover Page (Title Page/Header)

Parts I-VII

Code appendix
