
ST 307: Final (38 pts)

In this assignment you will create a SAS program, save it as a .sas file, and upload that file to Moodle via the assignment link. You’ll also create plots, which you will copy into a Word or Google doc and submit as well.

Notes:

• The .sas file submitted must meet the SAS File Submission Guidelines available in the Resources and Information section of the course.

• If your file doesn’t meet these guidelines, we may deduct up to 50% of your score.

• It is time to put what you’ve learned into practice! You may not work with others on this assignment. You may not post to the discussion board or anywhere else to obtain help on this assignment. You should not consult outside sources other than the SAS help or our course materials. Treat this like a take-home exam. You may obtain help from your instructor (or another class’s instructor) if you are stuck. Remember that there is a one-business-day turnaround on email. The instructors are the only people you are allowed to get help from or discuss this assignment with.

• Failure to adhere to the above will result in an academic integrity violation and a 0 on the assignment. Excuses such as “I was running out of time and was stuck so I contacted my friend” will not exempt you from these policies. Please work on this by yourself.

• No late work will be accepted. If you have a documented emergency that prevents you from completing a homework assignment, please contact your instructor and provide proof of the emergency.

• All code that you write should be similar to code we wrote in class. For instance, if you are asked to create a histogram, you should use PROC SGPLOT, not something like PROC PLOT or PROC UNIVARIATE.

Data set:

The data set for this homework assignment deals with housing data from residential homes in Ames, Iowa.

There are many variables in the original data set, but you will have only a few of them (and you won’t have all the original observations). Information about the variables can be found here. Our end goal is to relate the variables in your data set to the SalePrice variable.

Each student has their own data set. Your data set should be read in from the URL (see below).

Tasks

Write code corresponding to each question below. For example, don’t simply overwrite or modify the code used for question 2 when answering question 3. You can copy and paste the previous code if needed, but we need to see the code used to answer each question. Don’t forget to include your header and to add comments prior to your SAS steps describing what you are doing!

1. (1 pt) Create a permanent library called yzhi4 using a LIBNAME statement.

2. (5 total pts) (3 pts) Write code to read in the yzhi4_house.csv data set using PROC IMPORT. You need to read your data into SAS directly from the URL https://www4.stat.ncsu.edu/~online/ST307/Data/yzhi4_house.csv.

• (2 pts) Save the data as a data set in the permanent library created in question 1.

3. (9 total pts) (2 pts) Use a DATA step to copy the data set read in question 2 into a temporary data set called myhouse with the following alterations:

a. (3 pts) Remove any observations where

• the Foundation variable takes the value Other

or

• the GarageArea variable takes on a value less than or equal to 321.6

b. (2 pts) Create a new variable with a name of your choosing that is the SalePrice variable divided by 100000.

c. (2 pts) Remove the YearRemodAdd and LotShape variables.

4. (4 pts) With your temporary data set, use a PROC step to create a two-way contingency table between the Exterior1st and HeatingQC variables. In a comment below your PROC step, write a sentence that describes what the upper-leftmost value in the table means.

5. (6 pts) With your temporary data set, use a PROC step (one or two steps here is fine) to produce the following summary statistics about the SalePrice, TotRmsAbvGrd, and BsmtFinSF1 variables (and no other summary statistics) at every level of the GarageType variable.

• sample mean

• sample standard deviation

• sample 1st quartile

• sample 3rd quartile

6. (5 pts) With your temporary data set, use a PROC step to create a scatter plot using SalePrice on the y-axis and TotRmsAbvGrd on the x-axis. Color the points by the GarageType variable.

• Copy and paste your plot into the Word or Google doc.

• In a comment below your PROC step, write a sentence that describes any pattern you see in the plot.

7. (8 pts) With your temporary data set, fit a multiple linear regression model using SalePrice as the response variable and TotRmsAbvGrd and BsmtFinSF1 as predictors. Do not include an interaction.

• Produce confidence intervals for the regression coefficients.

• Produce diagnostic plots that could be used to check assumptions. Copy and paste your diagnostic plots into the Word or Google doc.

In a comment below your PROC step, give the following:

• The fitted (or estimated) regression line.

• The 95% confidence interval for the slope corresponding to the TotRmsAbvGrd variable.

That’s it! Good work! Save your .sas file and upload it to WolfWare. We hope you enjoyed the course!