RClass1-LinearModelReview
2022
Review of some aspects of the Linear Model
Continuous vs. factor predictors
Load the libraries foreign (to be able to read in the data file) and arm, and read in the data kidiq. This is how it works on my computer:
library(foreign)
library(arm)
kidiq <- read.dta("kidiq.dta")
attach(kidiq)
head(kidiq)
##   kid_score mom_hs    mom_iq mom_work mom_age
## 1        65      1 121.11753        4      27
## 2        98      1  89.36188        4      25
## 3        85      1 115.44316        4      27
## 4        83      1  99.44964        3      25
## 5       115      1  92.74571        4      27
## 6        98      0 107.90184        1      18
The response is kid_score, the child’s score in an IQ test. The predictors are mom_hs (whether or not the mom has a high-school degree), mom_iq (mom’s IQ), mom_work (mom’s working pattern in the first years of the child’s life, as described in Lecture 2a), and mom_age (mom’s age).
Next, fit two linear models to these data:
fit1, which has mom_iq and mom_work as main effects and no interaction. In this model, mom_work is treated as a factor predictor.
fit1 <- lm(kid_score~as.factor(mom_work)+mom_iq,x=TRUE)
summary(fit1)
##
## Call:
## lm(formula = kid_score ~ as.factor(mom_work) + mom_iq, x = TRUE)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -57.796 -12.103   1.892  12.019  50.582
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)          24.14226    6.14276   3.930 9.89e-05 ***
## as.factor(mom_work)2  3.97026    2.78980   1.423   0.1554
## as.factor(mom_work)3  6.60140    3.23986   2.038   0.0422 *
## as.factor(mom_work)4  3.06392    2.44682   1.252   0.2112
## mom_iq                0.59478    0.05942  10.009  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.24 on 429 degrees of freedom
## Multiple R-squared: 0.2091, Adjusted R-squared: 0.2018
## F-statistic: 28.36 on 4 and 429 DF, p-value: < 2.2e-16
Explain how many parameters this model has and why: This model has five parameters: one for the intercept,
three for mom_work (because it is a factor with four levels, which requires three dummy variables), and one for mom_iq (because it is a continuous predictor).
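The counting rule above can be checked on simulated data (a sketch, not the kidiq data): a factor with four levels contributes levels − 1 = 3 dummy columns, so together with the intercept and one continuous slope the model has five coefficients.

```r
# Hypothetical simulated data, mimicking the structure of kidiq:
# a four-level factor plus one continuous predictor.
set.seed(1)
d <- data.frame(y    = rnorm(40),
                work = factor(rep(1:4, each = 10)),
                iq   = rnorm(40, mean = 100, sd = 15))
fit <- lm(y ~ work + iq, data = d)
length(coef(fit))  # 5: intercept + 3 dummies + 1 slope
```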
fit2, which has mom_iq and mom_work as main effects and no interaction. However, mom_work is treated as a continuous predictor.
fit2 <- lm(kid_score~mom_work+mom_iq,x=TRUE)
summary(fit2)
##
## Call:
## lm(formula = kid_score ~ mom_work + mom_iq, x = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.281 -12.137 1.976 12.167 48.781
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.54194 6.10417 4.021 6.85e-05 ***
## mom_work 0.63140 0.74823 0.844 0.399
## mom_iq 0.60427 0.05893 10.254 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.27 on 431 degrees of freedom
## Multiple R-squared: 0.2023, Adjusted R-squared: 0.1986
## F-statistic: 54.64 on 2 and 431 DF, p-value: < 2.2e-16
Explain how many parameters this model has and why: This model has three parameters: one for the intercept,
one for mom_work (because it is now treated as a continuous predictor, so it gets a single slope), and one for mom_iq (because it is a continuous predictor). Note that treating mom_work as a continuous predictor makes sense because its values are ordered, but it is questionable whether we want to assume that the difference in expected kid_score between adjacent categories of mom_work is the same for every pair of adjacent categories (which this model does).
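The equal-spacing assumption can be made concrete with simulated data (a sketch, not the kidiq data): when the four-level predictor is coded as numeric, the fitted gap between any two adjacent levels is always the same single coefficient.

```r
# Hypothetical simulated data: work is numeric (1, 2, 3, 4), not a factor.
set.seed(2)
d <- data.frame(y    = rnorm(40),
                work = rep(1:4, each = 10),
                iq   = rnorm(40, mean = 100, sd = 15))
fit <- lm(y ~ work + iq, data = d)

# Predicted response at each level of work, holding iq fixed at 100:
nd <- data.frame(work = 1:4, iq = 100)
p  <- predict(fit, nd)
diff(p)  # three identical gaps, each equal to coef(fit)["work"]
```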
For fit1 and fit2, print the first 5 rows of the design matrix.
fit1$x[1:5,]
##   (Intercept) as.factor(mom_work)2 as.factor(mom_work)3 as.factor(mom_work)4    mom_iq
## 1           1                    0                    0                    1 121.11753
## 2           1                    0                    0                    1  89.36188
## 3           1                    0                    0                    1 115.44316
## 4           1                    0                    1                    0  99.44964
## 5           1                    0                    0                    1  92.74571
fit2$x[1:5,]
## (Intercept) mom_work mom_iq
## 1 1 4 121.11753
## 2 1 4 89.36188
## 3 1 4 115.44316
## 4 1 3 99.44964
## 5 1 4 92.74571
Comment on how they differ: For fit1, the design matrix has five columns, and the factor predictor takes three of these. The values of these columns are either 0 or 1: for example, the second column of the design matrix is 1 if mom_work == 2 and 0 otherwise, and so on. In contrast, the design matrix for fit2 has only three columns. The first and last are the same as the first and last column of the design matrix for fit1, but the middle column is different: its entry is simply the value of mom_work.
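The two codings can also be compared without fitting any model, using model.matrix() directly (a minimal sketch with made-up values of mom_work):

```r
# Hypothetical values of mom_work, covering all four categories.
w <- rep(1:4, 2)

model.matrix(~ as.factor(w))  # intercept + three 0/1 dummy columns
model.matrix(~ w)             # intercept + one numeric column
```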
Plot the data along with the fitted regression curve(s): For fit1, I get
plot(mom_iq, kid_score, xlab = "Mother IQ score",
     ylab = "Child test score", pch = 20, xaxt = "n", yaxt = "n", type = "n")
curve(coef(fit1)[1] + coef(fit1)[2] + coef(fit1)[5]*x, add = TRUE, col = "magenta")
curve(coef(fit1)[1] + coef(fit1)[5]*x, col = "red", add = TRUE)
curve(coef(fit1)[1] + coef(fit1)[3] + coef(fit1)[5]*x, add = TRUE, col = "blue")
curve(coef(fit1)[1] + coef(fit1)[4] + coef(fit1)[5]*x, add = TRUE, col = "black")
points(mom_iq[mom_work==1], kid_score[mom_work==1], pch = 20, col = "red")
points(mom_iq[mom_work==2], kid_score[mom_work==2], pch = 20, col = "magenta")
points(mom_iq[mom_work==3], kid_score[mom_work==3], pch = 20, col = "blue")
points(mom_iq[mom_work==4], kid_score[mom_work==4], pch = 20, col = "black")
axis(1, c(80, 100, 120, 140))
axis(2, c(20, 60, 100, 140))
[Figure: child test score vs. mother IQ score, with four parallel fitted lines from fit1, one colour per mom_work category]
while for fit2, I get
plot(mom_iq, kid_score, xlab = "Mother IQ score",
     ylab = "Child test score", pch = 20, xaxt = "n", yaxt = "n", type = "n")
curve(coef(fit2)[1] + 2*coef(fit2)[2] + coef(fit2)[3]*x, add = TRUE, col = "magenta")
curve(coef(fit2)[1] + coef(fit2)[2] + coef(fit2)[3]*x, col = "red", add = TRUE)
curve(coef(fit2)[1] + 3*coef(fit2)[2] + coef(fit2)[3]*x, add = TRUE, col = "blue")
curve(coef(fit2)[1] + 4*coef(fit2)[2] + coef(fit2)[3]*x, add = TRUE, col = "black")
points(mom_iq[mom_work==1], kid_score[mom_work==1], pch = 20, col = "red")
points(mom_iq[mom_work==2], kid_score[mom_work==2], pch = 20, col = "magenta")
points(mom_iq[mom_work==3], kid_score[mom_work==3], pch = 20, col = "blue")
points(mom_iq[mom_work==4], kid_score[mom_work==4], pch = 20, col = "black")
axis(1, c(80, 100, 120, 140))
axis(2, c(20, 60, 100, 140))
2022-02-23