
1)  The following (partial) summary output is from a SLR model fit to examine the relationship between CEO  compensation ($1000’s) and company annual profit ($1000000’s) for 30 companies randomly selected from the largest publicly traded companies in the United States by Forbes magazine.

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 827.5803   102.8876   8.044 9.28e-09
PROF          0.6155     0.3116    ****     ****

Residual standard error: ***** on ** degrees of freedom

Additional information: \(\sum (x_i - \bar{x})^2 = 2649902\), \(\sum (y_i - \bar{y})^2 = 8206570\).

a)   Interpret the value of the slope parameter estimate, \(\hat{\beta}_1\), in the context of the study.


Every million dollar increase (decrease) in company profit is associated with an estimated mean increase (decrease) in CEO compensation of $615.50.

b)  The second company in this dataset had an annual profit of 54 million dollars. If the residual associated with this company was \(e_2 = 297.2\), what was the CEO’s compensation?

\(x_2 = 54\)

\[ \hat{\mu}_2 = \hat{\beta}_0 + \hat{\beta}_1 x_2 = 827.5803 + 0.6155(54) = 860.8173 \]

\[ e_2 = y_2 - \hat{\mu}_2 \;\Rightarrow\; y_2 = \hat{\mu}_2 + e_2 = 860.8173 + 297.2 = 1158.0173 \]

The compensation of the second company’s CEO is approximately $1,158,017.


c)   Calculate:

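i)   The correlation coefficient, r. (You will need the additional information provided. Note the similarity between the expressions for r and \(\hat{\beta}_1\).)

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

\[ \sum (x_i - \bar{x})(y_i - \bar{y}) = \hat{\beta}_1 \sum (x_i - \bar{x})^2 = 0.6155(2649902) = 1631014.681 \]

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{1631014.681}{\sqrt{(2649902)(8206570)}} = 0.3498 \]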
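The arithmetic above can be checked with a short Python sketch. The variable names are illustrative and not part of the original output; the numbers are the ones quoted in the problem.

```python
import math

# Values quoted in the problem statement and R output.
beta1_hat = 0.6155   # slope estimate for PROF
sxx = 2649902        # sum of (x_i - xbar)^2
syy = 8206570        # sum of (y_i - ybar)^2

# Cross-product sum recovered from the slope: Sxy = beta1_hat * Sxx.
sxy = beta1_hat * sxx

# Correlation coefficient: r = Sxy / sqrt(Sxx * Syy).
r = sxy / math.sqrt(sxx * syy)

print(round(sxy, 3), round(r, 4))
```

Because r and the slope share the same numerator, r always has the same sign as the slope estimate.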


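ii)  The residual standard error, \(\hat{\sigma}\). (Recall the distribution of \(\hat{\beta}_1\).)

\[ SE(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{\sum (x_i - \bar{x})^2}} \]

\[ \hat{\sigma} = \sqrt{\sum (x_i - \bar{x})^2}\; SE(\hat{\beta}_1) = \sqrt{2649902}\,(0.3116) = 507.24 \]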
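A minimal Python sketch of the same step, assuming only the values given in the problem (names are illustrative). It also fills in the residual degrees of freedom, which are n − 2 for an SLR model with 30 observations.

```python
import math

n = 30              # companies in the sample
sxx = 2649902       # sum of (x_i - xbar)^2, from the problem
se_beta1 = 0.3116   # Std. Error of PROF from the R output

# SE(beta1_hat) = sigma_hat / sqrt(Sxx), so sigma_hat = sqrt(Sxx) * SE.
sigma_hat = math.sqrt(sxx) * se_beta1

# Two parameters (intercept and slope) are estimated in an SLR model.
df_resid = n - 2

print(round(sigma_hat, 2), df_resid)
```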

d)  Suppose prior to the study it was suggested that each additional million dollars in company profit is associated with a mean increase of $1000 in CEO compensation. Test this claim with an appropriate 95% confidence interval.

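95% confidence interval for \(\beta_1\):

\[ \hat{\beta}_1 \pm t_{28,\,.975}\, SE(\hat{\beta}_1) = 0.6155 \pm 2.048(0.3116) = 0.6155 \pm 0.6382 = (-0.0227,\ 1.2537) \]

Compensation is recorded in $1000’s, so the claim corresponds to \(\beta_1 = 1\). Since this interval includes \(\beta_1 = 1\), the data are consistent with the claim that each additional million dollars in company profit is associated with a mean increase of $1000 in CEO compensation.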
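The interval calculation can be sketched in Python as follows. The helper name `t_interval` is hypothetical, and the critical value 2.048 is taken from the t table (28 degrees of freedom) rather than computed.

```python
def t_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin

# Slope and standard error from the R output; 2.048 is the t-table
# value for 28 degrees of freedom at the 0.975 quantile.
lower, upper = t_interval(0.6155, 0.3116, 2.048)

# $1000 per $1 million of profit, in the units of the fitted model.
claimed_slope = 1.0
contains_claim = lower <= claimed_slope <= upper

print(round(lower, 4), round(upper, 4), contains_claim)
```

The same interval also contains 0, which is why the test in part e) does not reject a zero slope at the 5% level.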

e)   Is there a relationship between CEO compensation and company profit? Carry out a hypothesis test to     answer this question. Be sure to report the null and alternative hypotheses, value of the test statistic, p-    value, and conclusion in the context of the study (note that you will only be able to provide an interval in which the p-value lies from the t table provided).

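\(H_0: \beta_1 = 0\)          \(H_a: \beta_1 \ne 0\)

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{0.6155}{0.3116} = 1.975 \]

p-value \(= 2P(t_{28} > 1.975)\)

From the t table:

\(2(.025) <\) p-value \(< 2(.05)\)

\(.05 <\) p-value \(< .10\)

Do not reject \(H_0\). There is no significant relationship between CEO compensation and company profit.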
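Reading a p-value range off a t table can be expressed as a small lookup. The sketch below is illustrative only: the function name `bracket_two_sided_p` is hypothetical, and the dictionary holds the standard upper-tail t-table entries for 28 degrees of freedom.

```python
def bracket_two_sided_p(t_stat, table_row):
    """Bracket a two-sided p-value using upper-tail t-table entries.

    table_row maps an upper-tail probability to its critical value
    for one fixed number of degrees of freedom.
    """
    t_abs = abs(t_stat)
    # Tail areas whose critical value |t| reaches or exceeds.
    reached = [p for p, crit in table_row.items() if t_abs >= crit]
    # Tail areas whose critical value |t| falls short of.
    missed = [p for p, crit in table_row.items() if t_abs < crit]
    upper = 2 * min(reached) if reached else 1.0
    lower = 2 * max(missed) if missed else 0.0
    return lower, upper

# Standard t-table entries for 28 degrees of freedom.
T28 = {0.10: 1.313, 0.05: 1.701, 0.025: 2.048, 0.01: 2.467, 0.005: 2.763}

t_stat = 0.6155 / 0.3116
low, high = bracket_two_sided_p(t_stat, T28)

print(round(t_stat, 3), low, high)
```

The returned pair is the interval in which the two-sided p-value lies, matching the range read from the table by hand.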

2)  The following (partial) data and output are from a SLR model fit to gross monthly income ($) and loan amount ($) for 30 customers of a lending institution.

  GROSSINC LOANAMT
1   717.00  500.00
2  2417.00 1500.00
3  3333.33 6547.00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   ******  1702.7547    0.84    *****
GROSSINC      1.5840     ******    2.64    *****
---
Residual standard error: 3605 on **** degrees of freedom

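a)   Give the value of \(e_2\), and interpret this value in the context of the study.

The intercept estimate is recovered from its t value and standard error: \(\hat{\beta}_0 = 0.84(1702.7547)\).

\[ e_2 = y_2 - \hat{\mu}_2 = 1500 - \big(0.84(1702.7547) + 1.584(2417)\big) = -3758.84 \]

The second customer’s loan amount is $3758.84 less than the model predicts for a gross monthly income of $2417.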


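b)  Is there a relationship between the loan amount and gross income? Answer this question by obtaining a 95% confidence interval for \(\beta_1\).

\[ \hat{\beta}_1 \pm t_{28,\,.975}\, SE(\hat{\beta}_1) \]

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = 2.64 \;\Rightarrow\; SE(\hat{\beta}_1) = \frac{\hat{\beta}_1}{t} = \frac{1.5840}{2.64} = 0.600 \]

\[ 1.5840 \pm 2.048(0.600) = 1.5840 \pm 1.2288 = (0.3552,\ 2.8128) \]

Since the interval lies entirely above 0, there is a significant positive relationship between loan amount and gross income.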
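A brief Python sketch of this two-step calculation, with illustrative variable names: the missing standard error is backed out of the reported t value, then used to build the interval.

```python
beta1_hat = 1.5840   # GROSSINC estimate from the output
t_value = 2.64       # reported t value for GROSSINC
t_crit = 2.048       # t-table value, 28 df, 0.975 quantile

# t = estimate / SE, so SE = estimate / t.
se_beta1 = beta1_hat / t_value

margin = t_crit * se_beta1
interval = (beta1_hat - margin, beta1_hat + margin)

# A 95% interval that excludes 0 indicates significance at the 5% level.
excludes_zero = not (interval[0] <= 0 <= interval[1])

print(round(se_beta1, 3), tuple(round(v, 4) for v in interval), excludes_zero)
```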


3)  Consider the simple linear regression model

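\[ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \ \text{ind.}, \qquad i = 1, 2, \ldots, n \]

a)  A (1−α)100% confidence interval for \(\beta_0\) is of the form:

\[ \hat{\beta}_0 \pm t_{n-2,\,1-\alpha/2}\ \hat{\sigma} \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}} \]

Based on the form of this interval, state the distribution of \(\hat{\beta}_0\). Be specific. Justify your answer. No derivation is required.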

b)  Recall that in least squares estimation, we lose a degree of freedom for each parameter in the model, as each estimate is associated with a constraint involving the residuals.

Give the two constraints on the residuals associated with least squares estimation for the SLR model. Show your work.

c)   Show that \(\sum e_i \hat{\mu}_i = 0\), where \(\hat{\mu}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\).
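The residual identities in parts b) and c) can be checked numerically before attempting the algebra. The sketch below is not a proof: it uses hypothetical toy data chosen only for illustration, fits the SLR model by the usual least squares formulas, and confirms that the residuals sum to zero, are orthogonal to the explanatory variate, and are orthogonal to the fitted values.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [2.0, 3.5, 4.0, 6.5, 9.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates for the SLR model.
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar

fitted = [beta0_hat + beta1_hat * xi for xi in x]
residuals = [yi - mi for yi, mi in zip(y, fitted)]

# The two constraints from the normal equations, plus their consequence.
sum_e = sum(residuals)
sum_ex = sum(e * xi for e, xi in zip(residuals, x))
sum_e_fitted = sum(e * m for e, m in zip(residuals, fitted))

print(sum_e, sum_ex, sum_e_fitted)  # each is zero up to rounding error
```

The third sum is a linear combination of the first two, which is the idea behind part c).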



4)   The following (partial) output is from a regression model fit to data from 75 U.S. companies to assess the relationship between the compensation paid to the CEO ($10^3) and the explanatory variates EXPERience (years as CEO) and company PROFit ($10^6).

Call: lm(formula = COMP ~ EXPER + PROF)

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 615.9238   151.3835   4.069  0.00012
EXPER        24.6128    10.1354   *****   ******
PROF          1.1215     0.3263   3.437  0.00098
---
Residual standard error: 719.5 on 72 degrees of freedom
Multiple R-squared:  0.1838

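a)   Give the distribution of \(\hat{\beta}_2\).

\[ \hat{\beta} \sim N\big(\beta,\ \sigma^2 (X^T X)^{-1}\big) \;\Rightarrow\; \hat{\beta}_2 \sim N\big(\beta_2,\ \sigma^2 (X^T X)^{-1}_{22}\big) \]

b)  Give the value of \((X^T X)^{-1}_{22}\).

\[ Var(\hat{\beta}_2) = \sigma^2 (X^T X)^{-1}_{22} \;\Rightarrow\; SE(\hat{\beta}_2) = \hat{\sigma} \sqrt{(X^T X)^{-1}_{22}} \]

\[ (X^T X)^{-1}_{22} = \left(\frac{SE(\hat{\beta}_2)}{\hat{\sigma}}\right)^2 = \left(\frac{0.3263}{719.5}\right)^2 = 2.06 \times 10^{-7} \]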
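As a check on the order of magnitude, the same ratio can be evaluated in Python. This is only a sketch using the two numbers reported in the output; the variable names are illustrative.

```python
se_beta2 = 0.3263    # Std. Error of PROF from the output
sigma_hat = 719.5    # residual standard error from the output

# SE(beta2_hat) = sigma_hat * sqrt(v), where v is the diagonal element
# of (X^T X)^{-1} for PROF, so v = (SE / sigma_hat)^2.
v_22 = (se_beta2 / sigma_hat) ** 2

print(f"{v_22:.3e}")
```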

c)  After accounting for company profit, is there a significant relationship between compensation and experience? Answer this question by carrying out an appropriate test. Be sure to include the null hypothesis and alternative hypothesis, the value of the test statistic, p-value, and conclusion in the context of the study. Note that you will only be able to obtain an interval in which the p-value lies from the t table provided. Use the closest degrees of freedom provided in the table.

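\(H_0: \beta_1 = 0\)          \(H_a: \beta_1 \ne 0\)

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{24.6128}{10.1354} = 2.428 \]

From the t table (df = 60 is the closest available to 72):

\(2(.005) <\) p-value \(= 2P(t_{72} > 2.428) < 2(.01)\)

\(.01 <\) p-value \(< .02\)

Reject \(H_0\). We conclude that, after accounting for company profit, experience is significantly positively related to compensation.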

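d)  What is the smallest value of the standard error of \(\hat{\beta}_1\) that would lead you to accept \(H_0: \beta_1 = 0\)? (Use the closest degrees of freedom provided in the table.)

We would accept \(H_0: \beta_1 = 0\) if p-value > .05.

\[ \text{p-value} \ge 0.05 \;\Rightarrow\; t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} \le t_{60,\,.975} = 2.00 \;\Rightarrow\; SE(\hat{\beta}_1) \ge \frac{24.6128}{2.00} = 12.3064 \]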

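e)   Give the sum of squares of the residuals, \(\sum_{i=1}^{n} e_i^2\).

\[ \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - 3}} \;\Rightarrow\; \sum_{i=1}^{n} e_i^2 = (n - 3)\,\hat{\sigma}^2 = 72(719.5)^2 = 37272978 \]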




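5)  Consider a regression model that describes the relationship between the age (yrs) of a house and the explanatory variates size (m²), rooms (# of rooms), basement, lotsize, and garage. Partial output from the fitted model is given below.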




Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -15.874084   4.921271  -3.226  0.00173
size          0.044075   0.018964   2.324  0.02227
rooms         1.646981   ********   *****  *******
basement      1.129761   1.199577   0.942  0.34871
lotsize       ********   ********   6.387 6.40e-09
garage      -11.716909   1.614950  -7.255 1.13e-10

Residual standard error: 5.523 on 94 degrees of freedom
Multiple R-squared: 0.7305,     Adjusted R-squared: 0.7161
F-statistic: *** on * and ** DF,  p-value: < 2.2e-16

a)   Interpret the size parameter estimate in the context of the study:

After accounting for the other variates, every additional square metre in house size is associated with an estimated mean increase of 0.044075 years in the age of the house.

b)  The largest residual for this data is e61 = 12.8797 . Clearly and in plain language interpret this value in the context of the study.

\(e_{61} = 12.8797 = y_{61} - \hat{\mu}_{61}\). The 61st house is 12.8797 years older than the model predicts.

c) Which of the following is a plausible 95% confidence interval for the lotsize parameter?

i)   (0.0212, 0.0403)         ii)  (-0.0315, -0.0109)            iii) (-0.0023, 1.699)

iv)  6.387 ± 7.659             v)  All of them are plausible  vi) None of them are plausible

i) (0.0212, 0.0403)

The p-value associated with the lotsize parameter (6.40e-09) indicates a significant positive relationship (positive because the t value is positive). The only confidence interval that is consistent with this is i).

d)  A 95% confidence interval for \(\beta_2\), the parameter associated with the rooms variate, is (0.368319, 2.925642).

i) Give \(SE(\hat{\beta}_2)\) to three decimal places.

\[ \hat{\beta}_2 \pm t_{94,\,.975}\, SE(\hat{\beta}_2) = (0.368319,\ 2.925642) \]

\[ SE(\hat{\beta}_2) = \frac{(2.925642 - 0.368319)/2}{1.98} = 0.646 \]
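The step of inverting a confidence interval can be sketched in Python. Variable names are illustrative, and 1.98 is the t-table value used in the solution above, not an exact quantile for 94 degrees of freedom.

```python
lower, upper = 0.368319, 2.925642   # reported 95% CI for the rooms parameter
t_crit = 1.98                       # t-table value used in the solution

# The interval is estimate +/- t_crit * SE, so the half-width is t_crit * SE.
half_width = (upper - lower) / 2
se_beta2 = half_width / t_crit

# The midpoint of the interval recovers the point estimate.
midpoint = (upper + lower) / 2

print(round(se_beta2, 3), round(midpoint, 6))
```

The midpoint agrees with the rooms estimate of 1.646981 shown in the output, which is a useful sanity check on the interval.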

ii)  Perform a hypothesis test of \(H_0: \beta_2 = 0\) vs. \(H_a: \beta_2 \ne 0\). Be sure to include the value of the test statistic, p-value (using the t table provided with closest available degrees of freedom), and conclusion in the context of the study. Regardless of your answer in i) above, use \(SE(\hat{\beta}_2) = 0.7\).

\(H_0: \beta_2 = 0\)

\(H_a: \beta_2 \ne 0\)

\[ t = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = \frac{1.646981}{0.7} = 2.353 \]

p-value \(= 2P(t_{94} > 2.353) \approx .02\)

Reject \(H_0\). After accounting for the other variates, there is a significant positive relationship between the number of rooms and the age of a house.


6)  A company wishes to determine whether the amount of money (in $1000’s) spent on advertising (AdvExp) and the number of customer visits by sales staff (Visits) have an effect on weekly sales (in $1000’s). The linear regression model

\[ Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I) \]

is fit to the weekly sales for a 5 week period (n = 5 is too small a sample size to yield any meaningful results from fitting a regression model, but we will ignore this for the purposes of this quiz).

Partial R output (rounded for convenience) and the fitted residuals are given below:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    -4.99       3.84   -1.30     0.32
Visits          0.40       0.07     5.7    *****
AdvExp          1.35      *****   *****     0.20

a)  What value is represented by the element corresponding to the third row, second column of X? A numerical answer is not required.

The third row refers to the third observation. The second column refers to the values of the first explanatory variate (Visits). Thus, the element corresponding to the third row, second column of X is the number of customer visits in the third week.

b)  After accounting for advertising expenses, is the number of sales visits significantly related to sales? Answer this question by obtaining a 95% confidence interval for the appropriate parameter.

95% confidence interval for \(\beta_1\):

\[ \hat{\beta}_1 \pm t_{n-(p+1),\,.975}\, SE(\hat{\beta}_1) = 0.40 \pm 4.303(0.07) = 0.40 \pm 0.301 = (0.099,\ 0.701) \]

Since this interval does not contain the value \(\beta_1 = 0\), we conclude that, after accounting for advertising expenses, the number of sales visits is significantly (positively) related to sales.

c)  Answer the question in b) by obtaining the (two-sided) p-value associated with \(H_0: \beta_1 = 0\). (Note that from the t table, you will only be able to give a range of values in which the p-value lies.)

\(H_0: \beta_1 = 0\)

\(H_a: \beta_1 \ne 0\)

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = 5.7 \]

p-value \(= 2P(t_2 > 5.7)\)

From the t table:

\(P(t_2 > 4.303) = 0.025\)

\(P(t_2 > 6.965) = 0.01\)

\(2(.01) <\) p-value \(< 2(.025)\)

\(.02 <\) p-value \(< .05\)

Since p-value < 0.05, we reject \(H_0: \beta_1 = 0\), and conclude that, after accounting for advertising expenses, there is a significant positive relationship between number of sales visits and sales.

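d)  Calculate \(SE(\hat{\beta}_2)\) (you will first need to find t, the value of the test statistic).

\[ t = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} \;\Rightarrow\; SE(\hat{\beta}_2) = \frac{\hat{\beta}_2}{t} \]

p-value \(= 0.20 \;\Rightarrow\; P(t_2 > t) = 0.10\)

From the t table: \(P(t_2 > 1.886) = 0.10\), so \(t = 1.886\).

\[ SE(\hat{\beta}_2) = \frac{1.35}{1.886} = 0.716 \]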
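The reverse lookup, from a reported p-value back to the test statistic, can be sketched as a table lookup in Python. The dictionary holds the standard upper-tail t-table entries for 2 degrees of freedom; the names are illustrative.

```python
# Standard upper-tail t-table entries for 2 degrees of freedom.
T2 = {0.10: 1.886, 0.05: 2.920, 0.025: 4.303, 0.01: 6.965}

beta2_hat = 1.35      # AdvExp estimate from the output
p_two_sided = 0.20    # reported two-sided p-value for AdvExp

# A two-sided p-value of 0.20 puts 0.10 in each tail.
t_value = T2[p_two_sided / 2]

# The estimate is positive, so t is positive and SE = estimate / t.
se_beta2 = beta2_hat / t_value

print(t_value, round(se_beta2, 3))
```

This works here only because the tail area 0.10 appears exactly in the table; for other p-values only a range for t, and hence for the standard error, could be given.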

7)  Values of the response and explanatory variates fit to the linear regression model

\[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i, \qquad i = 1, 2, 3, 4, \]

as well as fitted values \(\hat{\mu}_i\), are provided in the table below.



i              1      2      3      4
y_i            2     -1      1      1
x_i1           9      7      7      8
x_i2           3      7      6      4
fitted value   1.7   -0.7    0.4


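a)   Calculate the residual standard error (you do not need \(\hat{\mu}_4\) to answer the question).

\[ \hat{\sigma} = \sqrt{\frac{\sum e_i^2}{n - 3}} = \sqrt{\frac{\sum e_i^2}{1}}, \qquad \text{where } e_i = y_i - \hat{\mu}_i \]

\(e_1 = 2 - 1.7 = 0.3\); \(e_2 = -1 - (-0.7) = -0.3\); \(e_3 = 1 - 0.4 = 0.6\)

\(\sum e_i = 0 \;\Rightarrow\; e_4 = 0 - (e_1 + e_2 + e_3) = -0.6\)

\[ \hat{\sigma} = \sqrt{0.3^2 + (-0.3)^2 + 0.6^2 + (-0.6)^2} = \sqrt{0.90} = 0.95 \]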
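A minimal Python sketch of this calculation, with illustrative names, shows how the sum-to-zero constraint supplies the missing residual and, as a by-product, the missing fitted value.

```python
import math

y = [2, -1, 1, 1]
fitted_known = [1.7, -0.7, 0.4]   # the fitted value for i = 4 is not given

residuals = [yi - mi for yi, mi in zip(y, fitted_known)]

# With an intercept in the model the residuals sum to zero,
# which pins down the fourth residual.
residuals.append(-sum(residuals))

n, n_params = len(y), 3           # three regression parameters
sigma_hat = math.sqrt(sum(e ** 2 for e in residuals) / (n - n_params))

# The missing fitted value follows from e_4 = y_4 - fitted_4.
fitted_4 = y[3] - residuals[3]

print(round(sigma_hat, 2), round(fitted_4, 1))
```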

b)  Give the X matrix.

\[ X = \begin{bmatrix} 1 & 9 & 3 \\ 1 & 7 & 7 \\ 1 & 7 & 6 \\ 1 & 8 & 4 \end{bmatrix} \]
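One way to sanity-check both parts of this question is to confirm that the residuals from part a) are orthogonal to every column of X, as least squares requires. The sketch below uses plain Python lists and illustrative names.

```python
# Design matrix: a column of ones, then x_i1, then x_i2.
X = [
    [1, 9, 3],
    [1, 7, 7],
    [1, 7, 6],
    [1, 8, 4],
]
residuals = [0.3, -0.3, 0.6, -0.6]   # e_1, ..., e_4 from part a)

# Each entry is the dot product of one column of X with the residuals.
xt_e = [sum(row[j] * e for row, e in zip(X, residuals)) for j in range(3)]

print(xt_e)  # every entry is zero up to rounding error
```

With n = 4 observations and three parameters there is only one residual degree of freedom, so these three constraints determine the residual vector up to a scale factor.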