ADA Spring 2023: Linear Regression 1
January 1, 2023
For today’s lecture we will focus on data of the form (x_i, y_i), with x and y both quantitative variables. Our goal is to determine whether x influences y. We restrict ourselves to models of the form
y_i = \mu + \beta_1 (x_i - \bar{x}) + \epsilon_i \qquad (1)
where i indexes our data points. We also assume

\epsilon_i \sim N(0, \sigma^2) \qquad (2)

with the \epsilon_i independent of one another. We consider our x_i's to be fixed; only the y_i are random. Our goals will be
• Given our data, how do we estimate µ, β1 and σ?
• Given our estimated line, can we build a confidence interval for a prediction?
• Is the effect of X on Y statistically significant, i.e. can we say with confidence that β1 ≠ 0?
• Can we build a confidence interval for σ?
• How do we handle non normal residuals?
• (New) Alternative measures of correlation.
• (New) Robust Regression.
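The model above is easy to simulate, which is a useful sanity check for everything that follows. A minimal sketch (the parameter values µ = 2, β1 = 0.5, σ = 1 and the variable names are illustrative assumptions, not from the notes):

```python
import numpy as np

# Simulate from the model y_i = mu + beta1*(x_i - xbar) + eps_i.
# The parameter values below are illustrative, not from the lecture.
rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 10.0, n)            # the x_i are treated as fixed
mu, beta1, sigma = 2.0, 0.5, 1.0
eps = rng.normal(0.0, sigma, n)          # iid N(0, sigma^2) noise
y = mu + beta1 * (x - x.mean()) + eps    # equation (1)
```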
2 Fitting our line
See posted notes LR Review 1 and LR Review 2.
2.1 Gauss-Markov Theorem: Not on exam
In your statistical inference course you learned about various types of estimators. We have chosen estimators that minimize the sum of squared residuals. How do we justify this? Under condition (2) we have the Gauss-Markov theorem.
Gauss (d. 1855) was able to show that, under the conditions of the residuals being independent and normally distributed, each with the same variance, our ordinary least squares estimator has the lowest sampling variance of all linear unbiased estimators.
Markov (b. 1856) was able to loosen these conditions. He showed that all that is needed is that the residuals be uncorrelated, with mean zero and finite, homoscedastic variance.
3 Confidence Intervals and Linear Regression
To build a confidence interval for our estimators and for our predictions we need some understanding of how our data is generated. For this we assume that both equations (1) and (2) are true, where the only uncertainty is in the accuracy of our estimates and in the future random noise, \epsilon_{n+1}.
3.1 Results From Inference Class
In inference we learned that equations (1) and (2) produce the following results:

\hat{\mu} = \bar{y} \sim N(\mu, \sigma^2/n)

\hat{\beta}_1 = r\,\frac{s_y}{s_x} \sim N\!\left(\beta_1, \frac{\sigma^2}{(n-1)s_x^2}\right)

\frac{(n-2)\,s_e^2}{\sigma^2} = \frac{\sum_i e_i^2}{\sigma^2} \sim \chi^2_{n-2}
We also learned of the independence of these 3 estimators.
Note: since the true value of σ is unknown we replace it with the estimate s_e, whose scaled square has a χ² distribution. This causes the test statistics for our other two estimators to switch from a z to a t distribution.
3.2 The noise
To build any confidence interval we must estimate σ from equation (2). We use our observed residuals:

s_e^2 = \frac{1}{n-2}\sum_{i=1}^{n} e_i^2
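The estimates above can be computed directly in a few lines. A sketch (the helper name `ols_fit` is mine, not from the notes):

```python
import numpy as np

def ols_fit(x, y):
    """Least-squares estimates for y_i = mu + beta1*(x_i - xbar) + eps_i."""
    n = len(x)
    xbar = x.mean()
    beta1 = np.sum((x - xbar) * (y - y.mean())) / np.sum((x - xbar) ** 2)
    mu_hat = y.mean()                        # in the centered model, mu-hat = ybar
    e = y - (mu_hat + beta1 * (x - xbar))    # observed residuals
    se = np.sqrt(np.sum(e ** 2) / (n - 2))   # s_e with n-2 degrees of freedom
    return mu_hat, beta1, se
```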
3.3 The point
We assume our value of \bar{x} is exact and does not vary from experiment to experiment. The observed \bar{y} is assumed to be random and to vary from day to day. As this number is estimated from a given day's data, we would like to build a confidence interval for it as well.

Without performing linear regression, our confidence interval for the center of n observations would be

\bar{y} \pm t_{n-1}\,\frac{s_y}{\sqrt{n}}
With the linear regression we can replace s_y with the smaller s_e and get

\bar{y} \pm t_{n-2}\,\frac{s_e}{\sqrt{n}}
3.4 The slope
We will test the null hypothesis

H_0: \beta_1 = 0 \quad \text{vs} \quad H_a: \beta_1 \neq 0
Using our results from inference class this can be done with a simple t-test:

t = \frac{\sqrt{n-1}\,(b_1 - 0)}{s_e / s_x}
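As a sketch, the test can be carried out numerically (the function name `slope_t_test` is my own; `scipy` is assumed available):

```python
import numpy as np
from scipy import stats

def slope_t_test(x, y):
    """t-statistic and two-sided p-value for H0: beta1 = 0 (a sketch)."""
    n = len(x)
    xbar = x.mean()
    b1 = np.sum((x - xbar) * (y - y.mean())) / np.sum((x - xbar) ** 2)
    e = y - (y.mean() + b1 * (x - xbar))
    se = np.sqrt(np.sum(e ** 2) / (n - 2))
    sx = np.sqrt(np.sum((x - xbar) ** 2) / (n - 1))  # sample sd of x
    t = np.sqrt(n - 1) * (b1 - 0) / (se / sx)        # the statistic above
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    return t, p
```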
3.4.1 Interpretation
We will never have data that gives us a slope of exactly 0. Hence, to say X has a linear effect on Y, we would like our slope to be statistically significantly away from 0. The above test provides that. In the case we reject the null, our line is statistically significant. If we fail to reject, we do not have enough evidence to say that X affects Y.
3.5 Confidence intervals for a prediction
With the variance of each of our estimators we are able to build confidence intervals for predictions. We can ask our line to predict a new y_{n+1} for a given x_{n+1}, or the mean E(y_{n+1} | x_{n+1}). The latter has the smaller confidence interval.
\hat{y}_{n+1} \pm t_{n-2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1}-\bar{x})^2}{(n-1)s_x^2}}

and

\hat{y}_{n+1} \pm t_{n-2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_{n+1}-\bar{x})^2}{(n-1)s_x^2}}
Both formulas have the satisfying form of our interval becoming wider as our prediction point, x_{n+1}, moves further from the center of our data, \bar{x}.
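Both intervals can be computed with one helper. A sketch (`prediction_interval` is a hypothetical name; the `mean_only` flag switches between the two formulas):

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x_new, level=0.95, mean_only=False):
    """Interval for a new y at x_new, or for its mean if mean_only=True."""
    n = len(x)
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)                     # equals (n-1)*s_x^2
    b1 = np.sum((x - xbar) * (y - y.mean())) / sxx
    e = y - (y.mean() + b1 * (x - xbar))
    se = np.sqrt(np.sum(e ** 2) / (n - 2))
    yhat = y.mean() + b1 * (x_new - xbar)
    var_term = 1.0 / n + (x_new - xbar) ** 2 / sxx
    if not mean_only:
        var_term += 1.0            # extra 1 for the noise of a new observation
    half = stats.t.ppf(1 - (1 - level) / 2, df=n - 2) * se * np.sqrt(var_term)
    return yhat - half, yhat + half
```

The interval for the mean is strictly narrower, since it omits the leading 1 inside the square root.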
3.6 Reverse Prediction : Not on Exam
Given y_{n+1} we can also predict x_{n+1}. Under our assumptions the MLE would be

\hat{x}_{n+1} = \bar{x} + \frac{y_{n+1} - \bar{y}}{\hat{\beta}_1}
and we can find

\operatorname{var}(\hat{x}_{n+1}) \approx \frac{s_e^2}{\hat{\beta}_1^2}\left[1 + \frac{1}{n} + \frac{(x_{n+1}-\bar{x})^2}{(n-1)s_x^2}\right]
4 Violations of assumptions
To detect departures from our assumptions we often:
• look at a scatter plot of the data
• compute R^2
• plot the residuals versus x
• test for normality of our residuals
In the case our assumptions are violated, the easiest corrective measure is a transformation of our data. We will revisit this later. Nonparametric methods exist as well.
The second easiest corrective measure is to add other predictors in our model. Again we will revisit this another day.
4.1 Non-normality
The impact of non-normal residuals is that our confidence intervals will not be reliable. Robust regression methods can help here.
4.2 Correlated Errors
The impact of correlated residuals is that our confidence intervals will not be reliable. Time-series methods can help here. The Durbin-Watson test checks for correlation between sequential residuals. We model

e_i = \rho\, e_{i-1} + \nu_i

with \nu_i \sim N(0, \sigma_\nu^2), and then test H_0: \rho = 0. In the event of correlation, the Cochrane-Orcutt procedure or generalized estimating equations can help.
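The Durbin-Watson statistic itself is simple to compute from the residuals: values near 2 suggest no lag-one correlation, while values near 0 (or 4) suggest positive (or negative) correlation. A sketch (the function name is mine):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic: sum of squared successive differences of
    the residuals, divided by the sum of squared residuals."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```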
4.3 Heteroskedasticity: Unequal Variances
The simple answer to this is to minimize the sum of our weighted squared residuals. That is,

\sum_{i=1}^{n} w_i e_i^2

The trick is finding the proper set of weights \{w_i\}. If we knew \operatorname{var}(e_i) = \sigma_i^2 then we would set

w_i = \frac{1}{\sigma_i^2}
Hence we can assume some sort of volatility structure, σ = σ(x_i), either of closed form, observed from our residuals, or taken from a second predictor.
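For a fixed set of weights the minimizer has a closed form, since weighted least squares is just least squares on rescaled data. A sketch (the function name is mine; w_i = 1/σ_i² would be the ideal choice above):

```python
import numpy as np

def wls_fit(x, y, w):
    """Slope and intercept minimizing sum_i w_i * e_i^2 (closed form)."""
    xbar = np.sum(w * x) / np.sum(w)      # weighted means
    ybar = np.sum(w * y) / np.sum(w)
    b1 = np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar) ** 2)
    b0 = ybar - b1 * xbar
    return b0, b1
```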
5 Alternative Measures of Correlation: Not on exam
To build our linear regression model we need the correlation of our data. The standard method of measuring correlation is Pearson’s. Just as we have alternative measures of center and spread we also have alternative measures of correlation.
5.1 Pearson’s product moment Correlation
In probability, the definition of correlation is the following:

\rho = E\!\left[\left(\frac{X - \mu_X}{\sigma_X}\right)\left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right], \qquad -1 \le \rho \le 1
A common estimate for the above is Pearson's method:

r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)
Under our model assumptions we can produce the below test statistic
T = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}
which has a t_{n-2} distribution under the null. The more general hypothesis H_0: \rho = \rho_0 uses Fisher's Z-transformation:
Q(r) = \frac{1}{2}\ln\frac{1+r}{1-r}

Z = \left[Q(r) - Q(\rho_0)\right]\sqrt{n-3}

which has a standard normal distribution under the null.
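The Z-test is a one-liner once Q is written down. A sketch (`fisher_z_test` is a hypothetical name; `scipy` assumed available):

```python
import numpy as np
from scipy import stats

def fisher_z_test(r, n, rho0):
    """Two-sided test of H0: rho = rho0 via Fisher's Z-transformation."""
    Q = lambda v: 0.5 * np.log((1 + v) / (1 - v))
    z = (Q(r) - Q(rho0)) * np.sqrt(n - 3)
    p = 2 * stats.norm.sf(abs(z))
    return z, p
```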
Note: the problem with Pearson's correlation coefficient is that outliers can push its value around. The most famous example is two data points, which produce a correlation of 1. While 2 data points is a silly example, two clusters of data points, or one cluster and an outlier, is not. To address this we introduce rank-based metrics.
5.2 Spearman’s Rank Correlation Coefficient
Here we are less concerned with whether, when x changes by δ, y changes by β1 δ, than with whether, when x increases, y also increases. To reflect this we replace our data points with their ranks. Denoting by d_i the difference between the rank of x_i (relative to the x's) and the rank of y_i (relative to the y's), we define Spearman's rank correlation coefficient to be
r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}
If there are no ties in rank, then r_s is the Pearson's correlation of the ranks, not of the raw data.
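That equivalence is easy to check numerically. A sketch with made-up data containing one extreme x value (`scipy` assumed available):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # one extreme x value, no ties
y = np.array([2.0, 4.0, 5.0, 8.0, 9.0])     # monotone increasing in x
r_s, _ = stats.spearmanr(x, y)
# With no ties, Pearson's correlation of the ranks reproduces r_s exactly.
r_ranks = np.corrcoef(stats.rankdata(x), stats.rankdata(y))[0, 1]
```

Because y is monotone in x, r_s = 1 even though the extreme x value drags Pearson's r on the raw data below 1.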
5.3 Kendall’s Tau
Kendall's Tau is another rank-based measure of association. Let R_i and S_i be the ranks of X_i and Y_i respectively. Then

T = \sum_{i<j} \operatorname{sgn}(R_i - R_j)\,\operatorname{sgn}(S_i - S_j)
with

\operatorname{sgn}(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x = 0 \\ -1, & \text{if } x < 0 \end{cases}
Kendall's tau tends to be more robust than Spearman's and is used when the sample size is smaller.
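The pair-counting definition translates directly into code. A sketch assuming no ties (the function name and the normalization by the number of pairs are my additions; with no ties, signs of rank differences equal signs of raw differences):

```python
import numpy as np
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau: average sign agreement over all pairs (no ties)."""
    n = len(x)
    s = sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)    # normalize by the number of pairs
```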
6 Robust Regression Methods: Not on exam
Minimizing least squares punishes the line severely for large residuals. If, due to whatever reason, we do not want our line determined by outliers, we can choose a more linear measure of error. While this is hard to solve with calculus, computers can do the work for us. There is more than one way to define this.
6.1 Least Absolute Deviation
Here we find the estimator that minimizes, w.r.t. \beta,

\sum_{i=1}^{n} |\epsilon_i(\beta)|
Note:
• The sum of the residuals might not be 0
• Need numerical methods to solve
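A generic optimizer is enough for small problems. A sketch (`lad_fit` is a hypothetical name; dedicated linear-programming methods are used in practice):

```python
import numpy as np
from scipy.optimize import minimize

def lad_fit(x, y):
    """Minimize sum_i |y_i - (b0 + b1*x_i)| numerically."""
    def loss(beta):
        return np.sum(np.abs(y - (beta[0] + beta[1] * x)))
    slope, intercept = np.polyfit(x, y, 1)          # OLS starting point
    res = minimize(loss, x0=[intercept, slope], method="Nelder-Mead")
    return res.x                                    # [intercept, slope]
```

Nelder-Mead is used here because the objective is not differentiable where residuals hit zero.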
6.2 Least Median of Square Regressions
Previously we tried to avoid outliers by not squaring our residuals. Here we avoid them by minimizing the median as opposed to the mean:

\operatorname{median}\{\epsilon_i^2(\beta),\ i = 1, 2, \ldots, n\}
• This method has a high breakdown point (maybe 50%)
• Need complex numerical methods to solve
6.3 Trimmed Regression
Here we throw out some of our outliers. Ordering the squared residuals from smallest to largest, e_{(1)}^2 \le \cdots \le e_{(n)}^2, we minimize

\sum_{i=1}^{q} e_{(i)}(\beta)^2
where q < n. This is computationally intensive.
6.4 M-Estimates
Here we will replace our function (\cdot)^2 or |\cdot| with a more general function \rho(\cdot) and minimize

\sum_{i=1}^{n} \rho(e_i(\beta))
The Huber loss function combines our L1 and L2 loss functions:

\rho(x) = \begin{cases} \dfrac{x^2}{2}, & \text{if } |x| \le k \\ k|x| - \dfrac{k^2}{2}, & \text{otherwise} \end{cases}
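M-estimates with the Huber loss are typically computed by iteratively reweighted least squares rather than direct minimization. A sketch (the function name, the conventional tuning constant k = 1.345, and the MAD-based scale estimate are my assumptions, not from the notes):

```python
import numpy as np

def huber_fit(x, y, k=1.345, iters=50):
    """Huber M-estimate of intercept and slope via iteratively
    reweighted least squares (IRLS)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS starting point
    for _ in range(iters):
        e = y - X @ beta
        s = np.median(np.abs(e)) / 0.6745 + 1e-12     # robust scale (MAD)
        u = e / s
        w = np.where(np.abs(u) <= k, 1.0, k / np.abs(u))  # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta                                        # [intercept, slope]
```

Large residuals get weight k/|u| < 1, so outliers pull on the fit linearly rather than quadratically.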