Group project instructions
ECON 5079
Econometrics
Experiments with sparse regression models
We will be generating data from regression models in order to understand the properties of various estimators and procedures. Our basic framework requires us to generate p predictors in a matrix x and a target variable y as follows:
$$x_i \sim N_p(0, S) \tag{1}$$
$$y_i = \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \tag{2}$$
for $i = 1, \dots, n$, where $S_{jk} = \rho^{|j-k|}$ for some correlation level $-1 \le \rho \le 1$ and for elements $j, k \in \{1, \dots, p\}$.
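As a point of reference, a minimal MATLAB sketch of this DGP is given below; the parameter values and the coefficient vector are illustrative assumptions, not values prescribed by the project:

% Minimal sketch of the DGP in equations (1)-(2); all values are illustrative.
n = 200; p = 5; rho = 0.5; sigma2 = 1;
beta = [1; -0.8; 0.5; zeros(p-3,1)];     % example coefficient vector
S = rho.^abs((1:p)' - (1:p));            % S_jk = rho^|j-k|
x = randn(n,p) * chol(S);                % rows of x are draws from N_p(0,S)
y = x*beta + sqrt(sigma2)*randn(n,1);    % equation (2)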
1. Use the code Monte Carlo bias.m and do various experiments in order to demonstrate the effect of omitted variable bias on econometric estimates (see Appendix A for more details and guidance).
2. Write code that explores the opposite issue, i.e. what happens if we generate from a regression with three significant predictors but estimate a regression with p − 3 (p > 3) additional predictors that are irrelevant? Which is more harmful for regression: omitting an important predictor or including an irrelevant one? (Hint: be thorough and explore the effect of various choices of n, p, σ², ρ.) A sketch of one such experiment appears after this list.
3. Variable selection for small p. Generate 10 predictors in x and perform an information-theoretic model averaging approach similar to Pesaran and Timmermann (1995, Journal of Finance) and Kapetanios, Labhard and Price (2008, Journal of Business & Economic Statistics). Write a short MATLAB code that scans through all $2^{10} = 1024$ possible model specifications, estimates each one using OLS, and calculates some measure of fit of your preference (e.g. BIC, AIC, adjusted R², etc.). Find the model with the highest probability of being the "best" model. Notes on the procedure (including a code sketch) are in Appendix B.
4. Variable selection for large p. Use the lasso and elastic net to perform high-dimensional variable selection using 5-fold cross-validation. Set p large and explore cases where p ≫ n. Alongside the other choices (σ², ρ), explain in which cases the lasso/elastic net choose the correct variables. A sketch appears below.
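For item 2, a minimal sketch of a single replication follows; everything here (sample size, coefficient values, the subsets compared) is an illustrative assumption, and a full answer would wrap this in a Monte Carlo loop and vary n, p, σ², ρ:

% Sketch for item 2: compare an underfitted model (one relevant predictor
% omitted) with an overfitted one (p-3 irrelevant predictors included).
n = 100; p = 10; rho = 0.5; sigma2 = 1;
beta = [1; -0.8; 0.5; zeros(p-3,1)];              % only the first 3 predictors matter
S = rho.^abs((1:p)' - (1:p));
x = randn(n,p) * chol(S);
y = x*beta + sqrt(sigma2)*randn(n,1);

b_true  = (x(:,1:3)'*x(:,1:3)) \ (x(:,1:3)'*y);   % correctly specified model
b_over  = (x'*x) \ (x'*y);                        % includes irrelevant predictors
b_under = (x(:,1:2)'*x(:,1:2)) \ (x(:,1:2)'*y);   % omits the third predictor
disp([b_true(1) b_over(1) b_under(1)])            % compare estimates of beta_1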
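For item 4, a minimal sketch using MATLAB's lasso function (one of the built-ins the brief explicitly allows); the parameter values and the choice Alpha = 0.5 for the elastic net are illustrative assumptions:

% Sketch for item 4: lasso and elastic net with 5-fold cross-validation
% in a p >> n design. Requires the Statistics and Machine Learning Toolbox.
n = 50; p = 200; rho = 0.5; sigma2 = 1;
beta = [1; -0.8; 0.5; zeros(p-3,1)];
S = rho.^abs((1:p)' - (1:p));
x = randn(n,p) * chol(S);
y = x*beta + sqrt(sigma2)*randn(n,1);

[B, info]   = lasso(x, y, 'CV', 5);               % lasso path, 5-fold CV
b_lasso     = B(:, info.IndexMinMSE);             % coefficients at CV-optimal lambda
[Be, infoe] = lasso(x, y, 'CV', 5, 'Alpha', 0.5); % elastic net (Alpha = 0.5)
b_enet      = Be(:, infoe.IndexMinMSE);
find(b_lasso ~= 0)                                % did we recover predictors 1-3?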
HEALTH WARNINGS:
• I won’t accept a sloppy copy-paste of a million tables without structure, motivation and scientific reasoning. Your main task is to build a story and explain what works and what doesn’t, in a structured and thorough way. Your report should be scientific and evidence-based, not opinion- or intuition-based like a newspaper article or a blog piece.
• You should submit all your code in clear and reproducible form. I won’t accept the use of built-in functions (other than the functions for lasso/elastic net).
• You can use MATLAB, Python or R. I can read other languages, but it will be harder for me to run your code and replicate things, so you are advised NOT to work in C++, Java, Stata, etc.
References
[1] Kapetanios, G., Labhard, V. and Price, S. (2008) Forecasting Using Bayesian and Information-Theoretic Model Averaging, Journal of Business & Economic Statistics, 26(1), 33-41.
[2] Pesaran, M.H. and Timmermann, A. (1995), Predictability of Stock Returns: Robustness and Economic Significance. The Journal of Finance, 50, 1201-1228.
[3] Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267-288.
Appendices
A Assessment of Omitted Variable Bias
Assume that the true DGP for our data y is
$$y_i = \beta_x x_i + \beta_z z_i + \varepsilon_i \tag{A.1}$$
but we instead estimate
$$y_i = b_x x_i + \varepsilon_i. \tag{A.2}$$
How does omitting $z_i$ from the model affect the least-squares (LS) estimate of $b_x$? Is $\hat{b}_x$ a reliable estimate of the true $\beta_x$?
Using matrix notation ($y = x\beta_x + z\beta_z + \varepsilon$), it follows that
$$\hat{b}_x = (x'x)^{-1} x'y \tag{A.3}$$
$$= (x'x)^{-1} x'(x\beta_x + z\beta_z + \varepsilon) \tag{A.4}$$
$$= \beta_x + \beta_z (x'x)^{-1} x'z + (x'x)^{-1} x'\varepsilon. \tag{A.5}$$
Thus, $\hat{b}_x$ is an unbiased estimate of $\beta_x$ (i.e. $E(\hat{b}_x) = \beta_x$) if
1. $E(x_i \varepsilon_i) = 0$;
2. $\beta_z = 0$ or $E(x_i z_i) = 0$.
In words,
1. The regressor $x_i$ and error $\varepsilon_i$ are uncorrelated (part of the usual assumptions in the linear regression model);
2. $z_i$ is not relevant for $y_i$, or $x_i$ and $z_i$ are uncorrelated.
In practice (and especially for economic data) $x_i$ and $z_i$ will be highly correlated, meaning that omitted variable bias (OVB) can become a serious threat to regression analysis (especially when we omit many z variables from our regression). Your task is to use the provided code to illustrate as accurately as you can how serious this can be in different scenarios.
The code Monte Carlo bias.m does two simple things:
1. Generate data y from a regression model with p, possibly correlated, predictors X. The coefficients β and σ² are known to us (i.e. we select their values). That way, we generate a finite sample of n values of y, X but we know what process generated these data, which will allow us to assess how close to the “truth” (i.e. the values of β and σ² we selected) various estimators are.
2. Using the generated data y and X, it solves a simple OLS estimation problem providing two estimators: one where all p predictors are used (called beta_OLS in the code), and one where we only use the first predictor in X as an example of committing an omitted variable bias (the vector beta_OLS_omitbias in the code).
First, play around with the code to get a feeling for what it does and what results it produces. Next, try to devise different scenarios of omitted variable bias. Try to see what happens for different values of the DGP parameters (see code): n, rho, p, sigma2, beta. When is the bias substantial, and when is it less of a concern?
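To make the mechanics concrete, here is a minimal Monte Carlo sketch in the spirit of Monte Carlo bias.m (it is an illustration written for this note, not the provided file; all parameter values are placeholders):

% Average the omitted-variable bias across R simulated samples.
R = 1000; n = 100; rho = 0.8; sigma2 = 1;
beta = [1; 0.5];                            % beta_x = 1, beta_z = 0.5
S = [1 rho; rho 1];
b_store = zeros(R,1);
for r = 1:R
    xz = randn(n,2) * chol(S);              % correlated x and z
    y  = xz*beta + sqrt(sigma2)*randn(n,1);
    x1 = xz(:,1);
    b_store(r) = (x1'*x1) \ (x1'*y);        % regress y on x only (z omitted)
end
fprintf('mean estimate %.3f vs true beta_x %.3f\n', mean(b_store), beta(1))
% By (A.5) the bias is approximately beta_z*rho here, so expect about 1.4.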
B Information Theoretic Model Selection and Averaging
Pesaran and Timmermann (1995, Journal of Finance) consider the following stock return prediction model
$$\rho_t = \beta X_t + \varepsilon_t, \tag{B.1}$$
where $\rho_t$ are stock returns in excess of the risk-free rate, and $X_t$ are the following available predictors
$$X_t = [YSP_{t-1}, EP_{t-1}, I1_{t-1}, I1_{t-2}, I12_{t-1}, I12_{t-2}, \dots, \Pi_{t-2}, \Delta IP_{t-2}, \Delta M_{t-2}]$$
• Namely: dividend yield, earnings-price ratio, 1-month T-bill rate, 12-month T-bond rate, inflation, industrial production, M0
• Some variables only appear with a second lag (e.g. $\Delta IP_{t-2}$) because of publication lags from the relevant statistical offices
• Consider all possible model combinations based on these 9 variables
• A variable is either included (1) or excluded (0) from the regression ⇒ leading to $2^9 = 512$ possible models
• Denote model $M_i$, $i = 1, \dots, 512$, as
$$\rho_t = \beta X_t^{(i)} + \varepsilon_t, \tag{B.2}$$
where $X_t^{(i)}$ contains the predictors of model i
• Estimate all models and then store $BIC_i$, $R_i^2$
• Pesaran and Timmermann actually use economic criteria (the Sharpe ratio) to select the optimal model
• With modern PCs one can easily enumerate deterministically and estimate all possible model combinations when facing 30-40 predictors
• With more than 40 predictors it is computationally infeasible to estimate all possible regression models, but stochastic algorithms exist that find the most probable models – we will see such algorithms during the lectures on Bayesian inference
• However, when forecasting stock returns or exchange rates (or inflation, as we will see next), predictors are unstable – some variables forecast well in some periods, others not
• There is a way to reduce the risk associated with selecting a single model
• This procedure is called model averaging
Consider the case of two variables, i.e. $2^2 = 4$ models
$$\rho_t = \beta_0 + \varepsilon_t, \tag{B.3}$$
$$\rho_t = \beta_0 + \beta_1 X_{1,t} + \varepsilon_t, \tag{B.4}$$
$$\rho_t = \beta_0 + \beta_2 X_{2,t} + \varepsilon_t, \tag{B.5}$$
$$\rho_t = \beta_0 + \beta_1 X_{1,t} + \beta_2 X_{2,t} + \varepsilon_t, \tag{B.6}$$
and their associated 4 BIC values: $BIC_1, BIC_2, BIC_3, BIC_4$. Kapetanios, Labhard and Price (2008, Forecasting Using Bayesian and Information-Theoretic Model Averaging, Journal of Business & Economic Statistics) show that we can convert these into model probabilities:
$$\pi_{M_i} = \frac{\exp(-0.5(BIC_i - \min(BIC)))}{\sum_{j=1}^{4} \exp(-0.5(BIC_j - \min(BIC)))}, \tag{B.7}$$
where notice that (for numerical stability, i.e. in order to avoid overflow/underflow) we subtract from each BIC value the minimum value attained by the BIC over all models. We can now use these model probabilities to construct probabilities for each variable of interest, i.e. variable-specific probabilities:
• $X_{1,t}$ has probability equal to $\omega_{X_1} = \pi_{M_2} + \pi_{M_4}$
• $X_{2,t}$ has probability equal to $\omega_{X_2} = \pi_{M_3} + \pi_{M_4}$
Such probabilities are also called probabilities of inclusion of each variable – not to be confused with p-values from independent t-tests.
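A minimal sketch of the full enumeration and averaging step for item 3 (p = 10), assuming y and x have already been generated as in equations (1)-(2); the bit-indexing scheme for models is an illustrative choice:

% Enumerate all 2^p models, compute BIC, convert to model probabilities
% (equation B.7) and then to variable inclusion probabilities.
[n, p] = size(x);
K   = 2^p;
bic = zeros(K,1);
for j = 1:K
    idx = find(bitget(j-1, 1:p));           % predictor k is in model j iff bit k is set
    Xj  = [ones(n,1) x(:,idx)];             % intercept plus selected predictors
    bj  = (Xj'*Xj) \ (Xj'*y);               % OLS
    e   = y - Xj*bj;
    bic(j) = n*log(e'*e/n) + size(Xj,2)*log(n);   % Gaussian BIC
end
w    = exp(-0.5*(bic - min(bic)));          % numerator of (B.7)
pi_m = w / sum(w);                          % model probabilities
incl = zeros(p,1);
for k = 1:p
    incl(k) = sum(pi_m(bitget((0:K-1)', k) == 1));   % inclusion probability of x_k
end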
The ideas above generalize to models with p predictors as long as the number of predictors is less than 30-40. For example, with p = 50 we have $2^{50}$ models, which is a vast number. Even if it takes you 0.001 seconds to estimate a single model, you would need 36000 years to estimate all $2^{50}$ models!