Nonlinear econometrics for finance

HOMEWORK 1

(Review of linear econometrics and review of methods)

Problem 1 (Linear econometrics). (60 points) Household finance is a growing field in finance. Rising health costs are not just impacting households’ finances; they are affecting an array of decisions, including the decision to change (or retire from) an occupation which provides favorable health insurance subsidies. For a cross section of individuals, the file “insurance.csv” provides the following information:

● age: age of primary beneficiary of health insurance

● sex: gender of primary beneficiary of health insurance

● bmi: a measure of a person’s weight relative to height. It is defined as $\mathrm{bmi} = \mathrm{kg}/\mathrm{m}^2$, where kg is the person’s weight in kilograms and $\mathrm{m}^2$ is the square of the person’s height in meters. A bmi between 18.5 and 24.9 is considered healthy. A higher value would be considered “overweight”.

● children: number of children covered by health insurance

● smoker: whether the primary beneficiary is a smoker or not

● region: the primary beneficiary’s residential area in the US (northeast, southeast, northwest, southwest)

● charges: medical costs billed to health insurance.

Given this information, you need to perform linear regression in Python to understand the drivers of medical costs.

(1) (3 points) Generate a histogram of the medical costs and compute descriptive statistics (mean, median, standard deviation, minimum, maximum). Is the distribution symmetric? Why or why not, in your view?

(2) (3 points) Take a logarithmic transformation of the medical costs. Plot the histogram of the log-costs. What do you notice now? How would you explain the change?
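As a rough illustration of items (1) and (2), the sketch below computes the requested descriptive statistics and a skewness measure for a variable and for its logarithm. It does not read “insurance.csv”: the `charges` array is synthetic (lognormal draws with made-up parameters), and the helper name `describe` is hypothetical. With the real file, the column would presumably be loaded first and passed to the same function.

```python
import numpy as np

# Synthetic stand-in for the "charges" column (NOT the real insurance.csv data).
rng = np.random.default_rng(0)
charges = rng.lognormal(mean=9.0, sigma=0.9, size=5000)

def describe(x):
    """Return the statistics requested in item (1), plus sample skewness."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    skew = np.mean(centered**3) / np.mean(centered**2) ** 1.5
    return {
        "mean": x.mean(),
        "median": np.median(x),
        "std": x.std(ddof=1),
        "min": x.min(),
        "max": x.max(),
        "skewness": skew,
    }

raw_stats = describe(charges)
log_stats = describe(np.log(charges))

for name, stats in [("charges", raw_stats), ("log(charges)", log_stats)]:
    print(name, {k: round(float(v), 3) for k, v in stats.items()})

# Histogram counts without a plotting library; matplotlib's
# plt.hist(charges, bins=30) would draw the same bins.
counts, edges = np.histogram(charges, bins=30)
```

A mean well above the median and a large positive skewness point to a long right tail; for data that are close to lognormal, the log transformation brings the skewness close to zero.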

Begin by excluding all categorical variables (sex, smoker and region).

(3) (4 points) Run a regression of the log-costs on the non-categorical explanatory variables:

$$\log(\mathrm{cost}_i) = \theta_0 + \theta_1\,\mathrm{age}_i + \theta_2\,\mathrm{bmi}_i + \theta_3\,\mathrm{children}_i + \varepsilon_i,$$

where $\varepsilon_i$ is an error term.

(4) (3 points) Give an economic interpretation of the estimated coefficients in the regression above. What does the model say about the determinants of medical costs?

(5) (4 points) We want to test whether the coefficient $\theta_2$ for bmi is statistically significant. Test the hypothesis using the relevant test statistic. Does bmi have more or less explanatory power than age?

(6) (3 points) We want to test whether the coefficient $\theta_2$ for bmi is statistically significant. Test the hypothesis using the relevant p-value.

(7) (5 points) Test the single linear restriction $\theta_1 = 3\theta_2$ using the relevant test statistic.

(8) (3 points) Test the single linear restriction $\theta_1 = 3\theta_2$ using the relevant p-value.

(9) (5 points) Test the multiple linear restriction $\theta_1 = 0.04$ and $\theta_2 = 0$ using the relevant test statistic.

(10) (3 points) Test the multiple linear restriction $\theta_1 = 0.04$ and $\theta_2 = 0$ using the relevant p-value.

(11) (4 points) Using the estimated model, predict medical costs for a 50 year-old person with bmi = 36 and 4 children. Is the prediction lower or higher than the mean of the distribution of the medical costs? (Recall that the regression gives you a prediction for the log of the medical costs (say, log(y)), not for the medical costs (say, y). Hence, after you find the prediction for the log of the medical costs, you need to make a transformation to find a prediction for the medical costs themselves. Hint: if log(y) is normal, y is lognormal. What is E(y) for a lognormal random variable?)
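The mechanics behind items (3) to (11) can be sketched with plain NumPy. Everything below is an assumption made for illustration: the data are synthetic, `true_theta` is invented, the covariance matrix is the homoskedastic one, and the p-values use large-sample normal and chi-square approximations rather than exact t and F distributions. The course may intend a different routine (for example a regression package), so this is a sketch of the technique, not the expected solution.

```python
import math
import numpy as np

# Synthetic data standing in for insurance.csv (all values are made up).
rng = np.random.default_rng(1)
n = 2000
age = rng.integers(18, 65, size=n).astype(float)
bmi = rng.normal(30.0, 6.0, size=n)
children = rng.integers(0, 6, size=n).astype(float)
true_theta = np.array([7.0, 0.035, 0.01, 0.10])  # assumed, illustration only
X = np.column_stack([np.ones(n), age, bmi, children])
log_cost = X @ true_theta + rng.normal(0.0, 0.5, size=n)

# OLS: theta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
theta_hat = XtX_inv @ X.T @ log_cost
resid = log_cost - X @ theta_hat
k = X.shape[1]
sigma2_hat = resid @ resid / (n - k)   # error-variance estimate
cov = sigma2_hat * XtX_inv             # homoskedastic covariance matrix
se = np.sqrt(np.diag(cov))

def normal_two_sided_p(z):
    """Two-sided p-value from the standard normal (large-sample approximation)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Items (5)-(6): H0: theta_2 = 0
t_bmi = theta_hat[2] / se[2]
p_bmi = normal_two_sided_p(t_bmi)

# Items (7)-(8): theta_1 - 3*theta_2 = 0, written as r'theta = 0
r = np.array([0.0, 1.0, -3.0, 0.0])
t_restr = (r @ theta_hat) / math.sqrt(r @ cov @ r)
p_restr = normal_two_sided_p(t_restr)

# Items (9)-(10): joint restriction R theta = q; Wald statistic is
# approximately chi-square with 2 degrees of freedom under H0.
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
q = np.array([0.04, 0.0])
diff = R @ theta_hat - q
wald = diff @ np.linalg.solve(R @ cov @ R.T, diff)
p_wald = math.exp(-wald / 2.0)         # chi-square(2) survival function

# Item (11): if log(y) ~ N(m, s^2), then E(y) = exp(m + s^2 / 2).
x_new = np.array([1.0, 50.0, 36.0, 4.0])
pred_log = x_new @ theta_hat
pred_cost = math.exp(pred_log + sigma2_hat / 2.0)

print(theta_hat.round(4), round(float(t_bmi), 2), round(float(wald), 2))
print("predicted cost:", round(pred_cost, 2))
```

Dividing the Wald statistic by the number of restrictions gives the usual F statistic under the same homoskedasticity assumption, so the two approaches lead to the same conclusion in large samples.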

Now, take the categorical variables into account using dummy variables (https://en.wikipedia.org/wiki/Dummy_variable_(statistics)).

(12) (3 points) How much more (or less) do males spend relative to females (controlling for all other variables)?

(13) (3 points) How much more (or less) do smokers spend relative to non-smokers (controlling for all other variables)?

(14) (3 points) In which region are medical costs higher (controlling for all other variables)?

(15) (3 points) What is the difference in medical costs between the northeast and the southwest (controlling for all other variables)?

(16) (4 points) Are the coefficients associated with the dummies individually statistically significant?

(17) (4 points) Using your model, predict medical costs for a 50 year-old male smoker with bmi = 36 who lives in the southwest and has 4 children.
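One way to bring categorical variables into the regression is to build 0/1 dummy columns by hand and leave out one baseline category per variable, so that the dummies are not perfectly collinear with the constant. The sketch below does this on synthetic data. The helper `dummies`, the choice of baselines (female, non-smoker, northeast), and the coefficients in `true_theta` are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Synthetic categorical data (made up; not the real insurance.csv).
rng = np.random.default_rng(2)
n = 1500
sex = rng.choice(["female", "male"], size=n)
smoker = rng.choice(["no", "yes"], size=n, p=[0.8, 0.2])
region = rng.choice(["northeast", "northwest", "southeast", "southwest"], size=n)
age = rng.integers(18, 65, size=n).astype(float)

def dummies(values, baseline):
    """One 0/1 column per category except the baseline category."""
    levels = [lv for lv in sorted(set(values.tolist())) if lv != baseline]
    cols = np.column_stack([(values == lv).astype(float) for lv in levels])
    return levels, cols

sex_lv, sex_d = dummies(sex, "female")
smk_lv, smk_d = dummies(smoker, "no")
reg_lv, reg_d = dummies(region, "northeast")

names = ["const", "age"] + sex_lv + ["smoker_" + s for s in smk_lv] + reg_lv
X = np.column_stack([np.ones(n), age, sex_d, smk_d, reg_d])

# Assumed data-generating coefficients, in the same order as `names`.
true_theta = np.array([7.0, 0.03, -0.05, 1.5, -0.02, -0.10, -0.08])
log_cost = X @ true_theta + rng.normal(0.0, 0.4, size=n)

theta_hat, *_ = np.linalg.lstsq(X, log_cost, rcond=None)
coef = dict(zip(names, theta_hat))

# In a log-linear model, a dummy coefficient d corresponds to a
# (exp(d) - 1) * 100 percent difference relative to the baseline
# category, holding the other regressors fixed.
smoker_pct = (np.exp(coef["smoker_yes"]) - 1.0) * 100.0

# Against the baseline region (northeast) the gap is the coefficient
# itself; between two non-baseline regions it is the difference of
# their coefficients.
sw_vs_ne = coef["southwest"]

print({k: round(float(v), 3) for k, v in coef.items()})
print("smoker effect in percent:", round(float(smoker_pct), 1))
```

Changing the baseline category changes the individual dummy coefficients but not the fitted values, so comparisons between any two categories are unaffected by that choice.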

Problem 2 (Review of methods). (40 points) Assume an iid sample $\{x_1, x_2, \ldots, x_T\}$ from some distribution with expected value $\mu$ and variance $\sigma^2$. A natural estimator for the true variance (i.e., $\sigma^2$) of the random variable which generates the data is the sample variance, namely $s_T^2 = \frac{1}{T}\sum_{t=1}^{T}(x_t - \bar{X})^2$, where $\bar{X}$ denotes the sample mean, i.e., $\bar{X} = \frac{1}{T}\sum_{t=1}^{T} x_t$.

First, let us focus on the finite-T (or finite-sample) properties of $s_T^2$:

(1) (6 points) Show that the sample variance $s_T^2$ is biased for the true variance $\sigma^2$.

(2) (3 points) How would you correct the bias?

(3) (3 points) What is the bias of the infeasible variance estimator $s_{T,\mathrm{inf}}^2 = \frac{1}{T}\sum_{t=1}^{T}(x_t - \mu)^2$? Why am I calling this estimator infeasible?
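A small Monte Carlo experiment can serve as a numerical check on the finite-sample algebra. The sketch below compares the average of three variance estimators over many simulated samples. The Uniform(0, 1) distribution, the sample size T = 10 and the number of replications are arbitrary choices made here, not part of the assignment, and the simulation does not replace the analytical proof.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 10            # small sample size so that any bias is visible
n_rep = 200_000   # number of simulated samples

# Uniform(0, 1) draws: mu = 1/2 and sigma^2 = 1/12 (illustrative choice).
mu, sigma2 = 0.5, 1.0 / 12.0
x = rng.uniform(0.0, 1.0, size=(n_rep, T))

xbar = x.mean(axis=1, keepdims=True)
s2_biased = ((x - xbar) ** 2).mean(axis=1)     # divides by T
s2_corrected = s2_biased * T / (T - 1)         # divides by T - 1
s2_infeasible = ((x - mu) ** 2).mean(axis=1)   # uses the true mean mu

print("true variance          :", sigma2)
print("average, 1/T version   :", s2_biased.mean())
print("(T - 1)/T * sigma^2    :", (T - 1) / T * sigma2)
print("average, 1/(T-1)       :", s2_corrected.mean())
print("average, infeasible    :", s2_infeasible.mean())
```

Rerunning with a larger T shows the gap between the 1/T version and the true variance shrinking, which connects the finite-sample question to the asymptotic one.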

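Now, let us turn to the large-T (or infinite-sample or asymptotic) properties of $s_T^2$. Write the following:

$$
\begin{aligned}
s_T^2 &= \frac{1}{T}\sum_{t=1}^{T}(x_t - \bar{X})^2 \\
&= \frac{1}{T}\sum_{t=1}^{T}\big((x_t - \mu) - (\bar{X} - \mu)\big)^2 \\
&= \underbrace{\frac{1}{T}\sum_{t=1}^{T}(x_t - \mu)^2}_{(a)} - \underbrace{2(\bar{X} - \mu)\frac{1}{T}\sum_{t=1}^{T}(x_t - \mu)}_{(b)} + \underbrace{(\bar{X} - \mu)^2}_{(c)} \qquad (1)
\end{aligned}
$$

Now, subtract $\sigma^2$ from the left-hand side and from the right-hand side of Eq. (1) and standardize by $\sqrt{T}$ to obtain:

$$
\sqrt{T}\,(s_T^2 - \sigma^2) = \underbrace{\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\big((x_t - \mu)^2 - \sigma^2\big)}_{(a^*)} - \underbrace{2(\bar{X} - \mu)\frac{1}{\sqrt{T}}\sum_{t=1}^{T}(x_t - \mu)}_{(b^*)} + \underbrace{\sqrt{T}\,(\bar{X} - \mu)^2}_{(c^*)} \qquad (2)
$$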

(4) (6 points) Show that $s_T^2$ is consistent for $\sigma^2$ by applying the LLN to (a), (b) and (c) in Eq. (1).

(5) (6 points) Show that $\sqrt{T}\,(s_T^2 - \sigma^2)$ is asymptotically normal by applying the LLN, the CLT and Slutsky’s theorem to $(a^*)$, $(b^*)$ and $(c^*)$ in Eq. (2).

Notice that consistency is a statement about sample averages, like $s_T^2$, converging (as $T \to \infty$) to expected values. Asymptotic normality is a statement about demeaned (by $\sigma^2$, in our example) and standardized (by $\sqrt{T}$, in our example) sample averages, like $\sqrt{T}\,(s_T^2 - \sigma^2)$, converging (as $T \to \infty$) to a mean-zero normal distribution.
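The consistency statement can be illustrated numerically by computing the 1/T sample variance on samples of increasing size and watching the estimation error shrink. The sketch below is not the Lecture 1 code, which is not reproduced here. It assumes Uniform(0, 1) draws (neither exponential nor normal), whose variance is 1/12, and the helper name `sample_variance` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 1.0 / 12.0   # variance of Uniform(0, 1), the illustrative distribution

def sample_variance(x):
    """1/T sample variance, i.e. the average squared deviation from the mean."""
    return np.mean((x - x.mean()) ** 2)

sample_sizes = [10, 100, 1_000, 10_000, 100_000, 1_000_000]
errors = []
for T in sample_sizes:
    x = rng.uniform(0.0, 1.0, size=T)
    err = abs(sample_variance(x) - sigma2)
    errors.append(err)
    print(f"T = {T:>9,d}   |s_T^2 - sigma^2| = {err:.6f}")
```

The error need not fall monotonically from one sample size to the next, since each line uses a fresh random sample; the point is that it becomes small as T grows.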

(6) (8 points) Use my sample Python codes from Lecture 1 to write code which shows consistency of $s_T^2$. You should draw your observations from a random variable which is neither exponential nor normal.

(7) (8 points) Use my sample Python codes from Lecture 1 to write code which shows asymptotic normality of $\sqrt{T}\,(s_T^2 - \sigma^2)$. You should draw your observations from a random variable which is neither exponential nor normal.
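Asymptotic normality can be illustrated by simulating many samples, computing the demeaned and standardized statistic for each one, and comparing its empirical distribution with a normal benchmark. The sketch below is independent of the Lecture 1 code and rests on stated assumptions: Uniform(0, 1) draws, for which the variance is 1/12 and the fourth central moment is 1/80, and a limiting variance equal to the fourth central moment minus the squared variance, which is what the CLT applied to term (a*) delivers.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 1_000             # observations per simulated sample
n_rep = 5_000         # number of simulated samples
sigma2 = 1.0 / 12.0   # variance of Uniform(0, 1)
mu4 = 1.0 / 80.0      # fourth central moment of Uniform(0, 1)

x = rng.uniform(0.0, 1.0, size=(n_rep, T))
s2 = x.var(axis=1)                 # ddof=0, i.e. the 1/T sample variance
z = np.sqrt(T) * (s2 - sigma2)     # demeaned and standardized statistic

# Limiting variance implied by the CLT for (a*): Var((x - mu)^2).
asy_var = mu4 - sigma2**2
z_std = z / np.sqrt(asy_var)

print("mean of z            :", z.mean())
print("variance of z        :", z.var())
print("asymptotic variance  :", asy_var)
print("share |z_std| < 1.96 :", np.mean(np.abs(z_std) < 1.96))
```

If the normal approximation is good, roughly 95 percent of the standardized draws fall inside plus or minus 1.96; a histogram of `z_std` against the standard normal density gives the visual version of the same check.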

2023-02-07
