SEES0083 Quantitative Methods 2020-2021
MA EXAMINATION, 2020-2021 Academic Year
Course: Quantitative Methods
Course Code: SEES0083
PART A: Please Answer FOUR out of FIVE Exercises
Exercise 1 (4 points)
The amount of time you have to wait at a dentist's office before you are called in is uniformly distributed between zero and 60 minutes.
[Figure: uniform density f(x) = 1/60 for 0 ≤ x ≤ 60]
(a) What is the probability that you have to wait more than 40 minutes? (60 - 40)/60 = 1/3
(b) What is the probability that you have to wait between 20 and 40 minutes? (40 - 20)/60 = 1/3
(c) What are the first, second and third quartiles of this uniform distribution? 1st: 0.25*60 = 15; 2nd: 0.5*60 = 30; 3rd: 0.75*60 = 45 (minutes)
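The uniform-distribution answers can be checked with a few lines of Python. This is a minimal sketch that assumes only the Uniform(0, 60) model stated in the exercise; the helper names `prob_between` and `quantile` are illustrative, not part of any library.

```python
from fractions import Fraction

# Waiting time X ~ Uniform(A, B) with A = 0 and B = 60 minutes (from the exercise).
A, B = 0, 60

def prob_between(lo, hi, a=A, b=B):
    """P(lo < X < hi) for a uniform on [a, b]: overlap length divided by (b - a)."""
    lo, hi = max(lo, a), min(hi, b)
    return Fraction(max(hi - lo, 0), b - a)

def quantile(p, a=A, b=B):
    """p-th quantile of Uniform(a, b): a + p * (b - a)."""
    return a + p * (b - a)

p_more_than_40 = prob_between(40, B)   # (60 - 40) / 60
p_20_to_40 = prob_between(20, 40)      # (40 - 20) / 60
quartiles = [quantile(p) for p in (0.25, 0.5, 0.75)]

print(p_more_than_40, p_20_to_40, quartiles)  # 1/3 1/3 [15.0, 30.0, 45.0]
```

Using `Fraction` keeps the probabilities exact, so the answers read as 1/3 rather than 0.333...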
Exercise 2 (4 points)
Consider the following probability distribution of the discrete random variable X, which takes the values 0 to 6 with the associated probabilities P(x) shown below:
x    | 0    | 1    | 2    | 3    | 4    | 5    | 6
P(x) | 0.07 | 0.19 | 0.23 | 0.17 | 0.16 | 0.14 | 0.04
(a) What is P(X > 0)? 0.19 + 0.23 + 0.17 + 0.16 + 0.14 + 0.04 = 0.93 (equivalently, 1 - P(X = 0) = 1 - 0.07)
(b) What is P(1 < X < 3)? P(X = 2) = 0.23
(c) What is P(2 < X ≤ 4)? P(X = 3) + P(X = 4) = 0.17 + 0.16 = 0.33
(d) What is P(X ≥ 5)? P(X = 5) + P(X = 6) = 0.14 + 0.04 = 0.18
(e) What is P(X < 6)? 1 - P(X = 6) = 1 - 0.04 = 0.96
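A short Python sketch that recomputes parts (a) to (e) directly from the table. The helper `prob` is a hypothetical name introduced here; it simply adds up P(X = x) over the values of x that satisfy each event, and results are printed to two decimals to hide floating-point rounding.

```python
# PMF of X from the table in Exercise 2.
pmf = {0: 0.07, 1: 0.19, 2: 0.23, 3: 0.17, 4: 0.16, 5: 0.14, 6: 0.04}

def prob(event):
    """Sum P(X = x) over the support values x for which event(x) is True."""
    return sum(p for x, p in pmf.items() if event(x))

# A valid PMF sums to 1 (up to floating-point rounding).
assert abs(sum(pmf.values()) - 1.0) < 1e-9

answers = {
    "P(X > 0)": prob(lambda x: x > 0),
    "P(1 < X < 3)": prob(lambda x: 1 < x < 3),
    "P(2 < X <= 4)": prob(lambda x: 2 < x <= 4),
    "P(X >= 5)": prob(lambda x: x >= 5),
    "P(X < 6)": prob(lambda x: x < 6),
}
for label, value in answers.items():
    print(f"{label} = {value:.2f}")
```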
Exercise 3 (4 points)
In a recent survey of 30 teenagers, 62% of them indicated that they saw a movie within the past month. 75% of those teenagers who saw a movie also went out to dinner in the past month, while only 64% of the teenagers who had not seen a movie had been out to dinner in the past month. Define the random variables as follows:
X = 1 if teenager had been to movie; X = 0 otherwise
Y = 1 if teenager had been out to dinner; Y = 0 otherwise
(a) Find the joint probability function of X and Y.
(b) Find the conditional probability function of X, given Y = 1.
X \ Y              | Out to dinner: yes (Y = 1) | Out to dinner: no (Y = 0) | Marginal
Movie: yes (X = 1) |                            |                           |
Movie: no (X = 0)  |                            |                           |
Marginal           |                            |                           |
P(X|Y=1) =
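The blank cells follow from the multiplication rule P(X = x, Y = y) = P(X = x) * P(Y = y | X = x), and the conditional distribution from P(X = x | Y = 1) = P(X = x, Y = 1) / P(Y = 1). The Python sketch below assumes the survey percentages can be treated as probabilities (the sample size of 30 is not needed); variable names are illustrative.

```python
# Probabilities given in Exercise 3.
p_movie = 0.62                 # P(X = 1)
p_dinner_if_movie = 0.75       # P(Y = 1 | X = 1)
p_dinner_if_no_movie = 0.64    # P(Y = 1 | X = 0)

# (a) Joint probabilities P(X = x, Y = y) = P(X = x) * P(Y = y | X = x).
joint = {
    (1, 1): p_movie * p_dinner_if_movie,
    (1, 0): p_movie * (1 - p_dinner_if_movie),
    (0, 1): (1 - p_movie) * p_dinner_if_no_movie,
    (0, 0): (1 - p_movie) * (1 - p_dinner_if_no_movie),
}
marginal_x = {x: joint[(x, 1)] + joint[(x, 0)] for x in (0, 1)}
marginal_y = {y: joint[(1, y)] + joint[(0, y)] for y in (0, 1)}

# (b) Conditional distribution of X given Y = 1.
cond_x_given_y1 = {x: joint[(x, 1)] / marginal_y[1] for x in (0, 1)}

for (x, y), p in sorted(joint.items(), reverse=True):
    print(f"P(X={x}, Y={y}) = {p:.4f}")
print(f"P(Y=1) = {marginal_y[1]:.4f}")
print({x: round(p, 4) for x, p in cond_x_given_y1.items()})
```

The two conditional probabilities sum to 1 by construction, which is a quick sanity check on the table.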
Exercise 4 (4 points)
A random variable X has the following probability density function:
$$
f(x) = \begin{cases} 0.25x & \text{for } 0 \le x \le 2 \\ 1 - 0.25x & \text{for } 2 < x \le 4 \\ 0 & \text{otherwise.} \end{cases}
$$
a) Graph the probability density function for X.
[Figure: graph of f(x), a triangle with base from x = 0 to x = 4 and peak f(2) = 0.5]
b) Show that the density function has the properties of a probability density function (hint: what is the area underneath the density function? What are the boundaries of this area?).
• Geometrically, for example, by measuring the area of the triangle (base * height / 2): [(4 - 0) * 0.5]/2 = 1
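• By using the integral:
$$
\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{4} f(x)\,dx = \int_{0}^{2} 0.25x\,dx + \int_{2}^{4} (1 - 0.25x)\,dx = \left[\frac{0.25x^2}{2}\right]_{0}^{2} + \left[x - \frac{0.25x^2}{2}\right]_{2}^{4} = \left(\frac{0.25 \cdot 2^2}{2} - 0\right) + \left[\left(4 - \frac{0.25 \cdot 4^2}{2}\right) - \left(2 - \frac{0.25 \cdot 2^2}{2}\right)\right] = 0.5 + 0.5 = 1
$$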
• The function takes the value 0 for x ≤ 0 and for x ≥ 4, but the limits of integration are -infinity and +infinity. In other words, the function is defined for x ≤ 0 and for x ≥ 4 too (where it equals 0), not only for x between 0 and 4. Note also that f(x) ≥ 0 everywhere, the other property a density must satisfy.
c) Find the probability that X takes a value between 1 and 3.
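Students should compute the area between 1 and 3:
• Geometrically: on each side of the peak, the unit interval contributes a rectangle of height 0.25 plus a triangle of height 0.25, so (1*0.25 + 1*0.25/2)*2 = 0.75
• With integrals:
$$
\int_{1}^{3} f(x)\,dx = \int_{1}^{2} 0.25x\,dx + \int_{2}^{3} (1 - 0.25x)\,dx = \left[\frac{0.25x^2}{2}\right]_{1}^{2} + \left[x - \frac{0.25x^2}{2}\right]_{2}^{3} = \left(\frac{0.25 \cdot 2^2}{2} - \frac{0.25 \cdot 1^2}{2}\right) + \left[\left(3 - \frac{0.25 \cdot 3^2}{2}\right) - \left(2 - \frac{0.25 \cdot 2^2}{2}\right)\right] = 0.375 + 0.375
$$
Result: 0.75.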
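Both ways of computing areas in parts (b) and (c) can be cross-checked in Python. This sketch codes the density and its piecewise antiderivative (the CDF `F`) and compares them with a simple midpoint-rule approximation; the function names are illustrative, and the midpoint rule is just one convenient numerical method, not the one the exam expects.

```python
def f(x):
    """Triangular density from Exercise 4: rises on [0, 2], falls on (2, 4], zero elsewhere."""
    if 0 <= x <= 2:
        return 0.25 * x
    if 2 < x <= 4:
        return 1 - 0.25 * x
    return 0.0

def F(x):
    """CDF obtained by integrating f piece by piece (same antiderivatives as in part b)."""
    if x <= 0:
        return 0.0
    if x <= 2:
        return 0.25 * x**2 / 2
    if x <= 4:
        # 0.5 accumulated on [0, 2] plus the integral of (1 - 0.25t) from 2 to x
        return 0.5 + (x - 0.25 * x**2 / 2) - (2 - 0.25 * 2**2 / 2)
    return 1.0

def midpoint_area(g, lo, hi, n=4000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

total_area = F(4) - F(0)   # part (b): total area under f
p_1_to_3 = F(3) - F(1)     # part (c): P(1 < X < 3)
print(total_area, p_1_to_3)
print(midpoint_area(f, 0, 4), midpoint_area(f, 1, 3))  # numerical cross-check
```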
Exercise 5 (4 points)
An administrator in charge of undergraduate education on a large campus wants to estimate the average number of books required by instructors. Using bookstore data, he drew a random sample of 26 courses, for which he obtained a sample mean of 4 books and a sample Standard Deviation (not Standard Error) of 0.6. (The SD of the population is unknown, so students should use Student's t distribution, not the z-score.) Construct a 95% confidence interval to estimate (i.e. make an inference about) the mean number of books assigned by instructors on campus.
The degrees of freedom are n - 1 = 25, and row 25 is available in the textbook's table; the column is 0.025 (being (1 - 0.95)/2).
Table value: 2.060 (row 25, column 0.025)
Confidence Interval = 4 +/- 2.060 * (0.6/√26) = 4 +/- 2.060 * 0.118 = (3.76, 4.24)
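The same interval in Python, as a sketch: the critical value 2.060 is hard-coded from the printed t table (as in the answer above) rather than computed, so no statistics library is needed.

```python
import math

# Sample summary from Exercise 5.
n, xbar, s = 26, 4.0, 0.6
t_crit = 2.060  # t table value for df = n - 1 = 25, upper-tail area 0.025 (from the text)

se = s / math.sqrt(n)   # standard error of the mean
margin = t_crit * se
ci = (xbar - margin, xbar + margin)
print(f"SE = {se:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")  # SE = 0.118, 95% CI = (3.76, 4.24)
```

If SciPy is installed, `scipy.stats.t.ppf(0.975, 25)` gives the exact quantile (about 2.0595), which agrees with the table to three decimals.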
Exercise 6 (4 points)
A local traffic enforcement department attempted to estimate the average rate of speed (μ) of vehicles along a strip of Fantasy Street. With hidden radar, the speed of a random selection of 250 vehicles was measured, which yielded a sample mean of 50 mph and an SD of 15 mph. (The SD of the population is unknown, so students should use Student's t distribution, not the z-score.)
Estimate the standard error of the mean
SE (of the mean) = SD(sample)/√n, i.e. s/√n = 15/√250 = 0.949
(a) Find the 90% Confidence Interval for the population mean
The degrees of freedom are 249; the closest row in the textbook's table is infinity, and the column is 0.05 (being (1 - 0.90)/2). Table value: 1.645 (small decimal inconsistencies in students' answers are acceptable here).
Confidence Interval = 50 +/- 1.645 * 0.949 = 50 +/- 1.561 = (48.439, 51.561)
(b) Find the 95% Confidence Interval for the population mean
The degrees of freedom are 249; the closest row in the textbook's table is infinity, and the column is 0.025 (being (1 - 0.95)/2). Table value: 1.960 (small decimal inconsistencies in students' answers are acceptable here).
Confidence Interval = 50 +/- 1.960 * 0.949 = 50 +/- 1.860 = (48.140, 51.860)
(c) Find the 98% Confidence Interval for the population mean
The degrees of freedom are 249; the closest row in the textbook's table is infinity, and the column is 0.01 (being (1 - 0.98)/2). Table value: 2.326 (small decimal inconsistencies in students' answers are acceptable here).
Confidence Interval = 50 +/- 2.326 * 0.949 = 50 +/- 2.207 = (47.793, 52.207)
(d) Find the 99% Confidence Interval for the population mean
The degrees of freedom are 249; the closest row in the textbook's table is infinity, and the column is 0.005 (being (1 - 0.99)/2). Table value: 2.576 (small decimal inconsistencies in students' answers are acceptable here).
Confidence Interval = 50 +/- 2.576 * 0.949 = 50 +/- 2.445 = (47.555, 52.445)
Is there a lot of difference between (c) and (d)? Why? Explain briefly.
Not much: the half-width grows only from 2.207 to 2.445 mph, about 0.24 mph on each side. The change in the interval is minimal because, in a distribution that is very "thin" in the tails, the region between the 98% and 99% critical values carries very little probability mass (only 1%). Note that the distribution used here has this characteristic, but this is NOT necessarily always the case.
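A Python sketch reproducing parts (a) to (d) and the (c) vs (d) comparison. It assumes the "infinity" row critical values quoted above; because it uses the unrounded SE (0.94868 rather than 0.949), the third decimal of some bounds can differ slightly from the hand calculation, the kind of small inconsistency the marking notes allow.

```python
import math

# Sample summary from Exercise 6.
n, xbar, s = 250, 50.0, 15.0
se = s / math.sqrt(n)   # standard error of the mean, about 0.949

# Critical values from the "infinity" row of the t table, as used in the text
# (with df = 249 the t distribution is practically the standard normal).
t_crit = {0.90: 1.645, 0.95: 1.960, 0.98: 2.326, 0.99: 2.576}

intervals = {}
for level, t in t_crit.items():
    margin = t * se
    intervals[level] = (xbar - margin, xbar + margin)
    print(f"{level:.0%}: 50 +/- {margin:.3f} -> ({xbar - margin:.3f}, {xbar + margin:.3f})")

# Parts (c) vs (d): extra half-width gained by moving from 98% to 99% confidence.
extra = (t_crit[0.99] - t_crit[0.98]) * se
print(f"99% half-width exceeds the 98% half-width by {extra:.3f} mph")
```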
PART B: answer all questions (total 60 points).
1. Open STATA.
2. Open the dataset EXAM.dta (located on the SEES0083 Moodle page)
3. Set a seed composed of the following, in this sequence:
• Your own year of birth (4-digit)
• your own month of birth (2-digit)
• your own day of birth (2-digit)
Example: if the date of birth is 1972 May the 9th
Into STATA type: set seed 19720509
• Generate a random number
Into STATA type: generate u = runiform()
• Drop the observations in which the newly created variable “u” (it should be in your list of variables) is greater than or equal to 0.5
Into STATA type: drop if u>=0.5
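For readers working outside Stata, the seed and the random 50% drop can be sketched in Python. Only the YYYYMMDD seed construction carries over exactly: Python's `random` module uses a different generator from Stata's `runiform()`, so the kept observations will not match the Stata dataset. The helper `seed_from_birthdate` and the 10-row `observations` list are hypothetical stand-ins, not part of EXAM.dta.

```python
import random

def seed_from_birthdate(year, month, day):
    """Compose the seed as YYYYMMDD, e.g. 1972-05-09 -> 19720509 (hypothetical helper)."""
    return int(f"{year:04d}{month:02d}{day:02d}")

seed = seed_from_birthdate(1972, 5, 9)
rng = random.Random(seed)   # Python's own generator, NOT Stata's

# Hypothetical stand-in for the dataset: 10 observation ids.
observations = list(range(10))
u = [rng.random() for _ in observations]   # analogue of: generate u = runiform()

# Analogue of: drop if u>=0.5  (i.e. keep only observations with u < 0.5)
kept = [obs for obs, ui in zip(observations, u) if ui < 0.5]
print(seed, kept)
```

Because the draws are uniform on [0, 1), roughly half of the observations survive the drop, and re-running with the same seed reproduces the same subset.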
Congratulations! You have now created a unique dataset for your exercise (in other words, no other student has the same dataset), on which you need to perform the tasks below.
Exercise 7 (20 points)
A researcher is trying to understand whether the average sales in Spain (ES) are comparable to the average sales in Italy (IT). For this purpose, she is using your firm-level database. The variable for sales is called “r_OperRevTurnThEur” (expressed in thousands of euros) and the number of employees is called “numberofemployees”. The first output analysed by the researcher is the one produced by the following command (summary statistics for the two variables, for Spain only):
Into STATA type: sum r_OperRevTurnThEur numberofemployees if country_ACRONYM=="ES"
• (1 point) Copy and paste the result into the exam sheet.
SPAIN
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
r_OperRevT~r | 13,669 7323.658 61903.99 .9090909 2992946
numberofem~s | 12,770 29.23759 143.8255 1 8331
The second output analysed by the researcher is the one produced by the following command (summary statistics for the two variables, for Italy only):
Into STATA type: sum r_OperRevTurnThEur numberofemployees if country_ACRONYM=="IT"
• (1 point) Copy and paste the result into the exam sheet.
ITALY
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
r_OperRevT~r | 17,046 8252.208 66708.15 .9165903 4054171
numberofem~s | 9,206 44.53237 290.8417 1 13978
• How would the researcher interpret the output of these tables?
• (1 point) Can the researcher conclude that there is a systematic (statistically significant) difference between the sales in Spain and Italy? No, she cannot conclude that there is a systematic difference just by comparing the means. A simple comparison of the means does not take into account the distributions of the two samples (their standard deviations and sizes), so it cannot tell whether the gap could be due to sampling variability.
• (1 point) Can the researcher conclude that there is a systematic (statistically significant) difference between the number of employees in Spain and Italy? No, for the same reason: a simple comparison of the means does not take into account the distributions of the two samples.
• The researcher runs a more appropriate test, i.e. a two-sample t-test comparing the mean sales in Spain (ES) and Italy (IT). The researcher sets alpha at 0.01 (1%).
Into STATA type: ttest r_OperRevTurnThEur, by(country_ACRONYM)
• (1 point) Copy and paste the result into the exam sheet.
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      ES |  13,669    7323.658    529.4809    61903.99    6285.803    8361.513
      IT |  17,046    8252.208    510.9373    66708.15    7250.718    9253.698
---------+--------------------------------------------------------------------
combined |  30,715    7838.978    368.6867    64614.91    7116.337    8561.619
---------+--------------------------------------------------------------------
    diff |           -928.5497    741.8641               -2382.634    525.5345
------------------------------------------------------------------------------
    diff = mean(ES) - mean(IT)                                    t =  -1.2516
Ho: diff = 0                                     degrees of freedom =    30713

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.1054         Pr(|T| > |t|) = 0.2107          Pr(T > t) = 0.8946
• (3 points) What is the t-stat in this test? -1.2516
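The t statistic in the output can be reproduced from the summary statistics alone with the pooled-variance formula behind Stata's default (equal-variances) `ttest`. In this Python sketch the p-values use the standard normal CDF as an approximation to the t distribution, assumed adequate with 30,713 degrees of freedom; small differences in the last digits come from the rounded means and SDs.

```python
import math

def pooled_t_test(n1, m1, s1, n2, m2, s2):
    """Two-sample t test with equal variances, computed from summary statistics."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))             # SE of mean(ES) - mean(IT)
    t = (m1 - m2) / se
    return t, se, df

def normal_cdf(z):
    """Standard normal CDF, used here as a large-df approximation to the t CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Summary statistics for ES and IT copied from the ttest output.
t, se, df = pooled_t_test(13669, 7323.658, 61903.99, 17046, 8252.208, 66708.15)
p_left = normal_cdf(t)            # Ha: diff < 0
p_two = 2 * normal_cdf(-abs(t))   # Ha: diff != 0
p_right = 1 - p_left              # Ha: diff > 0
print(f"t = {t:.4f}, SE = {se:.4f}, df = {df}")
print(f"p-values: {p_left:.4f} (left), {p_two:.4f} (two-sided), {p_right:.4f} (right)")
```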
• (3 points) What are the three null hypotheses and the three alternative hypotheses?
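• H0: diff ≥ 0 vs. Ha: diff < 0 (left-tailed; Pr(T < t) = 0.1054)
• H0: diff = 0 vs. Ha: diff != 0 (two-tailed; Pr(|T| > |t|) = 0.2107)
• H0: diff <= 0 vs. Ha: diff > 0 (right-tailed; Pr(T > t) = 0.8946)
where diff = mean(ES) - mean(IT).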
2022-08-11