SSEES0083 Quantitative Methods 2019/20
MAIN EXAMINATION 2019/20
SSEES0083
Quantitative Methods
PART A: Please answer FOUR questions (Total 48 points)
Exercise 1 (12 points)
Suppose we have a random sample of size n = 100 from a certain population. We find that the sample mean is x̄ = 80 and the sample standard deviation is s = 30. The probability that the population mean μ falls between two points a and b is 90%. Points a and b are symmetric about their central point.
a. What is the range between points a and b known as? (2 points)
b. If the population is normally distributed, select an appropriate value from the table provided below and calculate the values of a and b. (5 points)
| α | Critical values from standard normal distribution | Critical values from Student's t distribution (d.f. = 99) |
|---|---|---|
| α = 0.1 | 1.2815 | 1.2902 |
| α = 0.05 | 1.6448 | 1.6604 |
c. If the shape of the population distribution is unknown, what expression defines the values of a and b? (5 points)
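The standard normal critical values in the table above can be cross-checked with a few lines of code. The following is a minimal Python sketch and not part of the exam. It assumes that each α in the table denotes an upper-tail area, and `z_critical` is a hypothetical helper name. The Student's t values are not reproduced because the Python standard library has no t distribution.

```python
from statistics import NormalDist

# Assumption: alpha in the table is the upper-tail area, so the critical
# value is the (1 - alpha) quantile of the standard normal distribution.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def z_critical(alpha: float) -> float:
    """Return z such that P(Z > z) = alpha for a standard normal Z."""
    return standard_normal.inv_cdf(1.0 - alpha)


for alpha in (0.1, 0.05):
    print(f"alpha = {alpha}: z = {z_critical(alpha):.4f}")
```

The printed values may differ from the table in the fourth decimal place, since the table entries appear to be truncated rather than rounded.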
Exercise 2 (12 points)
You have been asked to determine whether two production processes have different mean numbers of units produced per hour. Process 1 has mean μ1 and process 2 has mean μ2. Based on technical details, the lead engineer believes that process 1 performs at least as well as process 2; that is, process 1 produces the same number of units as process 2 or more, never fewer. You are asked to help verify this claim using a random sample of 25 paired observations. The sample means are 56 for process 1 and 50 for process 2. The sample standard deviation of the paired differences is 25. Set the significance level at 5%.
a. Formulate H0 and H1 for this hypothesis test. (2 points)
b. Identify the rejection region using either the critical value or the significance level, and illustrate it graphically. (2 points)
c. Calculate the test statistic from the data above. Using the test statistic approach, explain your decision in this hypothesis test and provide a conclusion. (4 points)
d. The following probabilities were calculated with Stata functions. Using the p-value approach, explain your decision in this hypothesis test and provide a conclusion. (4 points)
| | d.f. = 24 | d.f. = 23 |
|---|---|---|
| Lower tail probability | 0.8790 | 0.8788 |
| Upper tail probability | 0.1209 | 0.1212 |
Exercise 3 (12 points)
In the week before final exams, the time spent on studying by students follows a normal distribution with a standard deviation of 8 hours. Now we take a random sample of 4 students to estimate the mean study hours for the population of all students.
There are two functions available in Stata to help calculate answers to question (a) and (b):
| Function | Description |
|---|---|
| normal(x) | calculates the cumulative probability of the standard normal distribution at score x |
| ttail(df, t) | calculates the upper-tail cumulative Student's t distribution with df degrees of freedom at value t |
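For readers without Stata at hand, the quantity returned by normal(x) can be mirrored in Python. This is a hedged sketch rather than Stata's implementation: `stata_normal` is a hypothetical helper name, and ttail(df, t) is not mirrored because the Python standard library has no Student's t distribution.

```python
from statistics import NormalDist


def stata_normal(x: float) -> float:
    """Hypothetical stand-in for Stata's normal(x): P(Z <= x), Z ~ N(0, 1)."""
    return NormalDist().cdf(x)


# Cumulative probability at the centre of the distribution.
print(stata_normal(0.0))  # 0.5

# An upper-tail probability follows from the complement rule.
print(1.0 - stata_normal(1.0))
```

Because the standard normal distribution is symmetric about zero, the upper-tail probability at x equals the lower-tail probability at -x.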
a. What is the probability that the sample mean exceeds the population mean by more than 2 hours? Show the steps of your solution, select an appropriate function from those provided above, and write down the full Stata command leading to the final answer. You do not need to perform the actual calculations. (4 points)
b. What is the probability that the sample mean differs from the population mean by more than 4 hours? Show the steps of your solution, select an appropriate function from those provided above, and write down the full Stata command leading to the final answer. You do not need to perform the actual calculations. (4 points)
Now, suppose that a second (independent) random sample of 30 students was taken.
c. Without doing calculations, given the larger sample size, state whether the probabilities in question (b) would be larger, smaller or the same as in the first sample with 4 students. (4 points)
Exercise 4 (12 points)
You are working on a project studying the impact of technological innovation on fuel economy. Specifically, you are studying vehicle performance, measured by the variable milpgal: the miles per gallon a vehicle can travel. The data set was collected from 4 companies. The variable company takes the numerical values 1, 2, 3 and 4 for data collected from company 1, company 2, company 3 and company 4, respectively. Companies 1 and 4 are leading companies in the industry and adopted an innovative technology before the data collection period. Please answer the following questions:
a. Explain the meaning of the following command and state what change can be expected to the data set after executing the command: (4 points)
gen inno=(company==1 | company==4)
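In Stata, a logical expression such as the one on the right-hand side of this command evaluates to 1 when it is true and 0 when it is false. The row-by-row logic can be sketched in Python as follows. The company codes below are made up purely for illustration and are not taken from the exam's data set.

```python
# Hypothetical company codes for six vehicles (illustrative only).
company = [1, 2, 3, 4, 4, 2]

# Mirror of: gen inno=(company==1 | company==4)
# Each row gets 1 if the condition holds and 0 otherwise.
inno = [int(c == 1 or c == 4) for c in company]

print(inno)  # [1, 0, 0, 1, 1, 0]
```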
Next, we draw a box-whisker graph to study the distribution of each subgroup of the data, using the following command:
graph hbox milpgal, over(inno)
b. Each box-whisker graph contains 4 sections: 2 ‘whiskers’ sections at two ends and 2 box sections in the middle. What do the 4 sections refer to in a box-whisker plot? What quantity does the white line indicate in the middle section as shown above? (4 points)
We next study numerical summaries of each subgroup using the following statistics table:
tabstat milpgal, by(inno) s(n mean sd sk)
c. Which subgroup distribution, grouped by the value of the variable inno, is closer to the normal distribution? Justify your answer by commenting on the numerical results in the statistics table above and on the graphical results in the box-whisker graph in question (b). (4 points)
PART B: Answer TWO out of THREE questions (Total 36 points)
Exercise 5 (18 points)
Suppose that you’re in charge of marketing airline seats for a major carrier. You’re focusing on the flight ticket overselling problem. Four days before the flight date you have 16 seats remaining on the plane. You know from past experience that 80% of people that purchase tickets in this time period will actually show up for the flight. You are estimating the potential losses when selling 18 tickets. Set the random variable X as the number of occupied seats on the flight day.
To help with this analysis, the following table gives cumulative probabilities produced by the Stata function binomial(n,k,p), with p = 0.8 and different values of the number of trials n and the number of successes k:
| x | n = 16 | n = 18 |
|---|---|---|
| 0 | 6.554×10^-12 | 2.621×10^-13 |
| 1 | 4.260×10^-10 | 1.914×10^-11 |
| 2 | 1.301×10^-8 | 6.609×10^-10 |
| 3 | 2.479×10^-7 | 1.435×10^-8 |
| 4 | 3.301×10^-6 | 2.197×10^-7 |
| … | … | … |
| 11 | 0.202 | 0.051 |
| 12 | 0.402 | 0.133 |
| 13 | 0.648 | 0.284 |
| 14 | 0.859 | 0.499 |
| 15 | 0.972 | 0.729 |
| 16 | 1 | 0.901 |
| 17 | . | 0.982 |
| 18 | . | 1 |
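The cumulative probabilities in the table above can be spot-checked outside Stata. The following is a minimal Python sketch, assuming that binomial(n,k,p) returns the probability of observing k or fewer successes in n trials. `binomial_cdf` is a hypothetical helper written for illustration and is not Stata's implementation.

```python
from math import comb


def binomial_cdf(n: int, k: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))


# Spot-check a few table entries with p = 0.8.
for n, k in [(16, 11), (16, 15), (18, 16), (18, 17)]:
    print(f"n={n}, k={k}: {binomial_cdf(n, k, 0.8):.3f}")
```

The probability of exactly k successes is the difference between two adjacent cumulative values, `binomial_cdf(n, k, p) - binomial_cdf(n, k - 1, p)`.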
Please answer the following questions:
a. What distribution does the random variable X follow? Write down the formula for the probability that any given number of seats X0 is occupied on the flight day, clearly explaining the parameters in the formula. You do not need to calculate numerical results from the formula. (4 points)
b. What is the probability of having 1 empty seat, 2 empty seats, 3 empty seats, and 4 empty seats on the flight day? Calculate the probability of each scenario using the information in the table above. (8 points)
c. Graph the probability density function of the 4 scenarios based on your calculations in question (b). State the names of the variables on the y and x axes of your graph. (4 points)
d. You learn that, on average, the airline suffers the following losses on this route in these 4 scenarios:
| | 1 empty seat | 2 empty seats | 3 empty seats | 4 empty seats |
|---|---|---|---|---|
| Loss in £ | 400 | 780 | 1100 | 1300 |
What is the expected value of the loss caused by having between 1 and 4 empty seats? (2 points)
Exercise 6 (18 points)
You are joining a project studying the factors that affect fuel economy. The project is making good progress, and several effective factors have already been identified using a cross-sectional data set. You are looking for other potential factors to explain the differences in the miles per gallon a vehicle can travel. The variable milpgal measures the miles per gallon that a vehicle can travel. The variable price measures the selling price of the vehicle. You are investigating whether, on average, a more expensive car can travel more miles per gallon. The data set records information collected from 4 different companies.
Please answer the following questions involved in the analysis process. You start the investigation on the impact of variable price by using the following Stata commands. The output is also included:
a. Explain the meaning of the command gen Hprice=(price>r(mean)) shown above. What is the range of values contained in the variable Hprice? (4 points)
You then perform a hypothesis test on the subgroups created in question (a). It is assumed that the populations of both subgroups are normally distributed.
b. What are H0 and H1 in the test implemented by the command ttest milpgal, by(Hprice) shown above? (3 points)
c. Three sets of hypothesis test results are produced, as shown in the last 2 lines of the output. Set the significance level to 5%. What conclusion do you reach for each of the three cases? Combining the results of all three tests, what is your overall conclusion? (8 points)
d. Based on your conclusion in question (c), would you advise including the variable Hprice in the model to help explain the change in the variable milpgal? Give a reason for your answer. (3 points)
2022-08-11