Assignment 1

Instructions

 

Assignment 1 consists of 2 questions. There are 100 marks in total. For each question, 5% of the marks are reserved for evidence of best practice in R; the marks for sub-questions therefore add up to 95% of the total marks for each question.

Answer all the questions and submit your answers to each question in ELE.

 

Write your R code for all of the questions in one R script and use comments to label each question. Please include your candidate number in a comment at the top of your R script.

 

Make sure to include your R script as an upload to your submission. Please comment on each procedure to explain what you are doing (or intend to do). Submission is via ELE.

Read and answer each of the questions using your own code and words.  

Collaboration with others and plagiarism of other people's code is not permitted. Presenting someone else's code as your own work is misrepresentation, an academic conduct offence.

 

As you complete the assignment in ELE, you can move between questions and return to complete them later. Once you select "Finish and submit" for a set of answers, you can return to reattempt, but previously entered answers may be removed.

To avoid losing work during the submission process, write your answers in a text file, such as a Word file, and save your R script. Once you are happy with them, copy and paste your answers into the relevant questions on ELE and upload your R script. Keep a copy of the files as a backup.

For this assignment, you will be working on a cross-country COVID-19 Business Pulse Survey (BPS) dataset that was collected by the World Bank Group during the pandemic. The dataset contains indicators about the impact of the pandemic and its associated lockdowns on business operations and public assistance across countries. (For details, click here.)

On ELE, download the dataset with filename “bps.csv”, which contains the following variables:

i.      country – name of the country (e.g., Algeria)

ii.     region – regional group (e.g., South Asia)

iii.    income – income group (e.g., low income)

iv.     gdp_pc – GDP per capita (in current US$)

v.      access – share of establishments that received or expect to receive public assistance in the near future (%)

vi.     dropsales – share of establishments with decreased monthly sales year before the interview (%)

vii.    use_digital – share of firms that started or increased the use of digital platforms (%)

 

 

 

Question 1:

Part a.  

i.       Set the seed equal to your candidate number. (2 marks)

ii.      Then, take a random sample of 50 countries from the data and save this new dataset as “data2”. Use na.omit() across the data frame to remove rows containing missing or NA values. (3 marks)

iii.     Using this new dataset, plot the distribution of the share of establishments with decreased monthly sales year before the interview (dropsales). Label the plot properly. (5 marks)

iv.     In 5 brief sentences, describe the distribution of dropsales. (5 marks)

 

You should continue to use your data2 dataset for all remaining parts of question 1.
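
As a rough illustration of steps i and ii in Part a, the sketch below runs on a made-up data frame rather than bps.csv. The seed value 123456, the object name bps, and the toy columns are placeholders assumed for illustration only, so the numbers it produces say nothing about the real survey. Note that sampling first and dropping missing values afterwards can leave fewer than 50 rows.

```r
# Hypothetical seed; in the assignment this would be your candidate number.
set.seed(123456)

# Synthetic stand-in for the real file (which would be read with read.csv("bps.csv")).
bps <- data.frame(
  country   = paste0("Country_", 1:120),
  access    = runif(120, min = 0, max = 100),
  dropsales = runif(120, min = 0, max = 100)
)
bps$dropsales[c(3, 17)] <- NA  # inject two missing values for the demo

# Draw 50 row indices without replacement and keep those rows.
data2 <- bps[sample(nrow(bps), size = 50), ]

# Drop every row that contains at least one NA.
data2 <- na.omit(data2)

# Number of rows left after removing missing values (at most 50).
print(nrow(data2))
```

Because the seed is fixed before sample() is called, rerunning the script from the top reproduces the same 50 rows.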

 

Part b.    

i.      Estimate a model explaining the variation in dropsales as a function of the share of establishments that received or expect to receive public assistance in the near future. (5 marks)

ii.     Interpret the coefficients and measures of fit. (7 marks)

Part c.    

i.      What is the smallest value of dropsales that can be predicted by the estimated regression equation in (b)? Explain how you arrived at your answer. (4 marks)

ii.     Calculate the average prediction error (that is, the average distance between the data points and the estimated regression line) in predicting dropsales when access is less than or equal to 10. Explain how you arrived at your answer. (5 marks)

iii.    How many observations in the data have both dropsales = 0 and access = 0 simultaneously? Explain how you arrived at your answer. (5 marks)

iv.   Now, suppose an average country decided to increase the coverage of its public assistance (i.e., access) by 35%. Based on the estimated regression function in (b), calculate the predicted change in dropsales. (HINT: You may first need to drop observations with missing or "NA" values before making calculations; you can do this using the na.omit() function.) (4 marks)
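
The mechanics behind Parts b and c can be sketched on synthetic data as follows. This is only an illustration under stated assumptions: the data frame toy, its coefficients, and the change delta = 10 are invented, and the average absolute residual is just one possible reading of "average prediction error".

```r
set.seed(1)

# Synthetic data (assumption): 40 observations with a known linear relationship.
toy <- data.frame(access = runif(40, min = 0, max = 60))
toy$dropsales <- 80 - 0.3 * toy$access + rnorm(40, sd = 5)

# Simple linear regression of dropsales on access.
fit <- lm(dropsales ~ access, data = toy)
b <- coef(fit)  # named vector with "(Intercept)" and "access"

# Predicted change in dropsales for a change of delta in access: slope * delta.
delta <- 10  # hypothetical change, in the units of access
change <- unname(b["access"] * delta)

# The same number obtained from two fitted values that are delta apart.
p <- predict(fit, newdata = data.frame(access = c(20, 20 + delta)))
change_check <- unname(diff(p))

# Average absolute residual among observations with access <= 10.
low <- toy$access <= 10
avg_abs_error <- mean(abs(residuals(fit)[low]))

print(c(change = change, change_check = change_check))
print(avg_abs_error)
```

In a linear model the predicted change depends only on the slope and the size of the change in the regressor, which is why the starting value of 20 used in predict() is arbitrary.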

 

Part d.    Explain why the estimated regression model in (b) may or may not be well suited for determining whether the provision of public support can help protect businesses from experiencing large declines in sales in a particular country. (In this question, you can use figures or calculations to strengthen your argument.) (5 marks)

 

Total 45 marks - Enter your answers in the text box below and write your code in a single R script to be uploaded at the end of the assignment.

 

 

Question 2:

a. Create a new data frame called “data3” containing dropsales, access and income from data2.  (3 marks)

b. Use na.omit() across the data frame to remove rows containing missing or NA values. How many observations remain in your data frame? (3 marks)

c. Next, create a variable called “highinc” that is equal to 1 when the country belongs to the high-income group and 0 otherwise. Append this variable into the dataset “data3”. (3 marks)

d. Calculate and report how many countries in your sample belong to the high-income group and how many do not. (4 marks)

e.     Using this new dataset, conduct the following exercises (15 marks):

i.    Estimate the following regression model:

$$\text{dropsales}_i = \alpha + \beta_1 \text{access}_i + \beta_2 \text{highinc}_i + u_i$$

Store the coefficients. 

ii.   Regress dropsales on a constant and the highinc dummy variable. Calculate the residuals and append them to the dataset using the variable name res1.

iii.  Regress access on a constant and the highinc dummy variable. Calculate the residuals from this regression and append them to the dataset using the variable name res2.

iv.  Estimate the following regression model: 

$$\text{res1}_i = \tilde{\alpha} + \tilde{\beta}_1 \text{res2}_i + \tilde{u}_i$$

v.   Compare and contrast your estimates for $\beta_1$ and $\tilde{\beta}_1$. What do you observe? Explain the intuition behind your observation using the OLS properties and assumptions you have learned in class.
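
The two-step "partialling-out" procedure in steps i to iv can be sketched as below. The data frame toy and every number in it are synthetic assumptions chosen for illustration; only the sequence of regressions mirrors the exercise.

```r
set.seed(42)
n <- 60

# Synthetic data (assumption): a dummy, a regressor correlated with it, and an outcome.
toy <- data.frame(highinc = rbinom(n, size = 1, prob = 0.4))
toy$access    <- 20 + 15 * toy$highinc + rnorm(n, sd = 8)
toy$dropsales <- 70 - 0.4 * toy$access - 5 * toy$highinc + rnorm(n, sd = 6)

# Step i: multiple regression with both regressors; store the coefficients.
full   <- lm(dropsales ~ access + highinc, data = toy)
b_full <- coef(full)

# Steps ii and iii: residuals after regressing each variable on the dummy.
toy$res1 <- residuals(lm(dropsales ~ highinc, data = toy))
toy$res2 <- residuals(lm(access ~ highinc, data = toy))

# Step iv: regress one set of residuals on the other.
partial   <- lm(res1 ~ res2, data = toy)
b_partial <- coef(partial)

# Put the two slope estimates side by side for comparison.
print(c(beta1 = unname(b_full["access"]),
        beta1_tilde = unname(b_partial["res2"])))
```

This comparison is an instance of the Frisch-Waugh-Lovell theorem: res1 and res2 hold the variation in dropsales and access that highinc cannot explain, and the slope in step iv is computed from that leftover variation alone.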

f.    Using the same dataset created in (a), conduct the following exercises:

i. Calculate the mean difference and the standard error of the mean difference of dropsales between high-income and non-high-income countries. Then test, at the 5% significance level, the hypothesis that mean dropsales differs between the two groups of countries. (Hint: Make sure you formulate your hypotheses.)

What is the p-value? What is your conclusion? (10 marks)

ii. Estimate the following model:

$$\text{dropsales}_i = \alpha + \beta_2 \text{highinc}_i + u_i$$

where the included variables are as previously defined.  (2 marks)

iii.    Compare and contrast your estimated $\beta_2$ with the estimated mean difference of dropsales for the two groups of countries. What do you observe? Provide an intuition behind your observation. (5 marks)

iv.   Compare and contrast your estimated standard error for $\beta_2$ with the standard error of the estimated mean difference of dropsales for the two groups of countries. What do you observe? Provide an intuition behind your observation. (5 marks)
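
The comparison in part f can be sketched on synthetic data as follows. The data frame toy, the group sizes, and the effect size are assumptions made up for illustration. The sketch runs both the Welch test (the default in t.test()) and the pooled-variance test (var.equal = TRUE), since which one matches the regression output depends on the variance assumption; the stderr element of the t.test() result requires R 3.6.0 or later.

```r
set.seed(7)

# Synthetic data (assumption): 30 non-high-income and 15 high-income countries.
toy <- data.frame(highinc = rep(c(0, 1), times = c(30, 15)))
toy$dropsales <- 65 - 8 * toy$highinc + rnorm(45, sd = 10)

# Difference in group means (high-income minus the rest).
mean_diff <- mean(toy$dropsales[toy$highinc == 1]) -
  mean(toy$dropsales[toy$highinc == 0])

# Two-sample t-tests: Welch (unequal variances) and pooled (equal variances).
welch  <- t.test(dropsales ~ highinc, data = toy)
pooled <- t.test(dropsales ~ highinc, data = toy, var.equal = TRUE)

# Regression of dropsales on a constant and the dummy.
fit       <- lm(dropsales ~ highinc, data = toy)
slope_row <- summary(fit)$coefficients["highinc", ]

# Side-by-side comparison of the estimates and standard errors.
print(c(mean_diff = mean_diff, slope = unname(slope_row["Estimate"])))
print(c(se_welch  = welch$stderr,
        se_pooled = pooled$stderr,
        se_slope  = unname(slope_row["Std. Error"])))
```

The default standard errors reported by lm() assume a common error variance across observations, which is the same assumption as the pooled-variance t-test. The Welch standard error generally differs from both.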

 

Total 50 marks - Enter your answers in the text box below and write your code in a single R script to be uploaded at the end of the assignment.