
Homework Assignment 1

Data Management and R Basics

(Due January 26, 2024)

In this homework assignment you will be using data from the International Centre for Tax and Development’s Government Revenue Dataset (GDR) and the World Bank’s Poverty and Inequality databases. The goals of this assignment are to get familiar with some basic data manipulations in R and to explore the relationships between poverty, inequality, and governments’ revenue-collection efforts.

You have been given Excel spreadsheets with data downloaded directly from the source sites (the most recent ICTD data is here; the latest WB poverty database is here), along with some information about the GDR indicators. To make your work a bit easier, I have harmonized some of the names. (As you do this exercise, think about why this matters.) You should look over the GDR spreadsheet to understand how the variables are constructed. In addition, feel free to look over the reports on the GDR website to see what has been done with these data to date.

As with all assignments this quarter, you are required to submit both a writeup and a .R script file by the due date. The writeup is the report which answers all of the questions and contains any requested plots and calculations. You should think of it as the kind of document you would submit to a supervisor. It should look professional, your answers should be concise, and it should be something you would be proud to put in a portfolio of your work. The .R file is the script that you create that inputs the data, conducts all analysis, and generates all of the output included in the writeup. This file is the ‘back-end’ to your report. It should run without error, should not overwrite any of the original data, and should create everything necessary to reproduce your findings transparently. Although we will not grade your .R script file on its appearance, you should also consider what it looks like. It should be organized, well commented, and again something you would be proud to display in a portfolio of your work. Ideally, anyone who knows how to use R should be able to understand what each line of your code means and why it is in the .R script file by just reading through it. Please note: you should have no direct R output copied/pasted into your writeup (i.e., you need to interpret results and present them in a more meaningful way than pages of code output).

1. The Basics (20pts)

Create a new .R script file (“LastName_PID_HW1.R” – e.g., “Garias_12345678_HW1.R”; no blank spaces in the script name). Your script should generate all of the output, tables, and graphics used in your written submission and needs to run on its own, fully, without errors, to get full credit. You should assume that we will run your .R file in the same directory as the original data files; do not assume that the data is already loaded. For this first assignment, only use base R or tidyverse packages.

•  Script (.R file) named correctly and runs without errors. (10pts)

•  Script (.R file) does not overwrite original data, does the requisite analysis, and outputs any figures or tables used in your writeup (labeled correctly, and saved with filenames that include your last name and PID). (10pts)
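One way to keep outputs separate from the original data is to build every output filename from your last name and PID and to check that it cannot collide with an input file. The sketch below is only an illustration: the helper `out_name()`, the input filenames in `original_files`, and the toy table are all hypothetical, and `tempdir()` is used purely so the example has no side effects (a real script would write to the working directory).

```r
# Identifiers taken from the naming example in the assignment text
last_name <- "Garias"
pid <- "12345678"

# Hypothetical helper: prefix every output file with last name and PID
out_name <- function(stem, ext) {
  paste0(last_name, "_", pid, "_", stem, ".", ext)
}

# Hypothetical names for the original data files (yours will differ)
original_files <- c("GDR_data.xlsx", "WB_poverty.xlsx")

table_file <- out_name("table1", "csv")

# Guard: never write over one of the original inputs
stopifnot(!table_file %in% original_files)

# Write a toy table; tempdir() is used here only to keep the sketch harmless
out_path <- file.path(tempdir(), table_file)
write.csv(data.frame(x = 1:3), out_path, row.names = FALSE)
```

The same helper can be reused for figure filenames, so every saved output carries the required identifiers.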

2. Getting to Know the Data (20pts)

Start your .R file by loading the necessary packages: tidyverse and readxl.

1.  Using the read_xlsx() or read_excel() functions, read in the GDR data for 2014 from the second sheet in the original spreadsheet, and name it ‘dat1’. What are the dimensions of this object, dat1? What is the unit of observation? What are the measurement units of the tax variables? (1 sentence; 2pts)

2.  What is the average tax revenue (excluding social contributions) for 2014?  What percentage of the sample with complete information in terms of trade taxes and income tax revenue collects no trade taxes but reports higher-than-average income tax revenue? (1 sentence; 3pts)

3.  Generate two plots. In the first, include the bottom 10 countries in terms of total tax revenue (excluding social contributions), and show their direct tax revenue (excluding social contributions and resource revenue), and their indirect tax revenue. Generate a second plot for the top 10 countries in terms of total tax revenue (excluding social contributions). (2 plots; 5pts)

4. Explain what relationship emerges in these plots. (1 sentence describing the relationship; 5pts)

5.  Generate a presentable density plot of resource taxes, income taxes, and taxes on goods and services. What conclusions do you draw from these distributions? (1 plot and 1 short paragraph; 5pts)
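As an illustration of the "share of the complete-case sample" calculation asked for in item 2, here is a minimal base R sketch on toy data. The column names `trade_tax` and `income_tax` and all values are made up, and the sketch assumes that "average" means the mean over the complete cases; the real spreadsheet uses different names, and you should decide which average is appropriate.

```r
# Toy data; names and values are invented for illustration only
toy <- data.frame(
  country    = c("A", "B", "C", "D", "E", "F"),
  trade_tax  = c(0, 0, 2, NA, 0, 1),
  income_tax = c(8, 3, 6, 5, NA, 9)
)

# Keep only rows with both tax variables observed
complete <- toy[complete.cases(toy[, c("trade_tax", "income_tax")]), ]

# Assumption: the benchmark is the mean income tax among complete cases
avg_income_tax <- mean(complete$income_tax)

# Rows with zero trade taxes and above-average income tax revenue
meets <- complete$trade_tax == 0 & complete$income_tax > avg_income_tax

# Taking the mean of a logical vector gives the share of TRUE values
pct <- 100 * mean(meets)
print(pct)
```

Taking the mean of a logical vector is a compact way to get a proportion; the same idea works inside `dplyr::summarise()` if you prefer tidyverse syntax.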

3. Data Manipulation & Descriptive Statistics (30pts)

1.  Now read in the data from the World Bank, and name it ‘dat2’. You may need to adjust the command to get what you want - take a look at the Excel version so you can understand what the data loading is doing. What are the dimensions of this object, dat2? What is the unit of observation? (1 sentence; 1pt)

2.  Using left_join(), merge dat1 with dat2 and assign that to a new object, ‘GDRandWB’, keeping all observations from dat2. (You may need to look at the column names and types carefully to do this correctly; i.e., by retaining only one copy of any redundant information.) Write a sentence that describes in words what you accomplished here, and give the dimensions of ‘GDRandWB’. (Verify that the resulting merged data is similar to the provided “MergedGDRandWB.R” file.) (1 sentence; 3pts)

3.  What is the mean GINI coefficient? How many countries have GINIs greater than one standard deviation above the mean? (Note: You will need to enclose variable names that contain spaces in backticks, i.e., single angled quotation marks, `like this`.) (1 sentence; 1pt)

4.  Create a presentable scatterplot that shows the relationship between the share of income held by the top 10% and the share of income held by the bottom 10% across the globe. Highlight (and label by name) points with GINI coefficients greater than 2 standard deviations above the global mean. Highlight and label Brazil, China, and the United States. Summarize your observations. (1 sentence and 1 plot; 5pts)

5.  Generate a variable that indicates whether a given observation has a high (above the mean) or low (below the mean) GINI coefficient. Next, create a small table showing the mean value (global), as well as the mean value for the high- and low-GINI groups, and an indicator of whether they are statistically different at the 95% confidence level, of each of the following tax variables: total tax revenue (excl. social contributions); direct taxes (excl. social contributions); income taxes; indirect tax revenue; taxes on trade; taxes on goods and services. We will learn how to automatically output tables in R later in the quarter; for now, you can create a presentable Excel table manually. Summarize your findings in a concise paragraph. (1 table and 1 short paragraph; 20pts)
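The merge, the backtick-quoted names, and the high/low comparison described in this section can be sketched on toy data as follows. This is a rough illustration under stated assumptions, not the expected solution: the data frames, the column names `Gini index` and `Total tax revenue`, and all values are invented. Base R's `merge(..., all.x = TRUE)` is used here as a stand-in for `left_join(dat2, dat1, by = "country")`, and a Welch two-sample t-test is one possible way to check for a difference in means.

```r
# Toy stand-ins for dat1 (tax data) and dat2 (poverty data); all values invented
dat1 <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  `Total tax revenue` = c(30, 28, 26, 15, 13),
  check.names = FALSE  # keep the space in the column name
)
dat2 <- data.frame(
  country = c("A", "B", "C", "D", "E", "F"),
  `Gini index` = c(30, 32, 34, 46, 48, 62),
  check.names = FALSE
)

# Keep every row of dat2; country "F" has no tax data, so it gets NA
merged <- merge(dat2, dat1, by = "country", all.x = TRUE)

# Names containing spaces must be wrapped in backticks
gini_mean <- mean(merged$`Gini index`, na.rm = TRUE)
gini_sd <- sd(merged$`Gini index`, na.rm = TRUE)
n_above_1sd <- sum(merged$`Gini index` > gini_mean + gini_sd, na.rm = TRUE)

# Indicator for above-mean GINI, then compare mean tax revenue across groups
merged$high_gini <- merged$`Gini index` > gini_mean
group_means <- tapply(merged$`Total tax revenue`, merged$high_gini,
                      mean, na.rm = TRUE)

# Welch t-test; rows with missing tax revenue are dropped by default
tt <- t.test(`Total tax revenue` ~ high_gini, data = merged)
different_at_95 <- tt$p.value < 0.05
```

Checking `dim(merged)` right after the join is a quick way to confirm that no rows from the left-hand table were dropped or duplicated.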

4. Regression Analysis (25pts)

1.  Estimate a regression equation in which the GINI is the outcome variable and total tax revenue (excluding social contributions) is a regressor (i.e., regress GINI on total tax revenue). Then estimate a similar regression equation using the percent of individuals living at under $1.25 per person per day (PPP, the international extreme poverty line) as the dependent variable. Interpret your results. (2 sentences; 5pts)

2.  Create a visualization (2 figures) that shows the two linear fits you just estimated. Label axes clearly. (2 figures; 5pts)

3.  Now, regress the GINI coefficient and then extreme poverty on each of the following tax variables: total tax revenue (excl. social contributions); direct taxes (excl. social contributions); income taxes; indirect tax revenue; taxes on trade; taxes on goods and services. Summarize your findings in a concise table (Excel is fine here as well), with one paragraph describing your findings. (1 table and 1 paragraph; 15pts)
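Running the same bivariate regression over several regressors can be done with a short loop rather than by copying code. The sketch below uses invented toy data with hypothetical column names (`gini`, `tax_total`, `tax_trade`); it shows one possible pattern with `lm()` and `reformulate()`, and the real variable names in the merged data will differ (names with spaces would need backticks).

```r
# Toy data; column names and values are made up for illustration only
toy <- data.frame(
  gini      = c(50, 45, 42, 38, 35),
  tax_total = c(10, 15, 20, 25, 30),
  tax_trade = c(5, 4, 3, 2, 1)
)

regressors <- c("tax_total", "tax_trade")

# Fit gini ~ <regressor> once per regressor; keep the slope and its p-value
results <- do.call(rbind, lapply(regressors, function(v) {
  fit <- lm(reformulate(v, response = "gini"), data = toy)
  est <- summary(fit)$coefficients
  data.frame(
    regressor = v,
    slope     = est[2, "Estimate"],
    p_value   = est[2, "Pr(>|t|)"]
  )
}))

print(results)
```

The resulting data frame has one row per regressor, which maps directly onto a summary table. To draw a fitted line, `abline(fit)` on a base R scatterplot or `geom_smooth(method = "lm")` in ggplot2 are two common options.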

5. Overall Presentation (5pts)

Your overall presentation style matters tremendously. You should not have any raw R output or code copied/pasted into your final document. All questions should be answered clearly and concisely in full sentences. Plots, tables, etc., should be clearly labeled and referenced appropriately in your writeup.