
Data and Policy Summer Scholar Program

Summer 2023

Capstone Project: Economic and Fiscal Policy

Title: Employee Attrition Analysis

Introduction:

Employee attrition is a critical concern in today's business landscape, with significant implications for organizations worldwide. High attrition rates result in increased costs, reduced productivity, and diminished morale. To address this issue, you must understand the underlying factors driving attrition and develop effective retention strategies. This capstone project focuses on analyzing the Employee Attrition dataset from a tech company, which provides comprehensive information on employee demographics, job roles, satisfaction levels, compensation, and work-life balance. Using data analysis techniques, you can uncover the factors that influence attrition and use them to propose potential solutions.

The impact of high attrition rates extends beyond individual organizations and has significant economic consequences. Recruiting, hiring, and training new employees entail substantial costs, compounded by the loss of institutional knowledge. Attrition also disrupts productivity and hampers organizational growth, affecting economic performance at both the micro and macro levels. By exploring the factors contributing to attrition and developing effective retention strategies, this project contributes to the formulation of robust economic and fiscal policies that foster stability, productivity, and sustainable growth within the business sector.

The capstone project focuses on developing your proficiency in key technical tools for data analysis and modeling. These include data cleaning and preprocessing techniques that ensure data quality. You will also gain experience analyzing data distributions through visualization, examining correlations, and conducting targeted analyses to extract meaningful business insights. Finally, the project involves building predictive models on the Attrition dataset, evaluating model performance, identifying the most significant features, and interpreting the statistical output to understand its practical implications.

These technical skills enable you to extract valuable insights from employee data and propose effective retention strategies that contribute to organizational stability and sustainable growth.

Completing this capstone project will give you a comprehensive understanding of employee attrition and its organizational impact. You will cultivate essential data analysis skills, enabling you to extract valuable insights from employee data. By proposing effective retention strategies, the project will contribute to the formulation of sound economic and fiscal policies that enhance organizational stability and foster sustainable growth. Ultimately, the project's outcomes will give businesses actionable insights to improve employee retention, enhance productivity, and cultivate a positive work environment.

Questions & Tasks:

Question 1: Data Overview (10 points) (3+3+4 = 10 pts)

1)   Examine the data's shape and identify any missing values in each column.

2)   Apply appropriate data-cleaning techniques to remove anomalous columns (columns that contain only one distinct value).

3)   Consider converting data types as needed. Note that some numerical features should be treated as categorical variables.
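A minimal base-R sketch of the three Question 1 steps, run on a small made-up data frame so it is self-contained. The column names other than Attrition and Education (Age, EmployeeCount, MonthlyIncome) are assumptions for illustration; with the real file you would load it with read.csv() and check names(df) first.

```r
# Small made-up stand-in for the attrition data (replace with read.csv(...)).
# Age, EmployeeCount and MonthlyIncome are assumed column names.
df <- data.frame(
  Age           = c(41, 49, 37, 33, 27, 32),
  Attrition     = c("Yes", "No", "Yes", "No", "No", "No"),
  Education     = c(2, 1, 2, 4, 1, 2),
  EmployeeCount = c(1, 1, 1, 1, 1, 1),        # constant column to be dropped
  MonthlyIncome = c(5993, 5130, 2090, 2909, 3468, NA),
  stringsAsFactors = FALSE
)

# 1) Shape (rows, columns) and number of missing values per column
print(dim(df))
print(colSums(is.na(df)))

# 2) Drop anomalous columns that hold a single distinct value
#    (unique() counts NA as a value, so an all-NA column is dropped too)
n_distinct <- sapply(df, function(col) length(unique(col)))
df <- df[, n_distinct > 1, drop = FALSE]

# 3) Store the response and coded numeric features as factors
df$Attrition <- factor(df$Attrition, levels = c("No", "Yes"))
df$Education <- factor(df$Education)
str(df)
```

Putting "No" first in the Attrition levels matters later: glm() treats the second level as the outcome being modeled.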

Question 2: Data Distribution Analysis (20 points) (5 points each)

1)   Analyze and visualize the distributions of both numerical and categorical features in the dataset using suitable plots such as histograms, bar plots, or pie charts.

2)   Examine the correlation between numerical features and identify the variables with the highest correlation. Provide insights into the reasons for this correlation.

3)   Draw meaningful conclusions from the insights generated for the categorical features.

4)   Explore and visualize the distribution of the response variable 'Attrition.' Do you think the dataset is balanced or not?
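The correlation and class-balance checks can be sketched in base R as follows. The data are simulated, and the column names TotalWorkingYears, MonthlyIncome and DistanceFromHome are assumed for illustration; income is generated to rise with experience, so that pair should surface as the strongest correlation. Plots are left out to keep the sketch short; base R's hist(), barplot() and pie() can be applied to the same columns and tables.

```r
set.seed(1)
n <- 200
# Simulated data: income is built to grow with working years, distance is noise
total_years <- round(runif(n, 1, 35))
df <- data.frame(
  TotalWorkingYears = total_years,
  MonthlyIncome     = round(1500 + 450 * total_years + rnorm(n, 0, 2000)),
  DistanceFromHome  = sample(1:29, n, replace = TRUE),
  Attrition         = factor(sample(c("No", "Yes"), n, replace = TRUE,
                                    prob = c(0.8, 0.2)),
                             levels = c("No", "Yes"))
)

# Correlation matrix of the numeric columns only
num_cols <- sapply(df, is.numeric)
cm <- cor(df[, num_cols])

# Strongest pair: blank out the diagonal and lower triangle, then take max |r|
cm_upper <- cm
cm_upper[lower.tri(cm_upper, diag = TRUE)] <- NA
idx <- which(abs(cm_upper) == max(abs(cm_upper), na.rm = TRUE),
             arr.ind = TRUE)[1, ]
top_pair <- c(rownames(cm)[idx[1]], colnames(cm)[idx[2]])
print(top_pair)
print(round(cm[idx[1], idx[2]], 2))

# Share of each class in the response; a large gap signals imbalance
attrition_share <- prop.table(table(df$Attrition))
print(attrition_share)
```

Using the absolute value means a strong negative correlation would be found as readily as a strong positive one.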

Question 3: Targeted Analysis (20 points) (6+6+8 = 20 pts)

1)   Determine which job role has the highest percentage of attrition and provide an explanation. Visualize your results.

2)   Investigate gender disparities in monthly income. Visualize your results.

3)   Compare attrition counts or rates across education levels. Visualize your results.
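One way to compute the Question 3 summaries in base R, on a tiny hypothetical data frame (JobRole, Gender and MonthlyIncome are assumed column names, and every value is invented). The median is used for income because a few high salaries can pull the mean upward; either statistic is a defensible choice. barplot(role_rate) and boxplot(MonthlyIncome ~ Gender, data = df) would visualize the first two results.

```r
# Invented toy data; column names JobRole, Gender, MonthlyIncome are assumed
df <- data.frame(
  JobRole = c("Sales Rep", "Sales Rep", "Sales Rep", "Manager", "Manager",
              "Research Scientist", "Research Scientist", "Research Scientist"),
  Gender  = c("Male", "Female", "Female", "Male", "Female",
              "Male", "Male", "Female"),
  MonthlyIncome = c(2500, 2700, 2600, 17000, 16500, 3200, 3000, 3400),
  Education = factor(c(1, 3, 3, 4, 4, 2, 3, 4)),
  Attrition = factor(c("Yes", "Yes", "No", "No", "No", "Yes", "No", "No"),
                     levels = c("No", "Yes"))
)

# 1) Attrition rate (%) within each job role, highest first.
#    margin = 1 makes each row (job role) sum to 1.
role_rate <- sort(
  100 * prop.table(table(df$JobRole, df$Attrition), margin = 1)[, "Yes"],
  decreasing = TRUE
)
print(role_rate)

# 2) Median monthly income by gender
income_by_gender <- tapply(df$MonthlyIncome, df$Gender, median)
print(income_by_gender)

# 3) Attrition counts and rates (%) by education level
edu_tab <- table(df$Education, df$Attrition)
print(edu_tab)
print(round(100 * prop.table(edu_tab, margin = 1)[, "Yes"], 1))
```

Rates rather than raw counts are the fairer comparison when groups differ in size.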

Question 4: Model Construction (40 points) (6+8+9+5+12 = 40 pts)

1)   Convert categorical variables into dummy variables to prepare the data for modeling.

2)   Fit a linear probability model or a logistic regression model with 'Attrition' as the response variable.

3)   Evaluate the model's performance using appropriate evaluation metrics or visualizations.

4)   List the five most significant features based on the model evaluation results.

5)   Explain the meaning of 'Estimates,' 'Std Error,' 't-value,' and 'p-value' in the model summary table (for a logistic regression in R, the last two appear as 'z value' and 'Pr(>|z|)'). Provide three examples related to three features.
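A sketch of the modeling steps with base R's model.matrix() and glm(), assuming the logistic option. The data are simulated from a known relationship (overtime raises the odds of leaving, income lowers them, the other predictors have no effect), and the predictor names are hypothetical stand-ins for the real columns. With a 0.5 cutoff on imbalanced data the model may predict "No" for almost everyone, which is one reason accuracy alone can mislead.

```r
set.seed(42)
n <- 500
# Simulated data; predictor names are hypothetical stand-ins
df <- data.frame(
  OverTime        = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  JobSatisfaction = factor(sample(1:4, n, replace = TRUE)),
  MonthlyIncome   = round(runif(n, 1000, 20000)),
  Age             = round(runif(n, 18, 60))
)
# Attrition drawn from a known logistic relationship so the sketch has signal
lin <- -1.5 + 1.2 * (df$OverTime == "Yes") - 0.0001 * df$MonthlyIncome
df$Attrition <- factor(ifelse(runif(n) < plogis(lin), "Yes", "No"),
                       levels = c("No", "Yes"))

# 1) Dummy variables: model.matrix() turns each factor into 0/1 columns,
#    dropping the first level as the baseline
X <- model.matrix(~ OverTime + JobSatisfaction + MonthlyIncome + Age, data = df)
head(X)

# 2) Logistic regression; glm() builds the same dummies internally and
#    models the probability of the second level ("Yes")
fit <- glm(Attrition ~ OverTime + JobSatisfaction + MonthlyIncome + Age,
           data = df, family = binomial)
coefs <- summary(fit)$coefficients   # Estimate, Std. Error, z value, Pr(>|z|)
print(round(coefs, 4))

# 3) Evaluation: confusion matrix and accuracy at a 0.5 cutoff
prob <- predict(fit, type = "response")
pred <- factor(ifelse(prob > 0.5, "Yes", "No"), levels = c("No", "Yes"))
conf_mat <- table(Predicted = pred, Actual = df$Attrition)
print(conf_mat)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)

# 4) Five smallest p-values, excluding the intercept row
pvals <- coefs[-1, "Pr(>|z|)"]
top5 <- head(sort(pvals), 5)
print(top5)

# 5) The z value is the Estimate divided by its Std. Error
stopifnot(isTRUE(all.equal(coefs[, "z value"],
                           coefs[, "Estimate"] / coefs[, "Std. Error"])))
```

Ranking by p-value reflects statistical significance, not effect size; comparing the magnitudes of Estimates also requires predictors measured on comparable scales.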

Question 5: Brief Write-up (10 points) (2+2+3+3 = 10 pts)

Based on your analysis, write a brief summary (no more than 250 words):

1)   Identify the most influential factor contributing to attrition.

2)   Provide suggestions for the HR team to reduce the attrition rate based on your findings.

3)   Discuss any limitations of the project.

4)   Describe what you would do if you had more time to further analyze the data and improve the project.

Bonus Points (optional): (5*2 = 10 pts)

1)   Apply train-test split to evaluate the model's performance on the test data. Explain any variations observed.

2)   Apply appropriate methods to handle the imbalance in the response variable.
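Both bonus items can be sketched together: split first, then rebalance only the training rows so the test set keeps the real class mix. Down-sampling the majority class is used here as one simple option among several (up-sampling the minority class or moving the probability cutoff are common alternatives). The data are simulated with roughly one "Yes" in five, and the predictor names are hypothetical.

```r
set.seed(7)
n <- 600
# Simulated, imbalanced data; predictor names are hypothetical
df <- data.frame(OverTime = factor(sample(c("No", "Yes"), n, replace = TRUE)),
                 Age      = round(runif(n, 18, 60)))
lin <- -2.2 + 1.3 * (df$OverTime == "Yes") - 0.01 * (df$Age - 40)
df$Attrition <- factor(ifelse(runif(n) < plogis(lin), "Yes", "No"),
                       levels = c("No", "Yes"))

# 1) 70/30 train-test split by random row indices
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# 2) Down-sample "No" in the training set to match the number of "Yes"
yes_rows  <- which(train$Attrition == "Yes")
no_rows   <- sample(which(train$Attrition == "No"), length(yes_rows))
train_bal <- train[c(yes_rows, no_rows), ]

fit <- glm(Attrition ~ OverTime + Age, data = train_bal, family = binomial)

# Evaluate on the untouched test set
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "Yes", "No"), levels = c("No", "Yes"))
conf_test <- table(Predicted = pred, Actual = test$Attrition)
print(conf_test)

# Recall for "Yes": share of actual leavers the model flags
recall_yes <- conf_test["Yes", "Yes"] / sum(conf_test[, "Yes"])
print(recall_yes)
```

Comparing this confusion matrix with one from an unbalanced fit on the same split shows the usual trade-off: more leavers are caught at the cost of more false alarms.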

Additional Guidance:

Note: Some of the numerical features in the dataset should be treated as categorical variables, with codes mapped to labels as follows:

Education: 1 = Below College, 2 = College, 3 = Bachelor, 4 = Master, 5 = Doctor

EnvironmentSatisfaction: 1 = Low, 2 = Medium, 3 = High, 4 = Very High

JobInvolvement: 1 = Low, 2 = Medium, 3 = High, 4 = Very High

JobSatisfaction: 1 = Low, 2 = Medium, 3 = High, 4 = Very High

PerformanceRating: 1 = Low, 2 = Good, 3 = Excellent, 4 = Outstanding

RelationshipSatisfaction: 1 = Low, 2 = Medium, 3 = High, 4 = Very High

WorkLifeBalance: 1 = Bad, 2 = Good, 3 = Better, 4 = Best
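The code-to-label mapping above can be applied in one pass. This sketch stores the labels in a named list (the name level_labels and the helper recode_levels() are hypothetical) and converts each coded column that is present into a factor. It deliberately creates unordered factors: by default glm() and lm() give ordered factors polynomial contrasts, which makes the coefficients harder to read than the usual one-dummy-per-level coding.

```r
# Labels for codes 1..k, in order, per coded column
level_labels <- list(
  Education                = c("Below College", "College", "Bachelor",
                               "Master", "Doctor"),
  EnvironmentSatisfaction  = c("Low", "Medium", "High", "Very High"),
  JobInvolvement           = c("Low", "Medium", "High", "Very High"),
  JobSatisfaction          = c("Low", "Medium", "High", "Very High"),
  PerformanceRating        = c("Low", "Good", "Excellent", "Outstanding"),
  RelationshipSatisfaction = c("Low", "Medium", "High", "Very High"),
  WorkLifeBalance          = c("Bad", "Good", "Better", "Best")
)

# Hypothetical helper: recode every listed column found in df
recode_levels <- function(df, labels) {
  for (col in intersect(names(labels), names(df))) {
    lab <- labels[[col]]
    # Code i becomes lab[i]; unordered so model dummies use treatment coding
    df[[col]] <- factor(df[[col]], levels = seq_along(lab), labels = lab)
  }
  df
}

# Toy example with two of the coded columns
toy <- data.frame(Education = c(1, 3, 5), WorkLifeBalance = c(4, 2, 3))
toy <- recode_levels(toy, level_labels)
print(toy)
```

Any code outside 1..k (for example a data-entry error) becomes NA, which the missing-value check in Question 1 would then surface.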

Please ensure that you knit your R script file (in HTML or PDF format) before submission. The submitted file should include your R script and the knitted output. If you prefer, you may submit a separate document in PDF format for your brief write-up.