Project description
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
Project description
The following is a fictional case study designed to loosely resemble the work you might undertake on a future project. It will test your ability to handle big data and perform statistical/machine learning analyses as well as your ability to communicate your findings and derive commercial insight from your technical work.
You may perform the analyses using any computational language you wish (including at least one tool different from excel, since the majority of data sets we receive from clients are too large for us to be able to use it). Please submit your code along with your presentation and the requested results file by the date agreed with Gamma recruitment team.
Scenario:
Our client is a major utility company providing gas and electricity to corporate, SME and residential customers. In recent years, post-liberalization of the energy market in US, has had a growing problem with increasing customer defections above industry average. Thus, the client has asked us to work alongside them to identify the drivers of this problem and to devise and implement a strategy to counter it. The churn issue is most acute in the SME division and thus they want it to be the first priority. The head of the SME division has asked whether it is possible to predict the customers which are most likely to churn so that they can trial a range of pre-emptive actions. He has a hypothesis that clients are switching to cheaper providers so the first action to be trialed will be to offer customers with high propensity of churning a 20% discount.
Your task:
We have scheduled a meeting in one week's time with the head of the SME division in which you will present our findings of the churn issue and your recommendations on how to address it.
You are in charge of building the model and of suggesting which commercial actions should be taken as a result of the model's outcome.
The first stage is to establish the viability of such a model. For training your model you are provided with a dataset which includes features of SME customers in January 2016 as well as the information about whether or not they have churned by March 2016. In addition to that you have received the prices from 2015 for these customers. Of particular interest for the client is how you frame the problem for training. Given that this is the first time the client is resorting to predictive modelling, it is beneficial to leverage descriptive statistics and visualisation for extracting interesting insights from the provided data before diving into the model. Also while it is not mandatory, you are encouraged to test multiple algorithms. If you do so it will helpful to describe the tested algorithms in a simple manner.
Using the trained model you shall “score” customers in the verification data set (provided in the eponymous file) and put them in descending order of the propensity to churn. You should also classify these customers into two classes: those which you predict to churn are to be labelled "1" and the remaining customers should be labelled "0" in the result template.
You will submit this file with your presentation and your predictions will be scored with area under the ROC curve.
Finally, the client would like to have a view on whether the 20% discount offer to customers predicted to be churned is a good measure. Given that it is a steep discount bringing their price lower than all competitors we can assume for now that everyone who is offered will accept it. According to regulations they cannot raise the price of someone within a year if they accept the discount. Therefore offering it excessively is going to hit revenues hard.
Table 1 describes all the data fields which are found in the data. You will notice that the contents of some fields are meaningless text strings. This is due to "hashing" of text fields for data privacy. While their commercial interpretation is lost as a result of the hashing, they may still have predictive power.
2022-06-24