COURSEWORK BRIEF:

Module Code:

MANG6556

Assessment:

Supplementary Coursework

Weighting:

100%

Module Title:

Credit Risk & Data Analytics

Coursework Brief:

Question 1 (60 marks)

The data set HMEQ reports characteristics and delinquency information for 5,960 home equity loans. A home equity loan is a loan where the obligor uses the equity of his or her home as the underlying collateral. The data set has the following characteristics:

BAD: 1 = applicant defaulted on the loan or was seriously delinquent; 0 = applicant repaid the loan

LOAN: Amount of the loan request

MORTDUE: Amount due on existing mortgage

VALUE: Value of current property

REASON: DebtCon = debt consolidation; HomeImp = home improvement

JOB: Occupational categories

YOJ: Years at present job

DEROG: Number of major derogatory reports

DELINQ: Number of delinquent credit lines

CLAGE: Age of oldest credit line in months

NINQ: Number of recent credit inquiries

CLNO: Number of credit lines

DEBTINC: Debt-to-income ratio

1.1 Carefully pre-process the data set by considering the following activities (30 marks):

exploratory data analysis

missing value handling (if any)

outlier detection and treatment (if any)

categorisation of the continuous variables (if deemed useful)

coding the nominal variables using Weights of Evidence (note that some additional coarse classification might be needed).

splitting the data set into a training and test set.
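As an illustration of the Weights of Evidence step listed above, the following is a minimal sketch in plain Python. It assumes the convention WoE = ln(share of goods / share of bads), where BAD = 1 marks a bad; some texts use the reverse ratio, which flips the sign. The function name `weights_of_evidence`, the `smoothing` parameter, and the toy values are hypothetical and are not taken from the HMEQ data.

```python
import math
from collections import defaultdict


def weights_of_evidence(categories, targets, smoothing=0.5):
    """Return {category: (woe, iv_contribution)} for a binary target (1 = bad).

    Assumed convention: WoE = ln(dist_good / dist_bad).
    `smoothing` is added to every count so that a category with
    zero goods or zero bads does not produce log(0).
    """
    goods = defaultdict(float)
    bads = defaultdict(float)
    for cat, bad in zip(categories, targets):
        if bad == 1:
            bads[cat] += 1
        else:
            goods[cat] += 1

    levels = list(dict.fromkeys(categories))  # unique levels, first-seen order
    total_good = sum(goods[c] + smoothing for c in levels)
    total_bad = sum(bads[c] + smoothing for c in levels)

    table = {}
    for c in levels:
        dist_good = (goods[c] + smoothing) / total_good
        dist_bad = (bads[c] + smoothing) / total_bad
        woe = math.log(dist_good / dist_bad)
        # Each category's contribution to the Information Value
        table[c] = (woe, (dist_good - dist_bad) * woe)
    return table


# Toy example (made-up values, NOT the HMEQ data)
reason = ["DebtCon"] * 6 + ["HomeImp"] * 4
bad = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
woe_table = weights_of_evidence(reason, bad)
information_value = sum(iv for _, iv in woe_table.values())
```

To avoid leaking test-set information, the WoE table would normally be estimated on the training set only and then applied unchanged to the test set.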

1.2 Estimate a scorecard using a logistic regression classifier and report the following (30 marks):

The most important variables

The impact of the variables on the target

The performance of the model. Use various performance metrics and discuss their relationship if any.

Compare this scorecard with the results of a Random Forest. Discuss your results.

Why do most banks use Logistic Regression as their base classifier? What do banks gain and lose by doing this?
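On the relationship between performance metrics, one well-known link is that the Gini coefficient is a linear rescaling of the AUC: Gini = 2 × AUC − 1. The sketch below computes AUC, Gini, and the Kolmogorov-Smirnov (KS) statistic in plain Python. It assumes that a higher score means a higher predicted probability of BAD = 1; the function names and toy scores are hypothetical, and the pairwise AUC is written for clarity, not for speed on large samples.

```python
def auc(scores, labels):
    """AUC as the probability that a random bad outscores a random good.

    Ties between a bad and a good count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini coefficient, a linear rescaling of the AUC."""
    return 2 * auc(scores, labels) - 1


def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of goods and bads."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_bad = sum(s <= t for s in pos) / len(pos)
        cdf_good = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_good - cdf_bad))
    return best


# Toy example (made-up scores, NOT model output on HMEQ)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
toy_auc = auc(scores, labels)
toy_gini = gini(scores, labels)
toy_ks = ks_statistic(scores, labels)
```

The same functions can be applied to the test-set scores of both the logistic regression scorecard and the Random Forest, so the two models are compared on identical metrics.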

Please carefully report the various steps of your methodology and discuss your results in a rigorous way!

NOTE: It is unlikely that different students will come up with the exact same parameter estimates. Special consideration will be given to submissions whose estimates are identical.

Question 2 (20 marks)

Find an academic paper published in 2019 or later (based on online or print publication date) discussing a real-life application of credit risk or data analytics. It is important that the dataset analysed in the paper consists of real-life (not artificial) data. The publication outlets in which to look for a suitable paper are:

Management Science

Operations Research

INFORMS Journal on Computing

INFORMS Journal on Applied Analytics

Journal of Machine Learning Research

European Journal of Operational Research

Production and Operations Management

Manufacturing & Service Operations Management

ICDM (The IEEE International Conference on Data Mining)

NeurIPS (Conference on Neural Information Processing Systems)

KDD (ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

Papers from journals or conferences not on this list are not acceptable.

Once you have found an appropriate paper, report the following in separate subsections:

Title, authors, and complete citation (e.g., journal name, volume/issue, year, …)

The data mining problem considered

The data mining techniques used

The results reported

A critical discussion of the model and results (assumptions made, shortcomings, limitations, …).

Make sure you demonstrate that you understand what the article is all about and are able to provide a critical discussion.

Do not copy and paste from the article. This will be easily detected by Turnitin!

Please do NOT review the same paper as you did