Inferring the outcomes of rejected loans: an application of semisupervised clustering
Author(s) - Li Zhiyong, Hu Xinyi, Li Ke, Zhou Fanyin, Shen Feng
Publication year - 2020
Publication title - Journal of the Royal Statistical Society: Series A (Statistics in Society)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.103
H-Index - 84
eISSN - 1467-985X
pISSN - 0964-1998
DOI - 10.1111/rssa.12534
Subject(s) - inference, cluster analysis, computer science, categorical variable, artificial intelligence, machine learning, sample (material), logistic regression, pattern recognition (psychology), data mining, statistics, econometrics, mathematics, chemistry, chromatography
Summary - Rejection inference aims to reduce sample bias and to improve model performance in credit scoring. We propose a semisupervised clustering approach as a new rejection inference technique. K-prototype clustering can deal with mixed types of numeric and categorical characteristics, which are common in consumer credit data. We identify homogeneous acceptances and rejections and assign labels to part of the rejections according to the label of acceptances. We test the performance of various rejection inference methods in logit, support vector machine and random-forests models based on data sets of real consumer loans. The predictions of clustering rejection inference show advantages over other traditional rejection inference methods. Inferring the label of the rejection from semisupervised clustering is found to help to mitigate the sample bias problem and to improve the predictive accuracy.
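
Illustrative sketch (not the authors' implementation). The summary gives only the outline of the method, so the Python below is one possible reading of it under stated assumptions: accepted and rejected applicants are clustered together with a small k-prototypes-style routine (squared Euclidean distance on numeric fields plus gamma times the number of categorical mismatches), and a rejected applicant inherits a label only when the accepted loans in its cluster agree strongly enough. The function names, the purity threshold, the 0/1 label coding and the toy data are all hypothetical and do not come from the paper.

```python
import random
from collections import Counter


def dissimilarity(a, b, n_num, gamma):
    """Mixed-type distance: squared Euclidean on the first n_num (numeric)
    fields plus gamma times the count of mismatched categorical fields."""
    num = sum((x - y) ** 2 for x, y in zip(a[:n_num], b[:n_num]))
    cat = sum(x != y for x, y in zip(a[n_num:], b[n_num:]))
    return num + gamma * cat


def k_prototypes(rows, k, n_num, gamma=1.0, n_iter=20, seed=0):
    """Very small k-prototypes-style loop; returns a cluster index per row."""
    rng = random.Random(seed)
    protos = [list(r) for r in rng.sample(rows, k)]
    assign = None
    for _ in range(n_iter):
        new_assign = [
            min(range(k), key=lambda c: dissimilarity(r, protos[c], n_num, gamma))
            for r in rows
        ]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for c in range(k):
            members = [r for r, a in zip(rows, assign) if a == c]
            if not members:
                continue  # keep the old prototype for an empty cluster
            # Prototype = mean of numeric fields, mode of categorical fields.
            means = [sum(m[j] for m in members) / len(members) for j in range(n_num)]
            modes = [
                Counter(m[j] for m in members).most_common(1)[0][0]
                for j in range(n_num, len(rows[0]))
            ]
            protos[c] = means + modes
    return assign


def infer_rejected_labels(accepted, labels, rejected, k, n_num, purity=0.8, **kw):
    """Assumed labelling rule: a rejected row takes the majority label of the
    accepted rows in its cluster, but only if that majority share >= purity.
    Returns {index into rejected: inferred label}; other rows stay unlabelled."""
    rows = accepted + rejected
    assign = k_prototypes(rows, k, n_num, **kw)
    n_acc = len(accepted)
    inferred = {}
    for c in range(k):
        acc_labels = [labels[i] for i in range(n_acc) if assign[i] == c]
        if not acc_labels:
            continue  # no accepted loans here, nothing to borrow a label from
        label, count = Counter(acc_labels).most_common(1)[0]
        if count / len(acc_labels) < purity:
            continue  # cluster too mixed to trust
        for i in range(n_acc, len(rows)):
            if assign[i] == c:
                inferred[i - n_acc] = label
    return inferred


# Toy data: (scaled debt ratio, housing status); label 1 = default, 0 = repaid.
accepted = [(0.1, "own"), (0.2, "own"), (0.9, "rent"), (1.0, "rent")]
labels = [0, 0, 1, 1]
rejected = [(0.15, "own"), (0.95, "rent")]
print(infer_rejected_labels(accepted, labels, rejected, k=2, n_num=1))
```

Rejections that fall in mixed or acceptance-free clusters are deliberately left unlabelled, which matches the summary's statement that labels are assigned to part of the rejections. The augmented sample (accepted loans plus the labelled rejections) could then be passed to whichever classifier is being compared, such as the logit, support vector machine or random-forests models named in the summary.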