TY - UNPB

T1 - Credit Scoring

T2 - Discussion of Methods and a Case Study

AU - Kronborg, Dorte

AU - Tjur, Tue

AU - Vincents, Bo

PY - 1999

Y1 - 1999

N2 - The scenario considered is that of a credit association, a bank or another financial institution which, on the basis of information about a new potential customer and historical data on many other customers, has to decide whether or not to give that customer a certain loan. We discuss three popular techniques: logistic regression, discriminant analysis and neural networks. We shall argue strongly in favour of logistic regression. Discriminant analysis can be used, and for reasons that can be explained mathematically it will often result in approximately the same conclusions as a logistic regression. But the statistical assumptions are not appropriate in most cases, and the results given are not as directly interpretable as those of logistic regression. Neural network techniques, in their simplest form, suffer from the lack of standard statistical methods for verification of the model and tests for removal of covariates. This problem disappears to some extent when the neural networks are reformulated as proper statistical models, based on the type of functions that are considered in neural networks. But this results in a somewhat specialized class of non-linear regression models, which may be useful in situations where local peculiarities of the response function are in focus, but certainly not when the overall - usually monotone - effect of many more or less confounded covariates is the issue. We discuss, within the logistic regression framework, the handling of phenomena such as time trends and corruption of the historical data due to shifts of policy, censoring and/or interventions in high-risk customers' economy. Finally, we illustrate and support the theoretical considerations by a case study concerning mortgage loans in a Danish credit association.

AB - The scenario considered is that of a credit association, a bank or another financial institution which, on the basis of information about a new potential customer and historical data on many other customers, has to decide whether or not to give that customer a certain loan. We discuss three popular techniques: logistic regression, discriminant analysis and neural networks. We shall argue strongly in favour of logistic regression. Discriminant analysis can be used, and for reasons that can be explained mathematically it will often result in approximately the same conclusions as a logistic regression. But the statistical assumptions are not appropriate in most cases, and the results given are not as directly interpretable as those of logistic regression. Neural network techniques, in their simplest form, suffer from the lack of standard statistical methods for verification of the model and tests for removal of covariates. This problem disappears to some extent when the neural networks are reformulated as proper statistical models, based on the type of functions that are considered in neural networks. But this results in a somewhat specialized class of non-linear regression models, which may be useful in situations where local peculiarities of the response function are in focus, but certainly not when the overall - usually monotone - effect of many more or less confounded covariates is the issue. We discuss, within the logistic regression framework, the handling of phenomena such as time trends and corruption of the historical data due to shifts of policy, censoring and/or interventions in high-risk customers' economy. Finally, we illustrate and support the theoretical considerations by a case study concerning mortgage loans in a Danish credit association.

KW - Credit scoring

KW - Discriminant analysis

KW - Logistic regression

KW - Neural networks

KW - Event history analysis

M3 - Working paper

T3 - Preprint

BT - Credit Scoring

PB - Center for Statistics

CY - Frederiksberg

ER -