Can Publicly Available Non-Financial Data Significantly Improve Corporate Default Prediction Accuracy?

Andreas Riise Olczyk & Malthe Dyvekær Nybjerg

Student thesis: Master thesis


Huge amounts of financial and non-financial data have been made freely accessible to the general public by the Danish Business Register in an effort to digitize its operations and to share knowledge. The hope is that the open data will create knowledge and opportunities for companies and ultimately generate growth. The non-financial data is a record of all non-financial attributes that the Central Business Register has recorded for a company since its establishment. This paper tests whether this non-financial data can add significant explanatory power to corporate default prediction models. The data used in the analyses consists of 93,000 observations of Danish ApS and A/S companies in the period 2013-2017, of which 7,909 defaulted. The test is carried out by first analyzing the financial and non-financial data separately using logistic regression. Subsequently, the datasets are combined into a full model to investigate whether it is more accurate than the two separate models. Furthermore, the paper takes a critical stance on the financial ratios used, on how these can be, and are being, manipulated, and on how this may affect the models. The principal findings clearly show that adding the non-financial data significantly improves accuracy. The combined model reaches an AUC of 0.921 and a log score of -0.1665, which is better than both the strictly financial model (AUC of 0.876, log score of -0.2164) and the strictly non-financial model (AUC of 0.698, log score of -0.2416). In addition, the combined model is superior on all of the common measures of model fit: log likelihood, log score, R² measures and classification ability.
Although the analysis showed that adding non-financial data technically improves a corporate default prediction model, the extraction, modelling and analysis of the data were so complex that the usefulness of the non-financial data in the Central Business Register is limited to those with very high technical abilities and computational capacity.
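The two headline metrics above, AUC and log score, can be illustrated with a minimal sketch. It assumes that "log score" means the average log of the predicted probability assigned to the realized outcome, which is consistent with the negative values reported, but the thesis abstract does not state the exact definition. The function names and the toy data are hypothetical and are not taken from the thesis; the predicted probabilities stand in for the output of a fitted logistic regression.

```python
import math


def auc(y_true, p_hat):
    """Share of (defaulted, non-defaulted) pairs in which the defaulted firm
    receives the higher predicted default probability; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulted and one non-defaulted firm")
    wins = 0.0
    for p_default in pos:
        for p_survive in neg:
            if p_default > p_survive:
                wins += 1.0
            elif p_default == p_survive:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_score(y_true, p_hat, eps=1e-12):
    """Average log predicted probability of the realized outcome
    (assumed definition; values closer to zero are better)."""
    total = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(y_true)


# Made-up outcomes (1 = default) and predicted probabilities from two
# hypothetical models; these numbers are illustrative only.
y = [1, 0, 0, 1, 0, 0]
p_model_a = [0.8, 0.1, 0.3, 0.6, 0.2, 0.1]
p_model_b = [0.4, 0.3, 0.5, 0.5, 0.2, 0.4]

for name, p in [("model A", p_model_a), ("model B", p_model_b)]:
    print(name, "AUC:", round(auc(y, p), 3), "log score:", round(log_score(y, p), 4))
```

In this toy comparison, the model with the higher AUC and the log score closer to zero would be preferred, which mirrors how the combined model is ranked against the two separate models above.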

Educations: MSc in Accounting, Strategy and Control, (Graduate Programme) Final Thesis
Publication date: 2018
Number of pages: 149