Machine Learning Applications for Assessing P2P Portfolio Default Risk

George Troicky

Student thesis: Master thesis


Crowdfunding has contributed to a rapid transformation of the financial sector. Although it is not a new business model, it has gained strong and steady traction over the last few years. It provides alternative access to funds for individuals, companies, and entrepreneurial ventures (Crowdfunding Explained, 2017). Peer-to-peer (P2P) lending, one type of crowdfunding, has quickly taken market share in the consumer loan market, attracting attention from corporate investors, institutional customers, regulators, and rating agencies.
Zopa, a UK-based lending platform founded in 2004, has issued approximately 5 billion GBP in cumulative loans over the last 15 years. With Morningstar and Moody’s increasing their rating threshold, Zopa is the first P2P platform to achieve a AAA rating for its senior loans, through a 245M GBP securitization program arranged by Deutsche Bank and partnered lenders (Krapf, 2018). Zopa claims to originate high-quality loans using a proprietary model, and this was not its first securitization of loans. Having initially built a business model that diverted business away from conventional consumer lenders, P2P lending has grown to a size at which it is now converging with conventional institutional financing.
The growth of a new sector raises new questions about methods for evaluating it, and more specifically about the assets it generates for financing on the secondary market. What value can machine learning (ML) provide in understanding securitized assets generated by P2P platforms? What variables are persistent drivers in predicting default rates? Will ML have predictive power in better understanding the factors behind loan defaults from historical data?
The research takes a positivist approach to understanding the correlation between the independent variables and the dependent variable. The results show that ML techniques, applied with a disciplined approach, can classify defaulted and completed loans. In the ML models, the drivers for predicting defaults included the sum of repaid interest and principal. This is consistent with industry standards, which project that the default rate for consumer loans is highest within the first few months of a loan’s term.

Educations: MSc in Business Administration and E-business, (Graduate Programme) Final Thesis
Publication date: 2020
Number of pages: 96
Supervisors: Robert J. Kauffman