Combining Machine Learning and Forecasting to Predict Real Estate Prices: An Experimental Study of Machine Learning and Forecasting Methods Applied to Danish Real Estate Sales Data

Jakob Pallesen & Julius Varlid Bech

Student thesis: Master thesis

Abstract

This master’s thesis is concerned with machine learning and forecasting for price prediction in the Danish real estate market. The primary aim is to investigate how machine learning can be combined with forecasting methods to make more accurate real estate price predictions. The secondary aims are to investigate how big data as a concept can assist in combining machine learning with forecasting, and to identify which challenges need to be addressed to realize the full potential of machine learning in the real estate market.
Operating within the concepts of big data, machine learning and forecasting, we collected publicly available data to conduct an experiment in which we build house price prediction models and compare their accuracy. We used a price index model as our forecasting model and combined it with our machine learning models by extrapolating previous sales data. The machine learning algorithms used are boosted decision tree regression and decision forest regression. Our results show that combining our forecasting model with either of our machine learning models did not yield higher accuracy in real estate price prediction. Our machine learning model, using an iteratively tuned boosted decision tree regression, showed the highest model fit without being combined with the data extrapolated by the forecasting model. As recommendations, we present how machine learning can be used to predict the price of a house and which challenges to be aware of.
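The abstract does not specify exactly how the price index and the previous sales were combined. As a minimal sketch, assuming "extrapolating previous sales data" means scaling a property's previous sale price by the ratio of a price index between the quarter of that sale and the quarter being predicted, the resulting value could be attached as an extra input feature for the regression models. The names `Sale`, `extrapolate_price` and `add_extrapolated_feature`, the quarter format and the index values below are hypothetical and illustrative only; they are not taken from the thesis data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sale:
    """A hypothetical record of a previous sale of a property."""
    property_id: str
    quarter: str   # assumed format, e.g. "2014Q1"
    price: float   # sale price (assumed DKK)


def extrapolate_price(previous: Sale, target_quarter: str, index: Dict[str, float]) -> float:
    """Scale a previous sale price to target_quarter using the ratio of price index values."""
    if previous.quarter not in index or target_quarter not in index:
        raise KeyError("price index missing for one of the quarters")
    return previous.price * index[target_quarter] / index[previous.quarter]


def add_extrapolated_feature(
    rows: List[dict], previous_sales: Dict[str, Sale], index: Dict[str, float]
) -> List[dict]:
    """Return copies of rows with an 'extrapolated_price' feature (None if no prior sale)."""
    enriched = []
    for row in rows:
        prev: Optional[Sale] = previous_sales.get(row["property_id"])
        value = extrapolate_price(prev, row["quarter"], index) if prev else None
        enriched.append({**row, "extrapolated_price": value})
    return enriched


# Illustrative usage with made-up index values (2014Q1 = 100).
index = {"2014Q1": 100.0, "2016Q1": 110.0, "2018Q1": 121.0}
previous_sales = {"A": Sale("A", "2014Q1", 2_000_000.0)}
rows = [{"property_id": "A", "quarter": "2018Q1"}, {"property_id": "B", "quarter": "2018Q1"}]
for r in add_extrapolated_feature(rows, previous_sales, index):
    print(r)
```

Keeping the extrapolated price as a feature, rather than averaging it with the model's output, lets the regression model learn how much weight to give it; properties without a previous sale simply get a missing value.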
We conclude that the two models used do not yield higher accuracy when combined by extrapolating historical sales data. Further, we conclude that models where one is far superior to the other in prediction accuracy should not be combined. However, we recommend what can be done next to overcome these challenges and create combined predictive models with higher accuracy.

Educations: MSc in Business Administration and Information Systems (Graduate Programme), Final Thesis
Language: English
Publication date: 2018
Number of pages: 122
Supervisors: Raghava Rao Mukkamala