The Relationship between Online Factors and Prices of Cryptocurrencies

Krzysztof Koszewski

Student thesis: Master's thesis


Cryptocurrencies are volatile assets that receive considerable attention in the media and in academic literature. This thesis examines the relationship between online factors from Twitter, Reddit, and Wikipedia and the prices of the five largest cryptocurrencies by market capitalization: Bitcoin, Ether, Binance Coin, Tether, and USD Coin. The analysis has three main steps. The first step performs 1-, 2-, and 3-day price prediction with models that use only historical prices. Six models were chosen: two time series models, ARIMA and SARIMA, and four machine learning models, RNN, LSTM, Bi-LSTM, and GRU. The machine learning models were more accurate than the time series models. The second step extracts online data from Twitter, Reddit, and Wikipedia and analyzes it: sentiment scores are computed with VADER, and the most important online variables are selected through correlation analysis and Random Forest feature importance. The Twitter variables were more strongly correlated with prices, and the Random Forest algorithm ranked them as more important. The third step extends the machine learning models from the first step with the online variables. Although they were less correlated with prices and ranked as less important by the Random Forest, the Reddit variables produced the best price predictions. The positive, negative, and neutral sentiment variables performed equally well, producing accurate predictions for all five cryptocurrencies. However, the non-sentiment variables from Twitter and Reddit, referred to as engagement metrics, delivered predictions equal to or better than those based on the VADER sentiments. In addition, the machine learning models were run exclusively on the Covid-19 period. For this period, the models using only historical prices were less accurate, but adding the online variables yielded larger improvements, and the results were more consistent within each machine learning algorithm.
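The first and third steps frame price prediction as supervised learning over a daily series. A minimal sketch of how 1-, 2-, and 3-day-ahead training pairs could be built for recurrent models (RNN, LSTM, Bi-LSTM, GRU), assuming a fixed look-back window; the function name `make_windows`, the 3-day look-back, and stacking online variables as extra feature columns are illustrative assumptions, not the thesis's actual preprocessing:

```python
import numpy as np


def make_windows(features, target, lookback: int, horizon: int):
    """Build (X, y) pairs for horizon-day-ahead prediction.

    features: shape (T,) or (T, n_features), e.g. closing price, optionally
              column-stacked with online variables (hypothetical columns).
    target:   shape (T,), the series to predict (e.g. closing price).
    X[i] holds rows i .. i+lookback-1; y[i] is target[i+lookback-1+horizon].
    """
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features[:, None]  # univariate series -> one feature column
    target = np.asarray(target, dtype=float)
    n_samples = len(target) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this lookback/horizon")
    X = np.stack([features[i:i + lookback] for i in range(n_samples)])
    y = np.array([target[i + lookback - 1 + horizon] for i in range(n_samples)])
    # X has shape (samples, lookback, n_features), the layout recurrent layers take.
    return X, y


# Toy usage: one price series, three prediction horizons.
prices = np.arange(10, dtype=float)
for h in (1, 2, 3):
    X, y = make_windows(prices, prices, lookback=3, horizon=h)
    print(h, X.shape, y[0])
```

Extending a model with online variables then only changes the `features` argument, for example `np.column_stack([prices, daily_post_count])` with a hypothetical, date-aligned `daily_post_count` series, while the target stays the price.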
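For the variable-selection step, the correlation screen can be sketched as ranking candidate online variables by the absolute Pearson correlation of each variable with the price. The variable names and toy values below are hypothetical. The thesis also used Random Forest importances (in scikit-learn, the `feature_importances_` attribute of a fitted `RandomForestRegressor`), which this sketch does not reproduce:

```python
import numpy as np


def rank_by_correlation(candidates: dict, price) -> list:
    """Return (name, r) pairs sorted by |Pearson r| with price, strongest first."""
    price = np.asarray(price, dtype=float)
    scores = []
    for name, series in candidates.items():
        # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
        r = np.corrcoef(np.asarray(series, dtype=float), price)[0, 1]
        scores.append((name, float(r)))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical daily series, aligned by date with the closing price.
price = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 106.0])
candidates = {
    "tweet_count": np.array([50, 55, 53, 60, 64, 62]),
    "reddit_comments": np.array([30, 29, 31, 28, 30, 29]),
    "vader_positive": np.array([0.20, 0.22, 0.21, 0.25, 0.26, 0.24]),
}
for name, r in rank_by_correlation(candidates, price):
    print(f"{name}: r = {r:+.2f}")
```

Ranking by absolute value keeps strongly negative correlates in view. As the abstract's Reddit result suggests, a low correlation at this stage does not rule out a variable being useful in the prediction models.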

Programme: Cand.merc.dat Business Administration and Computer Science (Master's programme), final thesis
Number of pages: 78
Supervisor: Somnath Mazumdar