Unravelling the Bitcoin Blockchain Ecosystem Using Supervised Learning

Hao Hua Sun Yin

Student thesis: Master thesis

Abstract

Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as ransomware attacks, scams, illegal goods trading, and theft. At the time of writing, the composition of the Bitcoin ecosystem has not yet been estimated: there is no picture of which types of services participate in it, or in what proportions. Hence, the current research aims to answer the following questions: What types of entities can be found in the Bitcoin ecosystem? What percentage of it belongs to each type of entity? Since Supervised Learning approaches are the main analysis tool, basic concepts of blockchains, Bitcoin, and Machine Learning were introduced first. In the process, a dozen Supervised Learning classifiers were tested, of which four prevailed, with cross-validation accuracies of 76.88%, 75.47%, 78.91%, and 80.32% respectively. From these top four classifiers, the Bagging and Gradient Boosting classifiers were selected based on their weighted average and per-class precision. The Supervised Learning models were trained on a dataset of categorised observations covering twelve different classes of entities, and were then used to predict the category of 73,058 uncategorised entities. The predictions revealed that, within this sample of uncategorised entities, the most abundant categories were other (26.90% or 30.55%) and personal wallet (27.45% or 36.93%), followed by gambling (12.50% or 13.24%), tor market (5.58% or 12.39%), and exchange (9.75% or 9.78%), where the two figures in each pair come from the two selected classifiers. The current research serves as a proof of concept for this approach: it not only provides a very first estimate of the Bitcoin ecosystem, but also paves the way for future work, with more resources and data available, to uncover a larger portion (50%, 75%, etc.) of the Bitcoin ecosystem.

Keywords: Bitcoin, Blockchain, Cryptocurrency, Ecosystem, Cybercrime, Machine Learning, Supervised Learning, Multiclass Classification
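The abstract selects classifiers by their per-class and weighted average precision, then reports the share of uncategorised entities predicted into each category. The sketch below illustrates those two calculations with only the Python standard library. It is a hypothetical illustration, not the thesis's code: the function names and the toy labels are invented, the actual tooling is not stated in the abstract, and the Bagging and Gradient Boosting classifiers themselves are not reproduced. Weighting each class's precision by its support (number of true instances) is an assumption that mirrors the common "weighted average" convention.

```python
from collections import Counter


def per_class_precision(y_true, y_pred):
    """Precision per class: TP / (TP + FP); 0.0 if the class was never predicted."""
    labels = sorted(set(y_true) | set(y_pred))
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: (correct[c] / predicted[c] if predicted[c] else 0.0) for c in labels}


def weighted_precision(y_true, y_pred):
    """Per-class precision averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    precision = per_class_precision(y_true, y_pred)
    total = sum(support.values())
    return sum(precision[c] * support[c] for c in precision) / total


def category_shares(predictions):
    """Percentage of entities assigned to each predicted category, most common first."""
    counts = Counter(predictions)
    n = len(predictions)
    return {c: 100.0 * k / n for c, k in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical labelled validation data (not from the thesis).
    y_true = ["exchange", "gambling", "gambling", "personal wallet", "other"]
    y_pred = ["exchange", "gambling", "other", "personal wallet", "other"]
    print(per_class_precision(y_true, y_pred))
    print(weighted_precision(y_true, y_pred))  # 0.9
    # Applied to predictions for uncategorised entities, the same counting
    # yields an ecosystem breakdown in percent.
    print(category_shares(y_pred))
```

Per-class precision matters here because a high overall accuracy can hide a classifier that rarely labels minority categories (such as tor markets) correctly; the weighted average summarises these values while respecting class imbalance.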

Education: MSc in Business Administration and Information Systems (Graduate Programme), Final Thesis
Language: English
Publication date: 2017
Number of pages: 64
Supervisors: Ravi Vatrapu