A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning

Research output: Chapter in Book/Report/Conference proceeding - Article in proceedings



Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cybercriminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38%, 76.47%, 78.46%, and 80.76% respectively. From the top four classifiers, Bagging and Gradient Boosting classifiers were selected based on their weighted average and per-class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related entities is 29.81% according to Bagging, and 10.95% according to Gradient Boosting, with number of entities as the metric. With regard to the number of addresses and current coins held by this type of entity, the results are: 5.79% and 10.02% according to Bagging; and 3.16% and 1.45% according to Gradient Boosting.
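The abstract reports the cybercrime-related share under three different metrics: number of entities, number of addresses, and coins currently held. The following is a minimal sketch of how such shares could be computed once entities have been classified. It is not the authors' code: the `Entity` fields, the class labels in `CYBERCRIME_CLASSES`, and the toy values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical label names. The abstract states that 5 of the 12 classes
# are cybercrime-related but does not list them, so these are placeholders.
CYBERCRIME_CLASSES = {"ransomware", "scam", "darknet-market"}


@dataclass
class Entity:
    predicted_class: str  # label assigned by a classifier (assumed field)
    n_addresses: int      # number of addresses in the entity (assumed field)
    coins_held: float     # current balance in BTC (assumed field)


def cybercrime_shares(entities):
    """Return the cybercrime-related share by entities, addresses, and coins."""
    flagged = [e for e in entities if e.predicted_class in CYBERCRIME_CLASSES]

    def share(part, whole):
        # Guard against an empty input or a zero total.
        return part / whole if whole else 0.0

    return {
        "entities": share(len(flagged), len(entities)),
        "addresses": share(
            sum(e.n_addresses for e in flagged),
            sum(e.n_addresses for e in entities),
        ),
        "coins": share(
            sum(e.coins_held for e in flagged),
            sum(e.coins_held for e in entities),
        ),
    }


# Toy data, invented for this example only.
sample = [
    Entity("exchange", 700, 80.0),
    Entity("ransomware", 100, 5.0),
    Entity("gambling", 150, 10.0),
    Entity("scam", 50, 5.0),
]

shares = cybercrime_shares(sample)
for metric, value in shares.items():
    print(f"{metric}: {value:.2%}")
```

In this toy sample half of the entities are flagged, yet they hold a much smaller fraction of the addresses and coins. This shows how a share measured by entity count can differ substantially from one measured by addresses or coins held, as it does in the reported results.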

Publication information

Original language: English
Title of host publication: Proceedings. 2017 IEEE International Conference on Big Data : IEEE Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Number of pages: 10
Place of Publication: Los Alamitos, CA
Publication date: 2017
ISBN (Print): 9781538627167
ISBN (Electronic): 9781538627150, 9781538627143
State: Published - 2017
Event: 5th IEEE International Conference on Big Data. 2017 - Boston, United States
Duration: 11 Dec 2017 - 14 Dec 2017
Conference number: 5


Conference: 5th IEEE International Conference on Big Data. 2017
Country: United States

    Research areas

  • Bitcoin, Blockchain, Cryptocurrency, Ecosystem, Cybercrime, Machine Learning, Supervised Learning, Ransomware

ID: 55368965