A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning

Haohua Sun Yin, Ravi Vatrapu

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review



Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cybercriminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider, who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38%, 76.47%, 78.46%, and 80.76% respectively. From the top four classifiers, the Bagging and Gradient Boosting classifiers were selected based on their weighted average and per-class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related entities is 29.81% according to Bagging, and 10.95% according to Gradient Boosting, with the number of entities as the metric. With regard to the number of addresses and current coins held by this type of entity, the results are: 5.79% and 10.02% according to Bagging; and 3.16% and 1.45% according to Gradient Boosting.
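The abstract reports the cybercrime share under three different metrics (number of entities, number of addresses, and coins held), which is why one classifier can yield figures as far apart as 29.81% and 5.79%. The following is a minimal sketch of that arithmetic, not the authors' code: the `Entity` fields, the class labels in `CYBERCRIME_CLASSES`, and the toy numbers are all hypothetical and chosen only to illustrate how the three shares are computed from a set of classified entities.

```python
from dataclasses import dataclass

# Hypothetical labels: the abstract states that 5 of the 12 classes are
# cybercrime-related but does not name them, so these are placeholders.
CYBERCRIME_CLASSES = {"ransomware", "scam"}


@dataclass
class Entity:
    predicted_class: str  # label assigned by a classifier (assumed field)
    n_addresses: int      # number of addresses clustered into the entity
    coins_held: float     # current balance of the entity


def cybercrime_shares(entities):
    """Return the cybercrime share by entity count, addresses, and coins."""
    flagged = [e for e in entities if e.predicted_class in CYBERCRIME_CLASSES]
    total_addresses = sum(e.n_addresses for e in entities)
    total_coins = sum(e.coins_held for e in entities)
    return {
        "entities": len(flagged) / len(entities),
        "addresses": sum(e.n_addresses for e in flagged) / total_addresses,
        "coins": sum(e.coins_held for e in flagged) / total_coins,
    }


# Toy data (not from the paper): two small flagged entities, two large others.
entities = [
    Entity("exchange", 900, 500.0),
    Entity("ransomware", 50, 20.0),
    Entity("scam", 50, 30.0),
    Entity("mining-pool", 1000, 450.0),
]

shares = cybercrime_shares(entities)
for metric, value in shares.items():
    print(f"{metric}: {value:.2%}")
```

In this toy set, half of the entities are flagged, yet they account for only a small fraction of addresses and coins, because the flagged entities are small. The same effect would explain a gap between an entity-based share and an address-based or coin-based share.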
Original language: English
Title of host publication: Proceedings. 2017 IEEE International Conference on Big Data: IEEE Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Number of pages: 10
Place of Publication: Los Alamitos, CA
Publication date: 2017
ISBN (Print): 9781538627167
ISBN (Electronic): 9781538627150, 9781538627143
Publication status: Published - 2017
Event: Fifth IEEE International Conference on Big Data. IEEE BigData 2017 - Boston, United States
Duration: 11 Dec 2017 to 14 Dec 2017
Conference number: 5


Conference: Fifth IEEE International Conference on Big Data. IEEE BigData 2017
Country/Territory: United States


  • Bitcoin
  • Blockchain
  • Cryptocurrency
  • Ecosystem
  • Cybercrime
  • Machine Learning
  • Supervised Learning
  • Ransomware
