A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning

Haohua Sun Yin, Ravi Vatrapu

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

Abstract

Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cybercriminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38%, 76.47%, 78.46%, 80.76% respectively. From the top four classifiers, Bagging and Gradient Boosting classifiers were selected based on their weighted average and per class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related is 29.81% according to Bagging, and 10.95% according to Gradient Boosting with number of entities as the metric. With regard to the number of addresses and current coins held by this type of entities, the results are: 5.79% and 10.02% according to Bagging; and 3.16% and 1.45% according to Gradient Boosting.
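
The workflow the abstract describes (cross-validating classifiers on the labelled entities, then classifying the unlabelled ones and measuring the cybercrime-related share) can be illustrated with a minimal sketch. This is not the authors' code or data: it assumes scikit-learn, uses synthetic features from `make_classification`, a reduced unlabelled set of 2,000 rows instead of 100,000, an arbitrary choice of which 5 of the 12 labels count as cybercrime-related, and a made-up `addresses` weight per entity. The hyperparameters are placeholders too.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

N_LABELLED, N_UNLABELLED, N_CLASSES = 854, 2000, 12
CYBERCRIME = [0, 1, 2, 3, 4]  # hypothetical: which 5 labels are cybercrime-related

# Synthetic stand-in for the entity features (assumption, not the paper's data).
X, y = make_classification(
    n_samples=N_LABELLED + N_UNLABELLED,
    n_features=10,
    n_informative=6,
    n_redundant=0,
    n_classes=N_CLASSES,
    n_clusters_per_class=1,
    random_state=0,
)
X_lab, y_lab = X[:N_LABELLED], y[:N_LABELLED]
X_unl = X[N_LABELLED:]  # labels of these rows are treated as unknown

# Hypothetical number of addresses controlled by each unlabelled entity.
rng = np.random.default_rng(0)
addresses = rng.integers(1, 1000, size=N_UNLABELLED)

models = {
    "Bagging": BaggingClassifier(n_estimators=20, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(n_estimators=30, random_state=0),
}

results = {}
for name, model in models.items():
    # Mean accuracy over 5 stratified folds of the labelled set.
    acc = cross_val_score(model, X_lab, y_lab, cv=5, scoring="accuracy").mean()
    # Fit on all labelled rows, then classify the unlabelled entities.
    pred = model.fit(X_lab, y_lab).predict(X_unl)
    is_crime = np.isin(pred, CYBERCRIME)
    results[name] = {
        "cv_accuracy": float(acc),
        # Share with number of entities as the metric.
        "share_entities": float(is_crime.mean()),
        # Share with number of addresses as the metric.
        "share_addresses": float(addresses[is_crime].sum() / addresses.sum()),
    }

for name, r in results.items():
    print(name, {k: round(v, 4) for k, v in r.items()})
```

Because the two shares weight each entity differently (one vote per entity versus one vote per address), they can diverge considerably for the same predictions, which is consistent with the gap between the entity-based and address-based figures reported in the abstract.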
Language: English
Title of host publication: Proceedings. 2017 IEEE International Conference on Big Data: IEEE Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Number of pages: 10
Place of Publication: Los Alamitos, CA
Publisher: IEEE
Date: 2017
Pages: 3690-3699
ISBN (Print): 9781538627167
ISBN (Electronic): 9781538627150, 9781538627143
DOI: 10.1109/BigData.2017.8258365
State: Published - 2017
Event: 5th IEEE International Conference on Big Data. 2017 - Boston, United States
Duration: 11 Dec 2017 to 14 Dec 2017
Conference number: 5
http://cci.drexel.edu/bigdata/bigdata2017/

Keywords

  • Bitcoin
  • Blockchain
  • Cryptocurrency
  • Ecosystem
  • Cybercrime
  • Machine Learning
  • Supervised Learning
  • Ransomware

Cite this

Sun Yin, H., & Vatrapu, R. (2017). A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning. In J-Y. Nie, Z. Obradovic, T. Suzumura, R. Ghosh, R. Nambiar, C. Wang, H. Zang, R. Baeza-Yates, X. Hu, J. Kepner, A. Cuzzocrea, J. Tang, ... M. Toyoda (Eds.), Proceedings. 2017 IEEE International Conference on Big Data: IEEE Big Data 2017 (pp. 3690-3699). Los Alamitos, CA: IEEE. DOI: 10.1109/BigData.2017.8258365
@inproceedings{1749fd6ea14c4959a264cfef8cd15ad7,
title = "A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning",
abstract = "Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cybercriminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38{\%}, 76.47{\%}, 78.46{\%}, 80.76{\%} respectively. From the top four classifiers, Bagging and Gradient Boosting classifiers were selected based on their weighted average and per class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related is 29.81{\%} according to Bagging, and 10.95{\%} according to Gradient Boosting with number of entities as the metric. With regard to the number of addresses and current coins held by this type of entities, the results are: 5.79{\%} and 10.02{\%} according to Bagging; and 3.16{\%} and 1.45{\%} according to Gradient Boosting.",
keywords = "Bitcoin, Blockchain, Cryptocurrency, Ecosystem, Cybercrime, Machine Learning, Supervised Learning, Ransomware",
author = "{Sun Yin}, Haohua and Ravi Vatrapu",
year = "2017",
doi = "10.1109/BigData.2017.8258365",
language = "English",
isbn = "9781538627167",
pages = "3690--3699",
editor = "Jian-Yun Nie and Zoran Obradovic and Toyotaro Suzumura and Rumi Ghosh and Raghunath Nambiar and Chonggang Wang and Hui Zang and Ricardo Baeza-Yates and Xiaohua Hu and Jeremy Kepner and Alfredo Cuzzocrea and Jian Tang and Masashi Toyoda",
booktitle = "Proceedings. 2017 IEEE International Conference on Big Data",
publisher = "IEEE",
address = "United States",

}
