A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning

Haohua Sun Yin, Ravi Vatrapu

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › Peer-reviewed

Abstract

Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cybercriminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38%, 76.47%, 78.46%, 80.76% respectively. From the top four classifiers, Bagging and Gradient Boosting classifiers were selected based on their weighted average and per class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related is 29.81% according to Bagging, and 10.95% according to Gradient Boosting with number of entities as the metric. With regard to the number of addresses and current coins held by this type of entities, the results are: 5.79% and 10.02% according to Bagging; and 3.16% and 1.45% according to Gradient Boosting.
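
The abstract reports the cybercrime-related share under three different metrics: number of entities, number of addresses, and coins currently held. The following is a minimal sketch of how such shares could be computed once each entity has a predicted class. It is an illustration only, not the authors' code: the `Entity` fields, the class names in `CYBERCRIME_CLASSES`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    predicted_class: str  # label assigned by a classifier (hypothetical names below)
    n_addresses: int      # number of addresses clustered into this entity
    coins_held: float     # current balance of the entity, in BTC


# Hypothetical subset of labels treated as cybercrime-related.
# The abstract only states that 5 of the 12 classes are cybercrime-related.
CYBERCRIME_CLASSES = {"scam", "ransomware"}


def cybercrime_shares(entities, cybercrime_classes):
    """Return the cybercrime-related share (in percent) under three metrics."""
    flagged = [e for e in entities if e.predicted_class in cybercrime_classes]

    def pct(part, whole):
        # Guard against an empty or all-zero population.
        return 100.0 * part / whole if whole else 0.0

    return {
        "entities": pct(len(flagged), len(entities)),
        "addresses": pct(
            sum(e.n_addresses for e in flagged),
            sum(e.n_addresses for e in entities),
        ),
        "coins": pct(
            sum(e.coins_held for e in flagged),
            sum(e.coins_held for e in entities),
        ),
    }


# Made-up sample data, chosen only to show that the three metrics can diverge.
sample = [
    Entity("exchange", 900, 4500.0),
    Entity("scam", 40, 100.0),
    Entity("ransomware", 10, 50.0),
    Entity("gambling", 50, 350.0),
]

shares = cybercrime_shares(sample, CYBERCRIME_CLASSES)
print(shares)
```

With this toy input, half of the entities are flagged, yet they account for only 5% of addresses and 3% of coins. The same kind of divergence between metrics appears in the figures reported in the abstract.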
Original language: English
Title: Proceedings. 2017 IEEE International Conference on Big Data: IEEE Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Number of pages: 10
Place of publication: Los Alamitos, CA
Publisher: IEEE
Publication date: 2017
Pages: 3690-3699
ISBN (Print): 9781538627167
ISBN (Electronic): 9781538627150, 9781538627143
DOI: https://doi.org/10.1109/BigData.2017.8258365
Status: Published - 2017
Event: 2017 IEEE International Conference on Big Data - Boston, USA
Duration: 11 Dec 2017 to 14 Dec 2017
Conference number: 5
http://cci.drexel.edu/bigdata/bigdata2017/

Conference

Conference: 2017 IEEE International Conference on Big Data
Number: 5
Country: USA
City: Boston
Period: 11/12/2017 to 14/12/2017
Internet address: http://cci.drexel.edu/bigdata/bigdata2017/

Keywords

  • Bitcoin
  • Blockchain
  • Cryptocurrency
  • Ecosystem
  • Cybercrime
  • Machine Learning
  • Supervised Learning
  • Ransomware

Cite this

Sun Yin, H., & Vatrapu, R. (2017). A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning. In J-Y. Nie, Z. Obradovic, T. Suzumura, R. Ghosh, R. Nambiar, C. Wang, H. Zang, R. Baeza-Yates, X. Hu, J. Kepner, A. Cuzzocrea, J. Tang, ... M. Toyoda (Eds.), Proceedings. 2017 IEEE International Conference on Big Data: IEEE Big Data 2017 (pp. 3690-3699). Los Alamitos, CA: IEEE. https://doi.org/10.1109/BigData.2017.8258365
Sun Yin, Haohua ; Vatrapu, Ravi. / A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning. Proceedings. 2017 IEEE International Conference on Big Data: IEEE Big Data 2017. ed. / Jian-Yun Nie ; Zoran Obradovic ; Toyotaro Suzumura ; Rumi Ghosh ; Raghunath Nambiar ; Chonggang Wang ; Hui Zang ; Ricardo Baeza-Yates ; Xiaohua Hu ; Jeremy Kepner ; Alfredo Cuzzocrea ; Jian Tang ; Masashi Toyoda. Los Alamitos, CA : IEEE, 2017. pp. 3690-3699
@inproceedings{1749fd6ea14c4959a264cfef8cd15ad7,
title = "A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning",
abstract = "Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cybercriminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38{\%}, 76.47{\%}, 78.46{\%}, 80.76{\%} respectively. From the top four classifiers, Bagging and Gradient Boosting classifiers were selected based on their weighted average and per class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related is 29.81{\%} according to Bagging, and 10.95{\%} according to Gradient Boosting with number of entities as the metric. With regard to the number of addresses and current coins held by this type of entities, the results are: 5.79{\%} and 10.02{\%} according to Bagging; and 3.16{\%} and 1.45{\%} according to Gradient Boosting.",
keywords = "Bitcoin, Blockchain, Cryptocurrency, Ecosystem, Cybercrime, Machine Learning, Supervised Learning, Ransomware",
author = "{Sun Yin}, Haohua and Ravi Vatrapu",
year = "2017",
doi = "10.1109/BigData.2017.8258365",
language = "English",
isbn = "9781538627167",
pages = "3690--3699",
editor = "Jian-Yun Nie and Zoran Obradovic and Toyotaro Suzumura and Rumi Ghosh and Raghunath Nambiar and Chonggang Wang and Hui Zang and Ricardo Baeza-Yates and Xiaohua Hu and Jeremy Kepner and Alfredo Cuzzocrea and Jian Tang and Masashi Toyoda",
booktitle = "Proceedings. 2017 IEEE International Conference on Big Data",
publisher = "IEEE",
address = "United States",

}

Sun Yin, H & Vatrapu, R 2017, A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning. in J-Y Nie, Z Obradovic, T Suzumura, R Ghosh, R Nambiar, C Wang, H Zang, R Baeza-Yates, X Hu, J Kepner, A Cuzzocrea, J Tang & M Toyoda (eds), Proceedings. 2017 IEEE International Conference on Big Data: IEEE Big Data 2017. IEEE, Los Alamitos, CA, pp. 3690-3699, 2017 IEEE International Conference on Big Data, Boston, USA, 11/12/2017. https://doi.org/10.1109/BigData.2017.8258365

A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning. / Sun Yin, Haohua; Vatrapu, Ravi.

Proceedings. 2017 IEEE International Conference on Big Data: IEEE Big Data 2017. ed. / Jian-Yun Nie; Zoran Obradovic; Toyotaro Suzumura; Rumi Ghosh; Raghunath Nambiar; Chonggang Wang; Hui Zang; Ricardo Baeza-Yates; Xiaohua Hu; Jeremy Kepner; Alfredo Cuzzocrea; Jian Tang; Masashi Toyoda. Los Alamitos, CA : IEEE, 2017. pp. 3690-3699.

TY - GEN

T1 - A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning

AU - Sun Yin, Haohua

AU - Vatrapu, Ravi

PY - 2017

Y1 - 2017

N2 - Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cybercriminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38%, 76.47%, 78.46%, 80.76% respectively. From the top four classifiers, Bagging and Gradient Boosting classifiers were selected based on their weighted average and per class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related is 29.81% according to Bagging, and 10.95% according to Gradient Boosting with number of entities as the metric. With regard to the number of addresses and current coins held by this type of entities, the results are: 5.79% and 10.02% according to Bagging; and 3.16% and 1.45% according to Gradient Boosting.

AB - Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cybercriminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38%, 76.47%, 78.46%, 80.76% respectively. From the top four classifiers, Bagging and Gradient Boosting classifiers were selected based on their weighted average and per class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related is 29.81% according to Bagging, and 10.95% according to Gradient Boosting with number of entities as the metric. With regard to the number of addresses and current coins held by this type of entities, the results are: 5.79% and 10.02% according to Bagging; and 3.16% and 1.45% according to Gradient Boosting.

KW - Bitcoin

KW - Blockchain

KW - Cryptocurrency

KW - Ecosystem

KW - Cybercrime

KW - Machine Learning

KW - Supervised Learning

KW - Ransomware

U2 - 10.1109/BigData.2017.8258365

DO - 10.1109/BigData.2017.8258365

M3 - Article in proceedings

SN - 9781538627167

SP - 3690

EP - 3699

BT - Proceedings. 2017 IEEE International Conference on Big Data

A2 - Nie, Jian-Yun

A2 - Obradovic, Zoran

A2 - Suzumura, Toyotaro

A2 - Ghosh, Rumi

A2 - Nambiar, Raghunath

A2 - Wang, Chonggang

A2 - Zang, Hui

A2 - Baeza-Yates, Ricardo

A2 - Hu, Xiaohua

A2 - Kepner, Jeremy

A2 - Cuzzocrea, Alfredo

A2 - Tang, Jian

A2 - Toyoda, Masashi

PB - IEEE

CY - Los Alamitos, CA

ER -

Sun Yin H, Vatrapu R. A First Estimation of the Proportion of Cybercriminal Entities in the Bitcoin Ecosystem using Supervised Machine Learning. In Nie J-Y, Obradovic Z, Suzumura T, Ghosh R, Nambiar R, Wang C, Zang H, Baeza-Yates R, Hu X, Kepner J, Cuzzocrea A, Tang J, Toyoda M, editors, Proceedings. 2017 IEEE International Conference on Big Data: IEEE Big Data 2017. Los Alamitos, CA: IEEE. 2017. p. 3690-3699 https://doi.org/10.1109/BigData.2017.8258365