Breaking Bad: De-Anonymising Entity Types on the Bitcoin Blockchain Using Supervised Machine Learning

Mikkel Alexander Harlev, Haohua Sun Yin, Klaus Christian Langenheldt, Raghava Rao Mukkamala, Ravi Vatrapu

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review

Abstract

Bitcoin is a cryptocurrency whose transactions are recorded on a distributed, openly accessible ledger. On the Bitcoin Blockchain, an entity’s real-world identity is hidden behind a pseudonym, a so-called address. Therefore, Bitcoin is widely assumed to provide a high degree of anonymity, which is a driver for its frequent use for illicit activities. This paper presents a novel approach for reducing the anonymity of the Bitcoin Blockchain by using Supervised Machine Learning to predict the type of yet-unidentified entities. We utilised a sample of 434 entities (with ≈ 200 million transactions), whose identity and type had been revealed, as training set data and built classifiers differentiating among 10 categories. Our main finding is that we can indeed predict the type of a yet-unidentified entity. Using the Gradient Boosting algorithm, we achieve an accuracy of 77% and an F1-score of ≈ 0.75. We discuss our novel approach of Supervised Machine Learning for uncovering Blockchain anonymity, its potential applications to forensics and financial compliance, and its societal implications; we also outline study limitations and propose future research directions.
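
The two figures reported in the abstract, accuracy and F1-score, can be illustrated with a small sketch of how such metrics are computed for a multi-class entity-type classifier. The abstract does not say how the F1-score was averaged across the 10 categories, so the macro average used here is an assumption. The category names and the toy labels are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of entities whose predicted type equals the known type."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 per class, i.e. the harmonic mean of precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the entity is not p
            fn[t] += 1  # entity of class t was missed
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical entity types and predictions, for illustration only.
y_true = ["exchange", "exchange", "gambling", "gambling", "mixer", "mixer"]
y_pred = ["exchange", "gambling", "gambling", "gambling", "mixer", "exchange"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro-F1={f1:.3f}")
```

With imbalanced categories, accuracy and macro F1 can diverge noticeably, which is one reason to report both.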
Language: English
Title of host publication: Proceedings of the 51st Hawaii International Conference on System Sciences 2018
Number of pages: 10
Place of Publication: Honolulu
Publisher: Hawaii International Conference on System Sciences (HICSS)
Date: 2018
Pages: 3497-3506
ISBN (Print): 9780998133119
State: Published - 2018
Event: The 51st Hawaii International Conference on System Sciences. HICSS 2018 - Waikoloa Village, United States
Duration: 3 Jan 2018 to 6 Jan 2018
Conference number: 51
http://www.urbanccd.org/events/2018/1/3/hawaii-international-conference-on-system-sciences-hicss-51

Conference

Conference: The 51st Hawaii International Conference on System Sciences. HICSS 2018
Number: 51
Country: United States
City: Waikoloa Village
Period: 03/01/2018 to 06/01/2018
Series: Proceedings of the Annual Hawaii International Conference on System Sciences
ISSN: 1060-3425

Keywords

  • Distributed ledger technology
  • The Blockchain
  • Bitcoin Blockchain
  • Supervised machine learning
  • Classification
  • De-anonymization
  • Entity identification

Cite this

Harlev, M. A., Sun Yin, H., Langenheldt, K. C., Mukkamala, R. R., & Vatrapu, R. (2018). Breaking Bad: De-Anonymising Entity Types on the Bitcoin Blockchain Using Supervised Machine Learning. In Proceedings of the 51st Hawaii International Conference on System Sciences 2018 (pp. 3497-3506). Honolulu: Hawaii International Conference on System Sciences (HICSS). Proceedings of the Annual Hawaii International Conference on System Sciences
Harlev, Mikkel Alexander ; Sun Yin, Haohua ; Langenheldt, Klaus Christian ; Mukkamala, Raghava Rao ; Vatrapu, Ravi. / Breaking Bad : De-Anonymising Entity Types on the Bitcoin Blockchain Using Supervised Machine Learning. Proceedings of the 51st Hawaii International Conference on System Sciences 2018. Honolulu : Hawaii International Conference on System Sciences (HICSS), 2018. pp. 3497-3506 (Proceedings of the Annual Hawaii International Conference on System Sciences).
@inproceedings{13551f74ae5f45659a8fdb55852ee01f,
title = "Breaking Bad: De-Anonymising Entity Types on the Bitcoin Blockchain Using Supervised Machine Learning",
abstract = "Bitcoin is a cryptocurrency whose transactions are recorded on a distributed, openly accessible ledger. On the Bitcoin Blockchain, an entity’s real-world identity is hidden behind a pseudonym, a so-called address. Therefore, Bitcoin is widely assumed to provide a high degree of anonymity, which is a driver for its frequent use for illicit activities. This paper presents a novel approach for reducing the anonymity of the Bitcoin Blockchain by using Supervised Machine Learning to predict the type of yet-unidentified entities. We utilised a sample of 434 entities (with ≈ 200 million transactions), whose identity and type had been revealed, as training set data and built classifiers differentiating among 10 categories. Our main finding is that we can indeed predict the type of a yet-unidentified entity. Using the Gradient Boosting algorithm, we achieve an accuracy of 77{\%} and an F1-score of ≈ 0.75. We discuss our novel approach of Supervised Machine Learning for uncovering Blockchain anonymity, its potential applications to forensics and financial compliance, and its societal implications; we also outline study limitations and propose future research directions.",
keywords = "Distributed ledger technology, The Blockchain, Bitcoin Blockchain, Supervised machine learning, Classification, De-anonymization, Entity identification",
author = "Harlev, {Mikkel Alexander} and {Sun Yin}, Haohua and Langenheldt, {Klaus Christian} and Mukkamala, {Raghava Rao} and Vatrapu, Ravi",
year = "2018",
language = "English",
isbn = "9780998133119",
pages = "3497--3506",
booktitle = "Proceedings of the 51st Hawaii International Conference on System Sciences 2018",
publisher = "Hawaii International Conference on System Sciences (HICSS)",
address = "United States",

}

Harlev, MA, Sun Yin, H, Langenheldt, KC, Mukkamala, RR & Vatrapu, R 2018, Breaking Bad: De-Anonymising Entity Types on the Bitcoin Blockchain Using Supervised Machine Learning. in Proceedings of the 51st Hawaii International Conference on System Sciences 2018. Hawaii International Conference on System Sciences (HICSS), Honolulu, Proceedings of the Annual Hawaii International Conference on System Sciences, pp. 3497-3506, Waikoloa Village, United States, 03/01/2018.

Breaking Bad : De-Anonymising Entity Types on the Bitcoin Blockchain Using Supervised Machine Learning. / Harlev, Mikkel Alexander; Sun Yin, Haohua; Langenheldt, Klaus Christian; Mukkamala, Raghava Rao; Vatrapu, Ravi.

Proceedings of the 51st Hawaii International Conference on System Sciences 2018. Honolulu : Hawaii International Conference on System Sciences (HICSS), 2018. p. 3497-3506.

TY - GEN

T1 - Breaking Bad

T2 - De-Anonymising Entity Types on the Bitcoin Blockchain Using Supervised Machine Learning

AU - Harlev,Mikkel Alexander

AU - Sun Yin,Haohua

AU - Langenheldt,Klaus Christian

AU - Mukkamala,Raghava Rao

AU - Vatrapu,Ravi

PY - 2018

Y1 - 2018

N2 - Bitcoin is a cryptocurrency whose transactions are recorded on a distributed, openly accessible ledger. On the Bitcoin Blockchain, an entity’s real-world identity is hidden behind a pseudonym, a so-called address. Therefore, Bitcoin is widely assumed to provide a high degree of anonymity, which is a driver for its frequent use for illicit activities. This paper presents a novel approach for reducing the anonymity of the Bitcoin Blockchain by using Supervised Machine Learning to predict the type of yet-unidentified entities. We utilised a sample of 434 entities (with ≈ 200 million transactions), whose identity and type had been revealed, as training set data and built classifiers differentiating among 10 categories. Our main finding is that we can indeed predict the type of a yet-unidentified entity. Using the Gradient Boosting algorithm, we achieve an accuracy of 77% and an F1-score of ≈ 0.75. We discuss our novel approach of Supervised Machine Learning for uncovering Blockchain anonymity, its potential applications to forensics and financial compliance, and its societal implications; we also outline study limitations and propose future research directions.

KW - Distributed ledger technology

KW - The Blockchain

KW - Bitcoin Blockchain

KW - Supervised machine learning

KW - Classification

KW - De-anonymization

KW - Entity identification

M3 - Article in proceedings

SN - 9780998133119

SP - 3497

EP - 3506

BT - Proceedings of the 51st Hawaii International Conference on System Sciences 2018

PB - Hawaii International Conference on System Sciences (HICSS)

CY - Honolulu

ER -

Harlev MA, Sun Yin H, Langenheldt KC, Mukkamala RR, Vatrapu R. Breaking Bad: De-Anonymising Entity Types on the Bitcoin Blockchain Using Supervised Machine Learning. In Proceedings of the 51st Hawaii International Conference on System Sciences 2018. Honolulu: Hawaii International Conference on System Sciences (HICSS). 2018. p. 3497-3506. (Proceedings of the Annual Hawaii International Conference on System Sciences).