Regulating Cryptocurrencies: A Supervised Machine Learning Approach to De-Anonymizing the Bitcoin Blockchain

Hao Hua Sun Yin, Klaus Langenheldt, Mikkel Harlev, Raghava Rao Mukkamala, Ravi Vatrapu

Research output: Contribution to journal › Journal article › Research › Peer-review

Abstract

Bitcoin is a cryptocurrency whose transactions are recorded on a distributed, openly accessible ledger. On the Bitcoin Blockchain, an owning entity’s real-world identity is hidden behind a pseudonym, a so-called address. Bitcoin is therefore widely assumed to provide a high degree of anonymity, which drives its frequent use for illicit activities. This paper presents a novel approach for de-anonymizing the Bitcoin Blockchain by using Supervised Machine Learning to predict the type of yet-unidentified entities. We utilized a sample of 957 entities (with ≈385 million transactions), whose identity and type had been revealed, as training set data and built classifiers differentiating among 12 categories. Our main finding is that we can indeed predict the type of a yet-unidentified entity. Using the Gradient Boosting algorithm with default parameters, we achieve a mean cross-validation accuracy of 80.42% and F1-score of ≈79.64%. We show two examples: one in which we predict the types of 22 clusters suspected of being related to cybercriminal activities, and another in which we classify 153,293 clusters to estimate activity in the Bitcoin ecosystem. We discuss potential applications of our method for organizational regulation and compliance as well as its societal implications, outline study limitations, and propose future research directions. A prototype implementation of our method for organizational use is included in the appendix.
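The evaluation summarized in the abstract (mean cross-validation accuracy and F1-score across entity categories) can be illustrated with a minimal, standard-library Python sketch. It assumes plain, unshuffled k-fold splits, a support-weighted F1 average, and k = 5. The abstract states none of these, so the paper's actual protocol may differ. The classifier here is a hypothetical majority-class stand-in. A Gradient Boosting model with default parameters would be passed in its place; using scikit-learn's `GradientBoostingClassifier()` for that is an assumption about tooling. The category names and toy features are illustrative only.

```python
from collections import Counter
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def weighted_f1(y_true, y_pred):
    """Per-class F1 (2TP / (2TP + FP + FN)), averaged with weights equal to class support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp  # members of cls that were predicted as something else
        total += n_cls * (2 * tp / (2 * tp + fp + fn))
    return total / len(y_true)


def cross_validate(X, y, fit_predict, k=5):
    """Train on k-1 folds, score on the held-out fold; return (mean accuracy, mean weighted F1)."""
    accs, f1s = [], []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        truth = [y[i] for i in test_idx]
        accs.append(sum(t == p for t, p in zip(truth, preds)) / len(truth))
        f1s.append(weighted_f1(truth, preds))
    return mean(accs), mean(f1s)


def majority_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: always predicts the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


# Hypothetical toy data: one numeric feature per entity, two illustrative categories.
X = [[i] for i in range(10)]
y = ["exchange"] * 8 + ["gambling"] * 2
acc, f1 = cross_validate(X, y, majority_baseline, k=5)
print(f"mean accuracy={acc:.2f}, mean weighted F1={f1:.2f}")  # mean accuracy=0.80, mean weighted F1=0.80
```

In this toy run the baseline always predicts `exchange`. It therefore scores 1.0 on the four folds that hold only `exchange` entities and 0.0 on the last fold, which gives a mean of 0.8. A trained model is informative only insofar as it beats such a baseline, which is why reporting per-class F1 alongside accuracy matters when categories are imbalanced.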
Language: English
Journal: Journal of Management Information Systems
Volume: 36
Issue number: 1
Pages: 37-73
Number of pages: 37
ISSN: 0742-1222
DOI: 10.1080/07421222.2018.1550550
State: Published - 2019

Keywords

  • Cryptocurrencies
  • Bitcoin
  • Blockchain
  • Cybersecurity
  • Supervised machine learning
  • Online anonymity
  • Cybercrime

Cite this

@article{c9ca638e4e47403bb678c2091cf61c5b,
title = "Regulating Cryptocurrencies: A Supervised Machine Learning Approach to De-Anonymizing the Bitcoin Blockchain",
abstract = "Bitcoin is a cryptocurrency whose transactions are recorded on a distributed, openly accessible ledger. On the Bitcoin Blockchain, an owning entity’s real-world identity is hidden behind a pseudonym, a so-called address. Therefore, Bitcoin is widely assumed to provide a high degree of anonymity, which is a driver for its frequent use for illicit activities. This paper presents a novel approach for de-anonymizing the Bitcoin Blockchain by using Supervised Machine Learning to predict the type of yet-unidentified entities. We utilized a sample of 957 entities (with ≈385 million transactions), whose identity and type had been revealed, as training set data and built classifiers differentiating among 12 categories. Our main finding is that we can indeed predict the type of a yet-unidentified entity. Using the Gradient Boosting algorithm with default parameters, we achieve a mean cross-validation accuracy of 80.42{\%} and F1-score of ≈79.64{\%}. We show two examples, one where we predict on a set of 22 clusters that are suspected to be related to cybercriminal activities, and another where we classify 153,293 clusters to provide an estimation of the activity on the Bitcoin ecosystem. We discuss the potential applications of our method for organizational regulation and compliance, societal implications, outline study limitations, and propose future research directions. A prototype implementation of our method for organizational use is included in the appendix.",
keywords = "Cryptocurrencies, Bitcoin, Blockchain, Cybersecurity, Supervised machine learning, Online anonymity, Cybercrime",
author = "Yin, {Hao Hua Sun} and Klaus Langenheldt and Mikkel Harlev and Mukkamala, {Raghava Rao} and Ravi Vatrapu",
year = "2019",
doi = "10.1080/07421222.2018.1550550",
language = "English",
volume = "36",
pages = "37--73",
journal = "Journal of Management Information Systems",
issn = "0742-1222",
publisher = "Taylor {\&} Francis",
number = "1",

}

Regulating Cryptocurrencies: A Supervised Machine Learning Approach to De-Anonymizing the Bitcoin Blockchain. / Yin, Hao Hua Sun; Langenheldt, Klaus; Harlev, Mikkel; Mukkamala, Raghava Rao; Vatrapu, Ravi.

In: Journal of Management Information Systems, Vol. 36, No. 1, 2019, p. 37-73.

TY - JOUR

T1 - Regulating Cryptocurrencies: A Supervised Machine Learning Approach to De-Anonymizing the Bitcoin Blockchain

T2 - Journal of Management Information Systems

AU - Yin,Hao Hua Sun

AU - Langenheldt,Klaus

AU - Harlev,Mikkel

AU - Mukkamala,Raghava Rao

AU - Vatrapu,Ravi

PY - 2019

Y1 - 2019

AB - Bitcoin is a cryptocurrency whose transactions are recorded on a distributed, openly accessible ledger. On the Bitcoin Blockchain, an owning entity’s real-world identity is hidden behind a pseudonym, a so-called address. Therefore, Bitcoin is widely assumed to provide a high degree of anonymity, which is a driver for its frequent use for illicit activities. This paper presents a novel approach for de-anonymizing the Bitcoin Blockchain by using Supervised Machine Learning to predict the type of yet-unidentified entities. We utilized a sample of 957 entities (with ≈385 million transactions), whose identity and type had been revealed, as training set data and built classifiers differentiating among 12 categories. Our main finding is that we can indeed predict the type of a yet-unidentified entity. Using the Gradient Boosting algorithm with default parameters, we achieve a mean cross-validation accuracy of 80.42% and F1-score of ≈79.64%. We show two examples, one where we predict on a set of 22 clusters that are suspected to be related to cybercriminal activities, and another where we classify 153,293 clusters to provide an estimation of the activity on the Bitcoin ecosystem. We discuss the potential applications of our method for organizational regulation and compliance, societal implications, outline study limitations, and propose future research directions. A prototype implementation of our method for organizational use is included in the appendix.

KW - Cryptocurrencies

KW - Bitcoin

KW - Blockchain

KW - Cybersecurity

KW - Supervised machine learning

KW - Online anonymity

KW - Cybercrime

UR - https://sfx-45cbs.hosted.exlibrisgroup.com/45cbs?url_ver=Z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&ctx_enc=info:ofi/enc:UTF-8&ctx_ver=Z39.88-2004&rfr_id=info:sid/sfxit.com:azlist&sfx.ignore_date_threshold=1&rft.object_id=954921398118&rft.object_portfolio_id=&svc.holdings=yes&svc.fulltext=yes

U2 - 10.1080/07421222.2018.1550550

DO - 10.1080/07421222.2018.1550550

M3 - Journal article

VL - 36

SP - 37

EP - 73

JO - Journal of Management Information Systems

JF - Journal of Management Information Systems

SN - 0742-1222

IS - 1

ER -