Enhancement of the K-means Algorithm for Mixed Data in Big Data Platforms

Oded Koren, Carina Antonia Hallin, Nir Perel, Dror Bendet

Publication: Chapter in book/report/conference proceeding › Conference contribution in proceedings › Research › peer review

Abstract

Big data research has emerged as an important discipline in information systems research and management. Yet, while the torrent of data being generated on the Internet is increasingly unstructured and non-numeric in the form of images and texts, research indicates there is an increasing need to develop more efficient algorithms for treating mixed data in big data. In this paper, we apply the classical K-means algorithm to both numeric and categorical attributes in big data platforms. We first present an algorithm which handles the problem of mixed data. We then utilize big data platforms to implement the algorithm. This provides us with a solid basis for performing more targeted profiling for business and research purposes using big data, so that decision makers will be able to treat mixed data, i.e. numerical and categorical data, to explain phenomena within the big data ecosystem.
Language: English
Title: Intelligent Systems and Applications: Proceedings of the 2018 Intelligent Systems Conference (IntelliSys) Volume 1
Editors: Kohei Arai, Supriya Kapoor, Rahul Bhatia
Volume: 1
Place of publication: Cham
Publisher: Springer
Date: 2019
Pages: 1025-1040
ISBN (Print): 9783030010539
ISBN (Electronic): 9783030010546
DOI: 10.1007/978-3-030-01054-6_71
Status: Published - 2019
Event: 6th Intelligent Systems Conference. IntelliSys - London, United Kingdom
Duration: 6 Sep 2018 to 7 Sep 2018
Conference number: 6
http://saiconference.com/IntelliSys

Conference

Conference: 6th Intelligent Systems Conference. IntelliSys
Number: 6
Country: United Kingdom
City: London
Period: 06/09/2018 to 07/09/2018
Internet address: http://saiconference.com/IntelliSys
Name: Advances in Intelligent Systems and Computing
Number: 686
ISSN: 2194-5357

Keywords

  • Big data
  • Mixed data
  • Hadoop
  • K-means

Cite this

Koren, O., Hallin, C. A., Perel, N., & Bendet, D. (2019). Enhancement of the K-means Algorithm for Mixed Data in Big Data Platforms. In K. Arai, S. Kapoor, & R. Bhatia (Eds.), Intelligent Systems and Applications: Proceedings of the 2018 Intelligent Systems Conference (IntelliSys) Volume 1 (Vol. 1, pp. 1025-1040). Cham: Springer. Advances in Intelligent Systems and Computing, No. 686, DOI: 10.1007/978-3-030-01054-6_71
Koren, Oded ; Hallin, Carina Antonia ; Perel, Nir ; Bendet, Dror. / Enhancement of the K-means Algorithm for Mixed Data in Big Data Platforms. Intelligent Systems and Applications: Proceedings of the 2018 Intelligent Systems Conference (IntelliSys) Volume 1. ed. / Kohei Arai ; Supriya Kapoor ; Rahul Bhatia. Vol. 1 Cham : Springer, 2019. pp. 1025-1040 (Advances in Intelligent Systems and Computing; No. 686).
@inproceedings{69b5898564bd4de9ab0ca1ea96eb3935,
title = "Enhancement of the K-means Algorithm for Mixed Data in Big Data Platforms",
abstract = "Big data research has emerged as an important discipline in information systems research and management. Yet, while the torrent of data being generated on the Internet is increasingly unstructured and non-numeric in the form of images and texts, research indicates there is an increasing need to develop more efficient algorithms for treating mixed data in big data. In this paper, we apply the classical K-means algorithm to both numeric and categorical attributes in big data platforms. We first present an algorithm which handles the problem of mixed data. We then utilize big data platforms to implement the algorithm. This provides us with a solid basis for performing more targeted profiling for business and research purposes using big data, so that decision makers will be able to treat mixed data, i.e. numerical and categorical data, to explain phenomena within the big data ecosystem.",
keywords = "Big data, Mixed data, Hadoop, K-means",
author = "Oded Koren and Hallin, {Carina Antonia} and Nir Perel and Dror Bendet",
year = "2019",
doi = "10.1007/978-3-030-01054-6_71",
language = "English",
isbn = "9783030010539",
volume = "1",
pages = "1025--1040",
editor = "Kohei Arai and Supriya Kapoor and Rahul Bhatia",
booktitle = "Intelligent Systems and Applications",
publisher = "Springer",
address = "Cham",
}

Koren, O, Hallin, CA, Perel, N & Bendet, D 2019, Enhancement of the K-means Algorithm for Mixed Data in Big Data Platforms. in K Arai, S Kapoor & R Bhatia (eds), Intelligent Systems and Applications: Proceedings of the 2018 Intelligent Systems Conference (IntelliSys) Volume 1. vol. 1, Springer, Cham, Advances in Intelligent Systems and Computing, no. 686, pp. 1025-1040, 6th Intelligent Systems Conference. IntelliSys, London, United Kingdom, 06/09/2018. DOI: 10.1007/978-3-030-01054-6_71

Enhancement of the K-means Algorithm for Mixed Data in Big Data Platforms. / Koren, Oded; Hallin, Carina Antonia; Perel, Nir; Bendet, Dror.

Intelligent Systems and Applications: Proceedings of the 2018 Intelligent Systems Conference (IntelliSys) Volume 1. ed. / Kohei Arai; Supriya Kapoor; Rahul Bhatia. Vol. 1 Cham : Springer, 2019. pp. 1025-1040.


TY - GEN

T1 - Enhancement of the K-means Algorithm for Mixed Data in Big Data Platforms

AU - Koren,Oded

AU - Hallin,Carina Antonia

AU - Perel,Nir

AU - Bendet,Dror

PY - 2019

Y1 - 2019

N2 - Big data research has emerged as an important discipline in information systems research and management. Yet, while the torrent of data being generated on the Internet is increasingly unstructured and non-numeric in the form of images and texts, research indicates there is an increasing need to develop more efficient algorithms for treating mixed data in big data. In this paper, we apply the classical K-means algorithm to both numeric and categorical attributes in big data platforms. We first present an algorithm which handles the problem of mixed data. We then utilize big data platforms to implement the algorithm. This provides us with a solid basis for performing more targeted profiling for business and research purposes using big data, so that decision makers will be able to treat mixed data, i.e. numerical and categorical data, to explain phenomena within the big data ecosystem.

AB - Big data research has emerged as an important discipline in information systems research and management. Yet, while the torrent of data being generated on the Internet is increasingly unstructured and non-numeric in the form of images and texts, research indicates there is an increasing need to develop more efficient algorithms for treating mixed data in big data. In this paper, we apply the classical K-means algorithm to both numeric and categorical attributes in big data platforms. We first present an algorithm which handles the problem of mixed data. We then utilize big data platforms to implement the algorithm. This provides us with a solid basis for performing more targeted profiling for business and research purposes using big data, so that decision makers will be able to treat mixed data, i.e. numerical and categorical data, to explain phenomena within the big data ecosystem.

KW - Big data

KW - Mixed data

KW - Hadoop

KW - K-means

U2 - 10.1007/978-3-030-01054-6_71

DO - 10.1007/978-3-030-01054-6_71

M3 - Article in proceedings

SN - 9783030010539

VL - 1

SP - 1025

EP - 1040

BT - Intelligent Systems and Applications

PB - Springer

CY - Cham

ER -

Koren O, Hallin CA, Perel N, Bendet D. Enhancement of the K-means Algorithm for Mixed Data in Big Data Platforms. In Arai K, Kapoor S, Bhatia R, editors, Intelligent Systems and Applications: Proceedings of the 2018 Intelligent Systems Conference (IntelliSys) Volume 1. Vol. 1. Cham: Springer. 2019. p. 1025-1040. (Advances in Intelligent Systems and Computing; No. 686). Available from, DOI: 10.1007/978-3-030-01054-6_71