Enhancement of the K-means Algorithm for Mixed Data in Big Data Platforms

Oded Koren, Carina Antonia Hallin, Nir Perel, Dror Bendet

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

Big data research has emerged as an important discipline in information systems research and management. As the torrent of data generated on the Internet becomes increasingly unstructured and non-numeric, taking the form of images and text, research indicates a growing need for more efficient algorithms that handle mixed data in big data settings. In this paper, we apply the classical K-means algorithm to both numeric and categorical attributes on big data platforms. We first present an algorithm that handles the problem of mixed data. We then use big data platforms to implement the algorithm. This provides a solid basis for more targeted profiling for business and research purposes using big data, so that decision makers can treat mixed data, i.e., numerical and categorical data, to explain phenomena within the big data ecosystem.
Original language: English
Title of host publication: Intelligent Systems and Applications: Proceedings of the 2018 Intelligent Systems Conference (IntelliSys). Volume 1
Editors: Kohei Arai, Supriya Kapoor, Rahul Bhatia
Number of pages: 16
Place of Publication: Cham
Publisher: Springer
Publication date: 2019
Pages: 1025-1040
ISBN (Print): 9783030010539
ISBN (Electronic): 9783030010546
Publication status: Published - 2019
Event: 6th Intelligent Systems Conference. IntelliSys - London, United Kingdom
Duration: 6 Sept 2018 to 7 Sept 2018
Conference number: 6
http://saiconference.com/IntelliSys

Conference

Conference: 6th Intelligent Systems Conference. IntelliSys
Number: 6
Country/Territory: United Kingdom
City: London
Period: 06/09/2018 to 07/09/2018
Series: Advances in Intelligent Systems and Computing
Number: 868
ISSN: 2194-5357

Keywords

  • Big data
  • Mixed data
  • Hadoop
  • K-means
