Abstract
Big data research has emerged as an important discipline in information systems research and management. As the data generated on the Internet becomes increasingly unstructured and non-numeric, taking the form of images and text, research indicates a growing need for more efficient algorithms that handle mixed data at big data scale. In this paper, we apply the classical K-means algorithm to both numeric and categorical attributes on big data platforms. We first present an algorithm that handles mixed data, and then implement it on big data platforms. This provides a solid basis for more targeted profiling for business and research purposes, so that decision makers can use mixed data, i.e. numerical and categorical data, to explain phenomena within the big data ecosystem.
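The abstract does not spell out how numeric and categorical attributes are combined, so the following is only a minimal sketch of one common approach, in the style of k-prototypes: squared Euclidean distance on the numeric part plus a weighted count of categorical mismatches, with centers updated by mean and mode. The function names, the `gamma` weight, and the `(numeric_tuple, categorical_tuple)` record layout are assumptions made for illustration and are not taken from the paper; the sketch runs on a single machine and does not include any Hadoop integration.

```python
from collections import Counter

def mixed_distance(a_num, a_cat, b_num, b_cat, gamma=1.0):
    """Squared Euclidean distance on numeric values plus gamma times the
    number of mismatching categorical values (gamma is an assumed weight)."""
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    categorical = sum(x != y for x, y in zip(a_cat, b_cat))
    return numeric + gamma * categorical

def assign(points, centers, gamma=1.0):
    """Return, for each point, the index of its nearest center."""
    return [
        min(
            range(len(centers)),
            key=lambda k: mixed_distance(p[0], p[1], centers[k][0], centers[k][1], gamma),
        )
        for p in points
    ]

def update(points, labels, centers):
    """Recompute centers: per-column mean for numeric values,
    per-column most frequent value for categorical values."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centers.append(old)  # keep the center of an empty cluster unchanged
            continue
        num = tuple(sum(col) / len(members) for col in zip(*(m[0] for m in members)))
        cat = tuple(Counter(col).most_common(1)[0][0] for col in zip(*(m[1] for m in members)))
        new_centers.append((num, cat))
    return new_centers

# Hypothetical toy records: (numeric attributes, categorical attributes)
points = [
    ((1.0, 2.0), ("a",)),
    ((1.2, 1.8), ("a",)),
    ((8.0, 9.0), ("b",)),
    ((8.2, 9.1), ("b",)),
]
centers = [points[0], points[2]]
labels = assign(points, centers)
centers = update(points, labels, centers)
```

In a MapReduce-style setting, the assignment step would typically correspond to the map phase and the center update to the reduce phase, though how the paper maps its algorithm onto a big data platform is not stated in the abstract.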
Original language | English |
---|---|
Title of host publication | Intelligent Systems and Applications : Proceedings of the 2018 Intelligent Systems Conference (IntelliSys). Volume 1 |
Editors | Kohei Arai, Supriya Kapoor, Rahul Bhatia |
Number of pages | 16 |
Place of Publication | Cham |
Publisher | Springer |
Publication date | 2019 |
Pages | 1025-1040 |
ISBN (Print) | 9783030010539 |
ISBN (Electronic) | 9783030010546 |
Publication status | Published - 2019 |
Event | 6th Intelligent Systems Conference. IntelliSys - London, United Kingdom. Duration: 6 Sept 2018 → 7 Sept 2018. Conference number: 6. http://saiconference.com/IntelliSys |
Conference
Conference | 6th Intelligent Systems Conference. IntelliSys |
---|---|
Number | 6 |
Country/Territory | United Kingdom |
City | London |
Period | 06/09/2018 → 07/09/2018 |
Series | Advances in Intelligent Systems and Computing |
---|---|
Number | 868 |
ISSN | 2194-5357 |
Keywords
- Big data
- Mixed data
- Hadoop
- K-means