Interpreting Clusters via Prototype Optimization

Emilio Carrizosa, Kseniia Kurishchenko*, Alfredo Marín, Dolores Romero Morales

*Corresponding author for this work

Publication: Contribution to journal; Journal article; Research; Peer review

Abstract

In this paper, we tackle the problem of enhancing the interpretability of the results of Cluster Analysis. Our goal is to find an explanation for each cluster such that clusters are characterized as precisely and distinctively as possible, i.e., the explanation is fulfilled by as many individuals as possible in the corresponding cluster (true positive cases) and by as few individuals as possible in the remaining clusters (false positive cases). We assume that a dissimilarity between the individuals is given, and propose distance-based explanations, namely those defined by the individuals that are close to the cluster's so-called prototype. To find the set of prototypes, we address the biobjective optimization problem that maximizes the total number of true positive cases across all clusters and minimizes the total number of false positive cases, while controlling the true positive rate as well as the false positive rate in each cluster. We develop two mathematical optimization models, inspired by classic Location Analysis problems, that differ in the way individuals are allocated to prototypes. We illustrate the explanations provided by these models and their accuracy on both real-life and simulated data.
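
The true positive and false positive counts mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that "close to the prototype" means "within a fixed radius of it"; the abstract does not state the rule, and the paper's two optimization models are not reproduced here. The function name `count_true_false_positives`, the `radius` parameter, and the toy data are all hypothetical.

```python
from typing import Sequence, Tuple


def count_true_false_positives(
    dissim: Sequence[Sequence[float]],
    labels: Sequence[int],
    prototype: int,
    cluster: int,
    radius: float,
) -> Tuple[int, int]:
    """Count individuals within `radius` of `prototype`.

    Individuals covered by the explanation that belong to `cluster` are
    true positives; covered individuals from other clusters are false
    positives. The radius rule is an assumption made for this sketch.
    """
    tp = fp = 0
    for i, d in enumerate(dissim[prototype]):
        if d <= radius:
            if labels[i] == cluster:
                tp += 1
            else:
                fp += 1
    return tp, fp


# Hypothetical toy data: five individuals on a line, two clusters.
points = [0.0, 1.0, 2.0, 8.0, 9.0]
dissim = [[abs(a - b) for b in points] for a in points]
labels = [0, 0, 0, 1, 1]

# Explain cluster 0 with individual 1 as prototype.
tp, fp = count_true_false_positives(dissim, labels, prototype=1, cluster=0, radius=1.5)
print(tp, fp)
```

Enlarging the radius in this toy setting covers more of the cluster at the risk of also covering individuals from other clusters, which is the trade-off the biobjective formulation in the abstract balances.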
Original language: English
Article number: 102543
Journal: Omega
Volume: 107
Number of pages: 33
ISSN: 0305-0483
Status: Published - Feb. 2022

Bibliographical note

Published online: 23 September 2021

Keywords

  • Machine Learning
  • Interpretability
  • Cluster Analysis
  • Prototypes
  • Mixed-Integer Programming
