Interpreting Clusters via Prototype Optimization

Emilio Carrizosa, Kseniia Kurishchenko*, Alfredo Marín, Dolores Romero Morales

*Corresponding author for this work

Research output: Contribution to journal, Journal article (peer-reviewed)


Abstract

In this paper, we tackle the problem of enhancing the interpretability of the results of Cluster Analysis. Our goal is to find an explanation for each cluster such that clusters are characterized as precisely and distinctively as possible, i.e., the explanation is fulfilled by as many individuals of the corresponding cluster as possible (true positive cases) and by as few individuals in the remaining clusters as possible (false positive cases). We assume that a dissimilarity between the individuals is given, and propose distance-based explanations, in which a cluster is explained by the individuals that are close to its so-called prototype. To find the set of prototypes, we address the biobjective optimization problem that maximizes the total number of true positive cases across all clusters and minimizes the total number of false positive cases, while controlling the true positive rate as well as the false positive rate in each cluster. We develop two mathematical optimization models, inspired by classic Location Analysis problems, that differ in the way individuals are allocated to prototypes. We illustrate the explanations provided by these models, and their accuracy, on both real-life and simulated data.
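To make the true positive / false positive trade-off concrete, here is a minimal Python sketch. It is not the authors' method: the paper formulates mixed-integer optimization models, whereas this sketch simply enumerates candidates. It assumes (i) a symmetric dissimilarity matrix `D`, (ii) that an individual fulfils cluster k's explanation when its dissimilarity to the prototype is at most a radius `r`, (iii) that each prototype is drawn from its own cluster's members, and (iv) that the biobjective problem is scalarized as TP minus `weight` times FP. The names `cover_counts`, `best_explanations`, `weight`, `min_tpr` and `max_fpr` are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# Result per cluster: (score, prototype index, radius, TP, FP), or None if infeasible.
Best = Optional[Tuple[float, int, float, int, int]]


def cover_counts(D: Sequence[Sequence[float]], labels: Sequence[int],
                 k: int, p: int, r: float) -> Tuple[int, int]:
    """Count individuals within distance r of prototype p (assumed ball-shaped
    explanation): members of cluster k are true positives, the rest false positives."""
    covered = [i for i in range(len(labels)) if D[i][p] <= r]
    tp = sum(1 for i in covered if labels[i] == k)
    return tp, len(covered) - tp


def best_explanations(D, labels, weight=1.0, min_tpr=0.5, max_fpr=0.2) -> Dict[int, Best]:
    """Brute-force search (not the paper's MIP models) for one prototype and radius
    per cluster, maximizing TP - weight * FP subject to per-cluster TPR/FPR limits."""
    n = len(labels)
    result: Dict[int, Best] = {}
    for k in sorted(set(labels)):
        members = [i for i in range(n) if labels[i] == k]
        n_in, n_out = len(members), n - len(members)
        best: Best = None
        for p in members:                      # assumption: prototypes come from cluster k
            for r in sorted(set(D[p])):        # coverage only changes at observed distances
                tp, fp = cover_counts(D, labels, k, p, r)
                tpr = tp / n_in
                fpr = fp / n_out if n_out else 0.0
                if tpr < min_tpr or fpr > max_fpr:
                    continue                   # violates per-cluster rate constraints
                score = tp - weight * fp       # weighted-sum scalarization (assumption)
                if best is None or score > best[0]:
                    best = (score, p, r, tp, fp)
        result[k] = best
    return result


if __name__ == "__main__":
    # Toy 1-D data: two well-separated clusters, dissimilarity = absolute difference.
    xs = [0, 1, 2, 10, 11, 12]
    labels = [0, 0, 0, 1, 1, 1]
    D = [[abs(a - b) for b in xs] for a in xs]
    for k, best in best_explanations(D, labels).items():
        print(k, best)
    # prints:
    # 0 (3.0, 0, 2, 3, 0)
    # 1 (3.0, 3, 2, 3, 0)
```

Because each ball-shaped explanation in this sketch is chosen independently per cluster, the search separates by cluster and costs roughly O(n^3) dissimilarity lookups overall. It does not capture the allocation of individuals to prototypes that distinguishes the paper's two models; exact optimization at scale is what the mixed-integer formulations are for.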
Original language: English
Article number: 102543
Journal: Omega
Volume: 107
Number of pages: 33
ISSN: 0305-0483
Publication status: Published, February 2022

Bibliographical note

Published online: 23 September 2021

Keywords

  • Machine Learning
  • Interpretability
  • Cluster Analysis
  • Prototypes
  • Mixed-Integer Programming
