TY - JOUR

T1 - Interpreting Clusters via Prototype Optimization

AU - Carrizosa, Emilio

AU - Kurishchenko, Kseniia

AU - Marín, Alfredo

AU - Romero Morales, Dolores

N1 - Published online: 23 September 2021

PY - 2022/2

Y1 - 2022/2

N2 - In this paper, we tackle the problem of enhancing the interpretability of the results of Cluster Analysis. Our goal is to find an explanation for each cluster, such that clusters are characterized as precisely and distinctively as possible, i.e., the explanation is fulfilled by as many as possible individuals of the corresponding cluster, true positive cases, and by as few as possible individuals in the remaining clusters, false positive cases. We assume that a dissimilarity between the individuals is given, and propose distance-based explanations, namely those defined by individuals that are close to its so-called prototype. To find the set of prototypes, we address the biobjective optimization problem that maximizes the total number of true positive cases across all clusters and minimizes the total number of false positive cases, while controlling the true positive rate as well as the false positive rate in each cluster. We develop two mathematical optimization models, inspired by classic Location Analysis problems, that differ in the way individuals are allocated to prototypes. We illustrate the explanations provided by these models and their accuracy in both real-life data as well as simulated data.

AB - In this paper, we tackle the problem of enhancing the interpretability of the results of Cluster Analysis. Our goal is to find an explanation for each cluster, such that clusters are characterized as precisely and distinctively as possible, i.e., the explanation is fulfilled by as many as possible individuals of the corresponding cluster, true positive cases, and by as few as possible individuals in the remaining clusters, false positive cases. We assume that a dissimilarity between the individuals is given, and propose distance-based explanations, namely those defined by individuals that are close to its so-called prototype. To find the set of prototypes, we address the biobjective optimization problem that maximizes the total number of true positive cases across all clusters and minimizes the total number of false positive cases, while controlling the true positive rate as well as the false positive rate in each cluster. We develop two mathematical optimization models, inspired by classic Location Analysis problems, that differ in the way individuals are allocated to prototypes. We illustrate the explanations provided by these models and their accuracy in both real-life data as well as simulated data.

KW - Machine Learning

KW - Interpretability

KW - Cluster Analysis

KW - Prototypes

KW - Mixed-Integer Programming

U2 - 10.1016/j.omega.2021.102543

DO - 10.1016/j.omega.2021.102543

M3 - Journal article

VL - 107

JO - Omega: The International Journal of Management Science

JF - Omega: The International Journal of Management Science

SN - 0305-0483

M1 - 102543

ER -