TY - JOUR
T1 - Interpreting Clusters via Prototype Optimization
AU - Carrizosa, Emilio
AU - Kurishchenko, Kseniia
AU - Marín, Alfredo
AU - Romero Morales, Dolores
PY - 2022/2
Y1 - 2022/2
N2 - In this paper, we tackle the problem of enhancing the interpretability of the results of Cluster Analysis. Our goal is to find an explanation for each cluster, such that clusters are characterized as precisely and distinctively as possible, i.e., the explanation is fulfilled by as many as possible individuals of the corresponding cluster, true positive cases, and by as few as possible individuals in the remaining clusters, false positive cases. We assume that a dissimilarity between the individuals is given, and propose distance-based explanations, namely those defined by individuals that are close to its so-called prototype. To find the set of prototypes, we address the biobjective optimization problem that maximizes the total number of true positive cases across all clusters and minimizes the total number of false positive cases, while controlling the true positive rate as well as the false positive rate in each cluster. We develop two mathematical optimization models, inspired by classic Location Analysis problems, that differ in the way individuals are allocated to prototypes. We illustrate the explanations provided by these models and their accuracy in both real-life data as well as simulated data.
AB - In this paper, we tackle the problem of enhancing the interpretability of the results of Cluster Analysis. Our goal is to find an explanation for each cluster, such that clusters are characterized as precisely and distinctively as possible, i.e., the explanation is fulfilled by as many as possible individuals of the corresponding cluster, true positive cases, and by as few as possible individuals in the remaining clusters, false positive cases. We assume that a dissimilarity between the individuals is given, and propose distance-based explanations, namely those defined by individuals that are close to its so-called prototype. To find the set of prototypes, we address the biobjective optimization problem that maximizes the total number of true positive cases across all clusters and minimizes the total number of false positive cases, while controlling the true positive rate as well as the false positive rate in each cluster. We develop two mathematical optimization models, inspired by classic Location Analysis problems, that differ in the way individuals are allocated to prototypes. We illustrate the explanations provided by these models and their accuracy in both real-life data as well as simulated data.
KW - Machine Learning
KW - Interpretability
KW - Cluster Analysis
KW - Prototypes
KW - Mixed-Integer Programming
U2 - 10.1016/j.omega.2021.102543
DO - 10.1016/j.omega.2021.102543
M3 - Journal article
SN - 0305-0483
VL - 107
JO - Omega
JF - Omega
M1 - 102543
ER -