On the Selection of the Globally Optimal Prototype Subset for Nearest-Neighbor Classification

Emilio Carrizosa, Belén Martín-Barragán, Frank Plastria, Dolores Romero Morales

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

Abstract

The nearest-neighbor classifier has been shown to be a powerful tool for multiclass classification. We explore both theoretical properties and empirical behavior of a variant method, in which the nearest-neighbor rule is applied to a reduced set of prototypes. This set is selected a priori by fixing its cardinality and minimizing the empirical misclassification cost. In this way we alleviate the two serious drawbacks of the nearest-neighbor method: high storage requirements and time-consuming queries. Finding this reduced set is shown to be NP-hard. We provide mixed integer programming (MIP) formulations, which are theoretically compared and solved by a standard MIP solver for small problem instances. We show that the classifiers derived from these formulations are comparable to benchmark procedures. We solve large problem instances by a metaheuristic that yields good classification rules in reasonable time. Additional experiments indicate that prototype-based nearest-neighbor classifiers remain quite stable in the presence of missing values.
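The core idea of the abstract — fix the number of prototypes, pick the subset of training points that minimizes empirical misclassification, then classify by the nearest prototype — can be illustrated with a brute-force sketch. This is not the paper's method: the selection problem is NP-hard, which is why the authors use MIP formulations and variable neighborhood search; the exhaustive search below (with equal misclassification costs and Euclidean dissimilarity, both assumptions for illustration) is only feasible on toy data.

```python
from itertools import combinations

def nn_label(x, prototypes, labels):
    # Classify x by the label of its nearest prototype (squared Euclidean distance).
    d = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in prototypes]
    return labels[d.index(min(d))]

def best_prototype_subset(X, y, k):
    """Exhaustively pick the size-k subset of training points that, used as
    prototypes for the 1-NN rule, misclassifies the fewest training points.
    Illustrative only: this enumeration is exponential in k, which is why
    the paper attacks the problem with MIP formulations and a metaheuristic."""
    best, best_err = None, len(X) + 1
    for idx in combinations(range(len(X)), k):
        protos = [X[i] for i in idx]
        labs = [y[i] for i in idx]
        err = sum(nn_label(x, protos, labs) != t for x, t in zip(X, y))
        if err < best_err:
            best, best_err = idx, err
    return best, best_err

# Toy two-class data in the plane.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]
subset, err = best_prototype_subset(X, y, k=2)
print(subset, err)  # a 2-prototype subset with zero training errors
```

With one prototype per well-separated class, two prototypes already reproduce the full training labeling, showing how the reduced set cuts both storage and query time relative to keeping all six points.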
Original language: English
Journal: INFORMS Journal on Computing
Volume: 19
Issue number: 3
Pages (from-to): 470-479
ISSN: 1091-9856
DOI: 10.1287/ijoc.1060.0183
Status: Published - 2007
Published externally: Yes

Keywords

  • Classification
  • Optimal prototype subset
  • Nearest neighbor
  • Dissimilarities
  • Integer programming
  • Variable neighborhood search
  • Missing values

Cite this

Carrizosa, E., Martín-Barragán, B., Plastria, F., & Romero Morales, D. (2007). On the Selection of the Globally Optimal Prototype Subset for Nearest-Neighbor Classification. INFORMS Journal on Computing, 19(3), 470-479. https://doi.org/10.1287/ijoc.1060.0183
@article{ca2c944341624763baa6344e2566fb78,
title = "On the Selection of the Globally Optimal Prototype Subset for Nearest-Neighbor Classification",
abstract = "The nearest-neighbor classifier has been shown to be a powerful tool for multiclass classification. We explore both theoretical properties and empirical behavior of a variant method, in which the nearest-neighbor rule is applied to a reduced set of prototypes. This set is selected a priori by fixing its cardinality and minimizing the empirical misclassification cost. In this way we alleviate the two serious drawbacks of the nearest-neighbor method: high storage requirements and time-consuming queries. Finding this reduced set is shown to be NP-hard. We provide mixed integer programming (MIP) formulations, which are theoretically compared and solved by a standard MIP solver for small problem instances. We show that the classifiers derived from these formulations are comparable to benchmark procedures. We solve large problem instances by a metaheuristic that yields good classification rules in reasonable time. Additional experiments indicate that prototype-based nearest-neighbor classifiers remain quite stable in the presence of missing values.",
keywords = "Classification, Optimal prototype subset, Nearest neighbor, Dissimilarities, Integer programming, Variable neighborhood search, Missing values",
author = "Emilio Carrizosa and Bel{\'e}n Mart{\'i}n-Barrag{\'a}n and Frank Plastria and {Romero Morales}, Dolores",
year = "2007",
doi = "10.1287/ijoc.1060.0183",
language = "English",
volume = "19",
pages = "470--479",
journal = "INFORMS Journal on Computing",
issn = "1091-9856",
publisher = "Institute for Operations Research and the Management Sciences",
number = "3",

}

