On the Selection of the Globally Optimal Prototype Subset for Nearest-Neighbor Classification

Emilio Carrizosa, Belén Martín-Barragán, Frank Plastria, Dolores Romero Morales

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

The nearest-neighbor classifier has been shown to be a powerful tool for multiclass classification. We explore both theoretical properties and empirical behavior of a variant method, in which the nearest-neighbor rule is applied to a reduced set of prototypes. This set is selected a priori by fixing its cardinality and minimizing the empirical misclassification cost. In this way we alleviate the two serious drawbacks of the nearest-neighbor method: high storage requirements and time-consuming queries. Finding this reduced set is shown to be NP-hard. We provide mixed integer programming (MIP) formulations, which are theoretically compared and solved by a standard MIP solver for small problem instances. We show that the classifiers derived from these formulations are comparable to benchmark procedures. We solve large problem instances by a metaheuristic that yields good classification rules in reasonable time. Additional experiments indicate that prototype-based nearest-neighbor classifiers remain quite stable in the presence of missing values.
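The core idea in the abstract — fix the cardinality of a prototype set in advance, pick the subset of training points that minimizes empirical misclassification, then classify new points by the nearest prototype — can be illustrated with a small sketch. This is only an illustration by exhaustive search (the paper shows the problem is NP-hard and uses MIP formulations and a variable neighborhood search metaheuristic instead); all function names below are ours, not the authors'.

```python
# Illustrative sketch: 1-NN classification over a fixed-cardinality prototype
# subset, chosen by brute force to minimize empirical misclassification.
# Exhaustive search is only feasible for tiny data sets; the paper solves
# realistic instances with MIP formulations and a VNS metaheuristic.
from itertools import combinations

def nearest_label(x, prototypes):
    """Label of the prototype nearest to x (squared Euclidean distance)."""
    return min(prototypes, key=lambda p: sum((a - b) ** 2
                                             for a, b in zip(x, p[0])))[1]

def best_prototype_subset(data, k):
    """Among all k-subsets of (point, label) pairs in data, return one that
    minimizes the number of misclassified training points, with its error."""
    best, best_err = None, float("inf")
    for subset in combinations(data, k):
        err = sum(nearest_label(x, subset) != y for x, y in data)
        if err < best_err:
            best, best_err = subset, err
    return best, best_err

# Tiny two-class example in one dimension: two prototypes suffice
data = [((0.0,), 0), ((1.0,), 0), ((5.0,), 1), ((6.0,), 1)]
protos, err = best_prototype_subset(data, 2)  # err is 0 here
```

Storing only `k` prototypes instead of the whole training set is exactly what yields the reduced storage and faster queries the abstract mentions.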
Language: English
Journal: INFORMS Journal on Computing
Volume: 19
Issue number: 3
Pages: 470-479
ISSN: 1091-9856
DOI: 10.1287/ijoc.1060.0183
State: Published - 2007
Externally published: Yes

Keywords

    Classification, Optimal prototype subset, Nearest neighbor, Dissimilarities, Integer programming, Variable neighborhood search, Missing values
    Cite this

    Carrizosa, Emilio; Martín-Barragán, Belén; Plastria, Frank; Morales, Dolores Romero. On the Selection of the Globally Optimal Prototype Subset for Nearest-Neighbor Classification. In: INFORMS Journal on Computing. 2007; Vol. 19, No. 3, pp. 470-479.
    @article{ca2c944341624763baa6344e2566fb78,
    title = "On the Selection of the Globally Optimal Prototype Subset for Nearest-Neighbor Classification",
    abstract = "The nearest-neighbor classifier has been shown to be a powerful tool for multiclass classification. We explore both theoretical properties and empirical behavior of a variant method, in which the nearest-neighbor rule is applied to a reduced set of prototypes. This set is selected a priori by fixing its cardinality and minimizing the empirical misclassification cost. In this way we alleviate the two serious drawbacks of the nearest-neighbor method: high storage requirements and time-consuming queries. Finding this reduced set is shown to be NP-hard. We provide mixed integer programming (MIP) formulations, which are theoretically compared and solved by a standard MIP solver for small problem instances. We show that the classifiers derived from these formulations are comparable to benchmark procedures. We solve large problem instances by a metaheuristic that yields good classification rules in reasonable time. Additional experiments indicate that prototype-based nearest-neighbor classifiers remain quite stable in the presence of missing values.",
    keywords = "Classification, Optimal prototype subset, Nearest neighbor, Dissimilarities, Integer programming, Variable neighborhood search, Missing values",
    author = "Emilio Carrizosa and Bel{\'e}n Mart{\'i}n-Barrag{\'a}n and Frank Plastria and Morales, {Dolores Romero}",
    year = "2007",
    doi = "10.1287/ijoc.1060.0183",
    language = "English",
    volume = "19",
    pages = "470--479",
    journal = "INFORMS Journal on Computing",
    issn = "1091-9856",
    publisher = "Institute for Operations Research and the Management Sciences",
    number = "3",

    }
