Sparsity in Optimal Randomized Classification Trees

Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Decision trees are popular Classification and Regression tools and, when small-sized, easy to interpret. Traditionally, a greedy approach has been used to build the trees, yielding a very fast training process; however, controlling sparsity (a proxy for interpretability) is challenging. In recent studies, optimal decision trees, where all decisions are optimized simultaneously, have shown better learning performance, especially when oblique cuts are implemented. In this paper, we propose a continuous optimization approach to build sparse optimal classification trees, based on oblique cuts, with the aim of using fewer predictor variables in the cuts as well as along the whole tree. Both types of sparsity, namely local and global, are modeled by means of regularizations with polyhedral norms. The computational experience reported supports the usefulness of our methodology. In all our data sets, local and global sparsity can be improved without harming classification accuracy. Unlike with greedy approaches, we show that some classification accuracy can easily be traded for a gain in global sparsity.
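
The abstract refers to modeling local and global sparsity through regularizations with polyhedral norms. As a purely illustrative sketch (the notation is ours, not taken verbatim from the paper: a_t is the coefficient vector of the oblique cut at branch node t, a^j collects the coefficients of predictor j across all branch nodes, \tau_B is the set of branch nodes, p is the number of predictors, and \lambda^L, \lambda^G are the local and global sparsity parameters), such an objective could take the form

\[
\min_{\{a_t,\,\mu_t\}} \; \text{(expected misclassification cost)}
\;+\; \lambda^{L} \sum_{t \in \tau_B} \|a_t\|_{1}
\;+\; \lambda^{G} \sum_{j=1}^{p} \|a^{j}\|_{\infty}
\]

Both the l1 and l-infinity norms are polyhedral: the per-node l1 terms encourage each individual cut to involve few predictors (local sparsity), while the per-predictor l-infinity terms can drive all coefficients of a predictor to zero across the whole tree, removing it from the model entirely (global sparsity).
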
Original language: English
Journal: European Journal of Operational Research
Number of pages: 36
ISSN: 0377-2217
DOIs: 10.1016/j.ejor.2019.12.002
Publication status: Published - 16 Dec 2019

Bibliographical note

Epub ahead of print. Published online: 16 December 2019

Keywords

  • Data mining
  • Optimal classification trees
  • Global and local sparsity
  • Nonlinear Programming

Cite this

@article{47b74e598efb4636966b820e38edb470,
title = "Sparsity in Optimal Randomized Classification Trees",
abstract = "Decision trees are popular Classification and Regression tools and, when small-sized, easy to interpret. Traditionally, a greedy approach has been used to build the trees, yielding a very fast training process; however, controlling sparsity (a proxy for interpretability) is challenging. In recent studies, optimal decision trees, where all decisions are optimized simultaneously, have shown a better learning performance, especially when oblique cuts are implemented. In this paper, we propose a continuous optimization approach to build sparse optimal classification trees, based on oblique cuts, with the aim of using fewer predictor variables in the cuts as well as along the whole tree. Both types of sparsity, namely local and global, are modeled by means of regularizations with polyhedral norms. The computational experience reported supports the usefulness of our methodology. In all our data sets, local and global sparsity can be improved without harming classification accuracy. Unlike greedy approaches, our ability to easily trade in some of our classification accuracy for a gain in global sparsity is shown.",
keywords = "Data mining, Optimal classification trees, Global and local sparsity, Nonlinear Programming",
author = "Rafael Blanquero and Emilio Carrizosa and Cristina Molero-R{\'i}o and {Romero Morales}, Dolores",
note = "Epub ahead of print. Published online: 16 December 2019",
year = "2019",
month = "12",
day = "16",
doi = "10.1016/j.ejor.2019.12.002",
language = "English",
journal = "European Journal of Operational Research",
issn = "0377-2217",
publisher = "Elsevier",

}

Sparsity in Optimal Randomized Classification Trees. / Blanquero, Rafael; Carrizosa, Emilio; Molero-Río, Cristina; Romero Morales, Dolores.

In: European Journal of Operational Research, 16.12.2019.

Research output: Contribution to journal › Journal article › Research › peer-review

TY - JOUR

T1 - Sparsity in Optimal Randomized Classification Trees

AU - Blanquero, Rafael

AU - Carrizosa, Emilio

AU - Molero-Río, Cristina

AU - Romero Morales, Dolores

N1 - Epub ahead of print. Published online: 16 December 2019

PY - 2019/12/16

Y1 - 2019/12/16

N2 - Decision trees are popular Classification and Regression tools and, when small-sized, easy to interpret. Traditionally, a greedy approach has been used to build the trees, yielding a very fast training process; however, controlling sparsity (a proxy for interpretability) is challenging. In recent studies, optimal decision trees, where all decisions are optimized simultaneously, have shown a better learning performance, especially when oblique cuts are implemented. In this paper, we propose a continuous optimization approach to build sparse optimal classification trees, based on oblique cuts, with the aim of using fewer predictor variables in the cuts as well as along the whole tree. Both types of sparsity, namely local and global, are modeled by means of regularizations with polyhedral norms. The computational experience reported supports the usefulness of our methodology. In all our data sets, local and global sparsity can be improved without harming classification accuracy. Unlike greedy approaches, our ability to easily trade in some of our classification accuracy for a gain in global sparsity is shown.

AB - Decision trees are popular Classification and Regression tools and, when small-sized, easy to interpret. Traditionally, a greedy approach has been used to build the trees, yielding a very fast training process; however, controlling sparsity (a proxy for interpretability) is challenging. In recent studies, optimal decision trees, where all decisions are optimized simultaneously, have shown a better learning performance, especially when oblique cuts are implemented. In this paper, we propose a continuous optimization approach to build sparse optimal classification trees, based on oblique cuts, with the aim of using fewer predictor variables in the cuts as well as along the whole tree. Both types of sparsity, namely local and global, are modeled by means of regularizations with polyhedral norms. The computational experience reported supports the usefulness of our methodology. In all our data sets, local and global sparsity can be improved without harming classification accuracy. Unlike greedy approaches, our ability to easily trade in some of our classification accuracy for a gain in global sparsity is shown.

KW - Data mining

KW - Optimal classification trees

KW - Global and local sparsity

KW - Nonlinear Programming

UR - https://sfx-45cbs.hosted.exlibrisgroup.com/45cbs?url_ver=Z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&ctx_enc=info:ofi/enc:UTF-8&ctx_ver=Z39.88-2004&rfr_id=info:sid/sfxit.com:azlist&sfx.ignore_date_threshold=1&rft.object_id=954921393016&rft.object_portfolio_id=&svc.holdings=yes&svc.fulltext=yes

U2 - 10.1016/j.ejor.2019.12.002

DO - 10.1016/j.ejor.2019.12.002

M3 - Journal article

JO - European Journal of Operational Research

JF - European Journal of Operational Research

SN - 0377-2217

ER -