Strongly Agree or Strongly Disagree?: Rating Features in Support Vector Machines

Emilio Carrizosa, Amaya Nogales-Gómez, Dolores Romero Morales

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

In linear classifiers, such as the Support Vector Machine (SVM), a score is associated with each feature and objects are assigned to classes based on the linear combination of the scores and the values of the features. Inspired by discrete psychometric scales, which measure the extent to which a factor is in agreement with a statement, we propose the Discrete Level Support Vector Machine (DILSVM) where the feature scores can only take on a discrete number of values, defined by the so-called feature rating levels. The DILSVM classifier benefits from interpretability and it has visual appeal, since it can be represented as a collection of Likert scales, one for each feature, where we rate the level of agreement with the positive class. To construct the DILSVM classifier, we propose a Mixed Integer Linear Programming approach, as well as a collection of strategies to reduce computational cost. Our numerical experiments show that the 3-point and the 5-point DILSVM classifiers have comparable accuracy to the SVM with a substantial gain in interpretability and visual appeal, but also in sparsity, thanks to the appropriate choice of the feature rating levels.
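The decision rule described in the abstract can be sketched in a few lines. This is only an illustration, not the authors' formulation: the 3-point levels (-1, 0, 1), the zero bias, the function name `classify`, and the sample data are all assumptions made for the example, and the scores are fixed by hand rather than obtained from the Mixed Integer Linear Programming model.

```python
# Illustrative sketch only: rating levels, bias, and data are hypothetical,
# and the scores are chosen by hand rather than by solving the MILP.
LEVELS_3_POINT = (-1, 0, 1)  # assumed 3-point scale: disagree, neutral, agree


def classify(x, scores, bias=0.0, levels=LEVELS_3_POINT):
    """Return +1 or -1 from a linear combination with discrete feature scores."""
    if len(x) != len(scores):
        raise ValueError("x and scores must have the same length")
    if any(s not in levels for s in scores):
        raise ValueError("every score must be one of the rating levels")
    total = sum(s * v for s, v in zip(scores, x)) + bias
    return 1 if total >= 0 else -1


# One score per feature; a score of 0 removes that feature from the rule,
# which is where the sparsity mentioned in the abstract would come from.
scores = [1, 0, -1]
prediction = classify([0.8, 5.0, 0.3], scores)
```

Because each score is one of a handful of levels, the whole classifier can be read off as one rating per feature.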
Language: English
Journal: Information Sciences
Volume: 329
Pages: 256–273
ISSN: 0020-0255
DOI: 10.1016/j.ins.2015.09.031
State: Published - Feb 2016

Keywords

  • Support Vector Machines
  • Mixed Integer Linear Programming
  • Likert scale
  • Interpretability
  • Feature rating level

Cite this

@article{8b1b9406b65e4364865ffa918176c844,
title = "Strongly Agree or Strongly Disagree?: Rating Features in Support Vector Machines",
abstract = "In linear classifiers, such as the Support Vector Machine (SVM), a score is associated with each feature and objects are assigned to classes based on the linear combination of the scores and the values of the features. Inspired by discrete psychometric scales, which measure the extent to which a factor is in agreement with a statement, we propose the Discrete Level Support Vector Machine (DILSVM) where the feature scores can only take on a discrete number of values, defined by the so-called feature rating levels. The DILSVM classifier benefits from interpretability and it has visual appeal, since it can be represented as a collection of Likert scales, one for each feature, where we rate the level of agreement with the positive class. To construct the DILSVM classifier, we propose a Mixed Integer Linear Programming approach, as well as a collection of strategies to reduce computational cost. Our numerical experiments show that the 3-point and the 5-point DILSVM classifiers have comparable accuracy to the SVM with a substantial gain in interpretability and visual appeal, but also in sparsity, thanks to the appropriate choice of the feature rating levels.",
keywords = "Support Vector Machines, Mixed Integer Linear Programming, Likert scale, Interpretability, Feature rating level",
author = "Emilio Carrizosa and Amaya Nogales-G{\'o}mez and {Romero Morales}, Dolores",
year = "2016",
month = "2",
doi = "10.1016/j.ins.2015.09.031",
language = "English",
volume = "329",
pages = "256--273",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier"
}

Strongly Agree or Strongly Disagree? Rating Features in Support Vector Machines. / Carrizosa, Emilio; Nogales-Gómez, Amaya; Romero Morales, Dolores.

In: Information Sciences, Vol. 329, 02.2016, p. 256–273.


TY - JOUR

T1 - Strongly Agree or Strongly Disagree?: Rating Features in Support Vector Machines

T2 - Information Sciences

AU - Carrizosa,Emilio

AU - Nogales-Gómez,Amaya

AU - Morales,Dolores Romero

PY - 2016/2

Y1 - 2016/2

N2 - In linear classifiers, such as the Support Vector Machine (SVM), a score is associated with each feature and objects are assigned to classes based on the linear combination of the scores and the values of the features. Inspired by discrete psychometric scales, which measure the extent to which a factor is in agreement with a statement, we propose the Discrete Level Support Vector Machine (DILSVM) where the feature scores can only take on a discrete number of values, defined by the so-called feature rating levels. The DILSVM classifier benefits from interpretability and it has visual appeal, since it can be represented as a collection of Likert scales, one for each feature, where we rate the level of agreement with the positive class. To construct the DILSVM classifier, we propose a Mixed Integer Linear Programming approach, as well as a collection of strategies to reduce computational cost. Our numerical experiments show that the 3-point and the 5-point DILSVM classifiers have comparable accuracy to the SVM with a substantial gain in interpretability and visual appeal, but also in sparsity, thanks to the appropriate choice of the feature rating levels.

KW - Support vector machines

KW - Mixed integer linear programming

KW - Likert scale

KW - Interpretability

KW - Feature rating level

UR - http://sfx-45cbs.hosted.exlibrisgroup.com/45cbs?url_ver=Z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&ctx_enc=info:ofi/enc:UTF-8&ctx_ver=Z39.88-2004&rfr_id=info:sid/sfxit.com:azlist&sfx.ignore_date_threshold=1&rft.object_id=954925406706&rft.object_portfolio_id=&svc.holdings=yes&svc.fulltext=yes

U2 - 10.1016/j.ins.2015.09.031

DO - 10.1016/j.ins.2015.09.031

M3 - Journal article

VL - 329

SP - 256

EP - 273

JO - Information Sciences

JF - Information Sciences

SN - 0020-0255

ER -