TY - BOOK
T1 - Feature Reduction for Classification with Mixed Data
T2 - An Algorithmic Approach
AU - Restrepo, Marcela Galvis
PY - 2022
Y1 - 2022
N2 - This thesis consists of five chapters including the introduction. The chapters deal with feature reduction for classification with mixed data, with an application to the prediction of school dropout. The traditional way to incorporate categorical predictors in linear models is through one-hot encoding, where each category is represented by a dummy variable. This can be wasteful, difficult to interpret, and prone to overfitting, especially when dealing with high-cardinality categorical predictors. In the second chapter, co-authored with Emilio Carrizosa and Dolores Romero Morales, we propose a method to find a reduced representation of the categorical predictors by clustering their categories. This is done through a numerical method that aims to preserve (or even improve) accuracy while reducing the number of coefficients to be estimated for the categorical predictors. We illustrate the performance of our approach on real-world classification and count-data datasets, where we see that clustering the categorical predictors reduces complexity substantially without harming accuracy.
AB - This thesis consists of five chapters including the introduction. The chapters deal with feature reduction for classification with mixed data, with an application to the prediction of school dropout. The traditional way to incorporate categorical predictors in linear models is through one-hot encoding, where each category is represented by a dummy variable. This can be wasteful, difficult to interpret, and prone to overfitting, especially when dealing with high-cardinality categorical predictors. In the second chapter, co-authored with Emilio Carrizosa and Dolores Romero Morales, we propose a method to find a reduced representation of the categorical predictors by clustering their categories. This is done through a numerical method that aims to preserve (or even improve) accuracy while reducing the number of coefficients to be estimated for the categorical predictors. We illustrate the performance of our approach on real-world classification and count-data datasets, where we see that clustering the categorical predictors reduces complexity substantially without harming accuracy.
M3 - PhD thesis
SN - 9788775681235
T3 - PhD Series
BT - Feature Reduction for Classification with Mixed Data
PB - Copenhagen Business School [Phd]
CY - Frederiksberg
ER -