Feature Reduction for Classification with Mixed Data: An Algorithmic Approach

Marcela Galvis Restrepo

Publication: Book/anthology/thesis/report › PhD thesis


Abstract

This thesis consists of five chapters including the introduction. The chapters deal with feature reduction for classification with mixed data, with an application to the prediction of school dropout. The traditional way to incorporate categorical predictors in linear models is one-hot encoding, where each category is represented by a dummy variable. This can be wasteful, difficult to interpret, and prone to overfitting, especially when dealing with high-cardinality categorical predictors.
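To illustrate why one-hot encoding grows with cardinality, the following minimal sketch encodes a single categorical predictor using only the Python standard library. The `one_hot` helper and the `schools` sample values are hypothetical and are not taken from the thesis.

```python
def one_hot(values):
    """Return (categories, rows): each row holds one 0/1 dummy per category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one dummy is switched on per observation
        rows.append(row)
    return categories, rows


# Hypothetical predictor with five distinct categories.
schools = ["A", "B", "C", "A", "D", "E", "B"]
categories, encoded = one_hot(schools)

# One dummy column (and hence one coefficient in a linear model) per category.
n_dummies = len(categories)
```

Each additional category adds another column, so a predictor with hundreds of categories would require hundreds of coefficients.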
In the second chapter, co-authored with Emilio Carrizosa and Dolores Romero Morales, we propose a method to find a reduced representation of the categorical predictors by clustering their categories. This is done through a numerical method that aims to preserve (or even improve) accuracy while reducing the number of coefficients to be estimated for the categorical predictors. We illustrate the performance of our approach on real-world classification and count-data datasets, where we see that clustering the categorical predictors reduces complexity substantially without harming accuracy.
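The effect of clustering categories on the number of coefficients can be sketched as follows. This is only an illustration: the `assignment` mapping is hand-picked and hypothetical, whereas the thesis finds the clusters through a numerical method that is not reproduced here. The helper names and sample data are likewise assumptions.

```python
def cluster_categories(values, assignment):
    """Replace each category by the label of the cluster it is assigned to."""
    return [assignment[value] for value in values]


def n_dummy_columns(values):
    """Number of dummy variables one-hot encoding would create."""
    return len(set(values))


# Hypothetical predictor with five distinct categories.
schools = ["A", "B", "C", "A", "D", "E", "B"]

# Hand-picked (assumed) grouping of the five categories into two clusters.
assignment = {"A": "g1", "B": "g1", "C": "g2", "D": "g2", "E": "g2"}

clustered = cluster_categories(schools, assignment)
before = n_dummy_columns(schools)    # one coefficient per original category
after = n_dummy_columns(clustered)   # one coefficient per cluster
```

With this assumed grouping, the predictor needs two dummy variables instead of five. Whether accuracy is preserved depends on how the clusters are chosen, which is the question the chapter addresses.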
Original language: English
Place of publication: Frederiksberg
Publisher: Copenhagen Business School [Phd]
Number of pages: 143
ISBN (Print): 9788775681235
ISBN (Electronic): 9788775681242
Status: Published - 2022
Name: PhD Series
Number: 35.2022
ISSN: 0906-6934
