A Binarization Approach to Model Interactions Between Categorical Predictors in Generalized Linear Models

Emilio Carrizosa, Marcela Galvis Restrepo*, Dolores Romero Morales

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

In this paper, our goal is to enhance the interpretability of Generalized Linear Models by identifying the most relevant interactions between categorical predictors. Searching for interaction effects can quickly become a highly combinatorial, and thus computationally costly, problem when there are many categorical predictors, or even only a few with many categories each. Moreover, estimating the coefficients requires large training samples with enough observations for each interaction between categories. To address these bottlenecks, we propose to find a reduced representation of each categorical predictor as a binary predictor, where categories are clustered based on a dissimilarity. We provide a collection of binarized representations for each categorical predictor, where the dissimilarity takes into account information from the main effects and the interactions. The binarized predictors that represent the categorical predictors are chosen with a novel heuristic procedure guided by the accuracy of the so-called binarized model. We test our methodology on both real-world and simulated data, illustrating that, without damaging the out-of-sample accuracy, our approach trains sparse models that include only the most relevant interactions between categorical predictors.
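
The idea of collapsing a categorical predictor into a binary one can be illustrated with a minimal Python sketch. It is not the authors' procedure: the dissimilarity used here (the gap between per-category mean responses), the function names `binarize_categories` and `interaction_terms`, and the toy data are all assumptions made for illustration, and the heuristic that selects among candidate binarizations is not reproduced.

```python
from collections import defaultdict
from statistics import mean


def binarize_categories(categories, response):
    """Split the categories of one predictor into two clusters.

    Assumption (not the paper's dissimilarity): two categories are
    dissimilar when their mean responses differ, so clustering reduces
    to choosing a cut point along the sorted category means.
    """
    by_cat = defaultdict(list)
    for c, y in zip(categories, response):
        by_cat[c].append(y)
    if len(by_cat) < 2:
        raise ValueError("need at least two categories to binarize")

    # Order the categories by their mean response.
    ordered = sorted(by_cat, key=lambda c: mean(by_cat[c]))
    means = [mean(by_cat[c]) for c in ordered]

    def within_ss(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    # Try every cut point and keep the one with the smallest
    # within-cluster sum of squares.
    best_cut = min(
        range(1, len(ordered)),
        key=lambda k: within_ss(means[:k]) + within_ss(means[k:]),
    )
    return {c: int(i >= best_cut) for i, c in enumerate(ordered)}


def interaction_terms(k1, k2):
    """Interaction coefficients for two dummy-coded predictors
    with k1 and k2 categories (one reference level each)."""
    return (k1 - 1) * (k2 - 1)


# Toy data, invented purely for illustration.
region = ["north", "south", "east", "west", "north", "south", "east", "west"]
claims = [1.0, 5.0, 1.2, 5.5, 0.8, 4.5, 1.0, 5.0]

mapping = binarize_categories(region, claims)
region_binary = [mapping[r] for r in region]
```

Under dummy coding, two predictors with 4 and 5 categories need `interaction_terms(4, 5)` = 12 interaction coefficients, while their binarized versions need `interaction_terms(2, 2)` = 1. This illustrates why a binarized model can remain sparse and needs fewer observations per interaction cell.
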
Original language: English
Journal: Applied Intelligence
Number of pages: 13
ISSN: 0924-669X
Publication status: Published - 19 Jun 2024

Bibliographical note

Epub ahead of print: Published online: 19 June 2024.

Keywords

  • Generalized linear models
  • Interpretability
  • Categorical predictors
  • Interactions
  • Clustering of categories
