A Binarization Approach to Model Interactions Between Categorical Predictors in Generalized Linear Models

Emilio Carrizosa, Marcela Galvis Restrepo*, Dolores Romero Morales

*Corresponding author for this work

Publication: Contribution to journal, journal article, research, peer-reviewed

Abstract

In this paper, our goal is to enhance the interpretability of Generalized Linear Models by identifying the most relevant interactions between categorical predictors. Searching for interaction effects can quickly become a highly combinatorial, and thus computationally costly, problem when we have many categorical predictors or even a few of them but with many categories. Moreover, the estimation of coefficients requires large training samples with enough observations for each interaction between categories. To address these bottlenecks, we propose to find a reduced representation for each categorical predictor as a binary predictor, where categories are clustered based on a dissimilarity. We provide a collection of binarized representations for each categorical predictor, where the dissimilarity takes into account information from the main effects and the interactions. The choice of the binarized predictors representing the categorical predictors is made with a novel heuristic procedure that is guided by the accuracy of the so-called binarized model. We test our methodology on both real-world and simulated data, illustrating that, without damaging the out-of-sample accuracy, our approach trains sparse models including only the most relevant interactions between categorical predictors.
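
To make the combinatorial growth mentioned in the abstract concrete, the following minimal Python sketch counts two-way interaction coefficients under standard reference (dummy) coding, before and after every predictor is reduced to two groups. It is not taken from the paper: the function name `pairwise_interaction_terms` and the example category counts are hypothetical, and the sketch illustrates only the size of the search space, not the authors' clustering or selection heuristic.

```python
from itertools import combinations


def pairwise_interaction_terms(levels):
    """Count two-way interaction coefficients under reference (dummy) coding.

    levels: number of categories of each categorical predictor.
    A predictor with K categories contributes K - 1 dummy columns, so a pair
    of predictors with K1 and K2 categories yields (K1 - 1) * (K2 - 1)
    interaction coefficients.
    """
    return sum((a - 1) * (b - 1) for a, b in combinations(levels, 2))


# Hypothetical example: four predictors with 10, 8, 6 and 5 categories.
levels = [10, 8, 6, 5]
full = pairwise_interaction_terms(levels)

# Assumption: after binarization each predictor has two groups of categories,
# i.e. a single dummy column, so each pair contributes one coefficient.
binarized = pairwise_interaction_terms([2] * len(levels))

print(full, binarized)  # prints "227 6"
```

With these assumed category counts, the full model would need 227 interaction coefficients, each requiring enough observations in its cell, while the binarized model needs one per pair of predictors.
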
Original language: English
Journal: Applied Intelligence
Volume: 54
Issue number: 17-18
Pages (from-to): 7969-7981
Number of pages: 13
ISSN: 0924-669X
Status: Published - Sep. 2024

Bibliographical note

Epub ahead of print: Published online: 19 June 2024.

Keywords

  • Generalized linear models
  • Interpretability
  • Categorical predictors
  • Interactions
  • Clustering of categories
