Abstract
Tree ensembles are one of the most powerful methodologies in Machine Learning. In this paper, we investigate how to make tree ensembles more flexible to incorporate explainability and fairness in the training process, possibly at the expense of a decrease in accuracy. While explainability helps the user understand the key features that play a role in the classification task, with fairness we ensure that the ensemble does not discriminate against a group of observations that share a sensitive attribute. We propose a Mixed Integer Linear Optimization formulation to train an ensemble of trees that, apart from minimizing the misclassification cost, controls for sparsity as well as the accuracy in the sensitive group. Our formulation is scalable in the number of observations since its number of binary decision variables is independent of the number of observations. In our numerical results, we show that for standard datasets used in the fairness literature, we can dramatically enhance the fairness of the benchmark, namely the popular Random Forest, while using only a few features, all without damaging the misclassification cost.
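The abstract's fairness criterion is accuracy on the group sharing a sensitive attribute. As an illustration only (not the paper's MILP formulation, and all names here are hypothetical), the quantity such a training objective would control can be sketched as a per-group accuracy comparison:

```python
# Illustrative sketch: the accuracy gap between the sensitive group and the
# rest of the observations, which a fairness-aware objective would bound.
def group_accuracies(y_true, y_pred, sensitive):
    """Return (accuracy on the sensitive group, accuracy on the rest)."""
    hits_s = hits_r = n_s = n_r = 0
    for yt, yp, s in zip(y_true, y_pred, sensitive):
        if s:
            n_s += 1
            hits_s += (yt == yp)
        else:
            n_r += 1
            hits_r += (yt == yp)
    return hits_s / n_s, hits_r / n_r

# Toy data: the classifier is perfect outside the sensitive group but
# misclassifies one of its four members.
y_true    = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred    = [1, 0, 1, 0, 0, 1, 0, 0]
sensitive = [1, 1, 1, 1, 0, 0, 0, 0]

acc_s, acc_r = group_accuracies(y_true, y_pred, sensitive)
print(acc_s, acc_r)  # 0.75 on the sensitive group vs 1.0 on the rest
```

A fairness-aware formulation of the kind described would trade off overall misclassification cost against keeping this gap small.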
Original language | English
---|---
Journal | European Journal of Operational Research
Number of pages | 32
ISSN | 0377-2217
DOIs |
Publication status | Published - 16 Jan 2025
Bibliographical note
Epub ahead of print. Published online: 16 January 2025.
Keywords
- Machine learning
- Tree ensembles
- Explainability
- Fairness
- Mixed integer linear optimization