Double Machine Learning and Automated Confounder Selection: A Cautionary Tale

Paul Hünermund*, Beyers Louw, Itamar Caspi

*Corresponding author for this work

Publication: Contribution to journal, Journal article, Research, peer-reviewed


Abstract

Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Even though the ability to deal with a large number of potential covariates can render selection-on-observables assumptions more plausible, there is at the same time a growing risk that endogenous variables are included, which would lead to the violation of conditional independence. This article demonstrates that DML is very sensitive to the inclusion of only a few “bad controls” in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way.
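
The sensitivity to a single "bad control" described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental design: the data-generating process, the coefficient values, and the helper names (simulate, ols_fit_predict, partialling_out) are assumptions made for illustration, and plain OLS stands in for the machine-learning nuisance estimators that DML would normally use. It applies a cross-fitted partialling-out estimator once with a legitimate confounder only, and once with a collider (a variable caused by both treatment and outcome) added to the control set.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Hypothetical data-generating process; the true effect of d on y is 1.0."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # legitimate confounder
    d = 0.8 * x + rng.normal(size=n)            # treatment
    y = 1.0 * d + 0.5 * x + rng.normal(size=n)  # outcome
    c = d + y + rng.normal(size=n)              # collider: caused by d and y
    return y, d, x, c


def ols_fit_predict(X_train, t_train, X_test):
    """Fit OLS with an intercept on the training fold, predict on the test fold."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A_train, t_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ beta


def partialling_out(y, d, controls, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of the coefficient on d.

    OLS is used as the nuisance learner here (an illustrative simplification);
    a DML application would plug in a flexible ML learner instead.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Residualize outcome and treatment on the controls, out of fold.
        y_res[test] = y[test] - ols_fit_predict(controls[train], y[train], controls[test])
        d_res[test] = d[test] - ols_fit_predict(controls[train], d[train], controls[test])
    # Regress the outcome residuals on the treatment residuals.
    return float(d_res @ y_res / (d_res @ d_res))


y, d, x, c = simulate()
theta_good = partialling_out(y, d, x.reshape(-1, 1))         # confounder only
theta_bad = partialling_out(y, d, np.column_stack([x, c]))   # confounder + collider
print(f"controls = {{x}}:    {theta_good:.3f}")
print(f"controls = {{x, c}}: {theta_bad:.3f}")
```

In this assumed design the first estimate should land near the true effect of 1.0, while conditioning on the collider pulls the second toward zero. The size and direction of the distortion depend on the causal structure chosen here, which is consistent with the abstract's point that the bias varies with the underlying causal model.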
Original language: English
Article number: 20220078
Journal: Journal of Causal Inference
Volume: 11
Issue number: 1
Number of pages: 12
ISSN: 2193-3685
DOI
Status: Published - May 2023

Keywords

  • Double/debiased machine learning
  • Bad controls
  • Backdoor adjustment
  • Collider bias
  • Causal hierarchy
