Multiple Imputation Using Gaussian Copulas

Florian Hollenbach*, Iavor Bojinov, Shahryar Minhas, Nils W. Metternich, Michael Ward, Alexander Volfovsky

*Corresponding author for this work

Research output: Contribution to journal > Journal article > Research > peer-review

Abstract

Missing observations are pervasive throughout empirical research, especially in the social sciences. Despite multiple approaches to dealing adequately with missing data, many scholars still fail to address this vital issue. In this article, we present a simple-to-use method for generating multiple imputations (MIs) using a Gaussian copula. The Gaussian copula for MI allows scholars to attain estimation results that have good coverage and small bias. The use of copulas to model the dependence among variables will enable researchers to construct valid joint distributions of the data, even without knowledge of the actual underlying marginal distributions. MIs are then generated by drawing observations from the resulting posterior joint distribution and replacing the missing values. Using simulated and observational data from published social science research, we compare imputation via Gaussian copulas with two other widely used imputation methods: multiple imputation via chained equations and Amelia II. Our results suggest that the Gaussian copula approach has a slightly smaller bias, higher coverage rates, and narrower confidence intervals compared to the other methods. This is especially true when the variables with missing data are not normally distributed. These results, combined with theoretical guarantees and ease of use, suggest that the approach examined provides an attractive alternative for applied researchers undertaking MIs.
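The imputation step described in the abstract (model dependence through a Gaussian copula, draw the missing values from the joint distribution, then map them back to the scale of the data) can be illustrated with a short sketch. This is a simplified plug-in version under stated assumptions, not the authors' implementation: it estimates the latent correlation matrix once from complete rows instead of sampling it from a posterior, so it understates uncertainty, and it uses empirical quantiles as a stand-in for the unknown marginals. The function names `to_normal_scores` and `copula_impute` and the simulated data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for CDF and inverse CDF

def to_normal_scores(col):
    """Rank-transform observed values to normal scores; NaNs stay NaN."""
    z = np.full(col.shape, np.nan)
    obs = ~np.isnan(col)
    ranks = np.argsort(np.argsort(col[obs])) + 1      # 1..n_obs, ties broken arbitrarily
    u = ranks / (obs.sum() + 1)                       # strictly inside (0, 1)
    z[obs] = [_STD_NORMAL.inv_cdf(p) for p in u]
    return z

def copula_impute(X, n_imputations=5, seed=0):
    """Return a list of completed copies of X (assumption: plug-in correlation)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    Z = np.column_stack([to_normal_scores(X[:, j]) for j in range(X.shape[1])])
    # Simplification: latent correlation estimated from complete rows only.
    R = np.corrcoef(Z[~miss.any(axis=1)], rowvar=False)
    completed = []
    for _ in range(n_imputations):
        X_imp = X.copy()
        for i in np.flatnonzero(miss.any(axis=1)):
            m = np.flatnonzero(miss[i])               # missing columns in row i
            o = np.flatnonzero(~miss[i])              # observed columns in row i
            if o.size:
                # Conditional normal: z_m | z_o ~ N(B z_o, R_mm - B R_om)
                B = R[np.ix_(m, o)] @ np.linalg.inv(R[np.ix_(o, o)])
                mean = B @ Z[i, o]
                cov = R[np.ix_(m, m)] - B @ R[np.ix_(o, m)]
                cov = (cov + cov.T) / 2               # remove floating-point asymmetry
            else:
                mean, cov = np.zeros(m.size), R[np.ix_(m, m)]
            z_draw = rng.multivariate_normal(mean, cov)
            for j, z in zip(m, z_draw):
                # Back-transform through the empirical quantiles of column j.
                X_imp[i, j] = np.quantile(X[~miss[:, j], j], _STD_NORMAL.cdf(z))
        completed.append(X_imp)
    return completed

# Illustrative data: a skewed (log-normal) variable with about 20% missing values.
rng = np.random.default_rng(1)
latent = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=200)
X = np.column_stack([np.exp(latent[:, 0]), latent[:, 1]])
X[rng.random(200) < 0.2, 0] = np.nan
completed = copula_impute(X, n_imputations=5)
```

Because the back-transformation uses empirical quantiles, imputed values stay within the observed range of each variable, which is one reason a copula approach can suit non-normal variables. A full Bayesian treatment would also resample the correlation matrix for every imputation.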
Original language: English
Journal: Sociological Methods & Research
Volume: 50
Issue number: 3
Pages (from-to): 1259-1283
Number of pages: 25
ISSN: 0049-1241
Publication status: Published - Aug 2021
Externally published: Yes

Keywords

  • Missing data
  • Bayesian statistics
  • Imputation
  • Categorical data
  • Estimation
