Generalized Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions

Stephan Dlugosz, Enno Mammen, Ralf Wilke

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Large data sets that originate from administrative or operational activity are increasingly used for statistical analysis, as they often contain very precise information and a large number of observations. However, there is evidence that some variables can be subject to severe misclassification or contain missing values. Given the size of the data, a flexible semiparametric misclassification model would be a good choice, but such models are rarely used in practice. To close this gap, a semiparametric model for the probability of observing labour market transitions is estimated using a sample of 20 million observations from Germany. It is shown that the estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate to be interacted with a nonparametric function of a continuous covariate.
Original language: English
Journal: Computational Statistics & Data Analysis
Volume: 110
Pages (from-to): 145-159
Number of pages: 15
ISSN: 0167-9473
DOI: 10.1016/j.csda.2017.01.003
Publication status: Published - Jun 2017

Keywords

  • Semiparametric regression
  • Measurement error
  • Side information

Cite this

@article{6ff6ee24fc044ead9da254276f591f38,
title = "Generalized Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions",
abstract = "Large data sets that originate from administrative or operational activity are increasingly used for statistical analysis, as they often contain very precise information and a large number of observations. However, there is evidence that some variables can be subject to severe misclassification or contain missing values. Given the size of the data, a flexible semiparametric misclassification model would be a good choice, but such models are rarely used in practice. To close this gap, a semiparametric model for the probability of observing labour market transitions is estimated using a sample of 20 million observations from Germany. It is shown that the estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate to be interacted with a nonparametric function of a continuous covariate.",
keywords = "Semiparametric regression, Measurement error, Side information",
author = "Stephan Dlugosz and Enno Mammen and Ralf Wilke",
year = "2017",
month = "6",
doi = "10.1016/j.csda.2017.01.003",
language = "English",
volume = "110",
pages = "145--159",
journal = "Computational Statistics \& Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",

}

Generalized Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions. / Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf.

In: Computational Statistics & Data Analysis, Vol. 110, 06.2017, p. 145-159.

TY - JOUR

T1 - Generalized Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions

AU - Dlugosz, Stephan

AU - Mammen, Enno

AU - Wilke, Ralf

PY - 2017/6

Y1 - 2017/6

AB - Large data sets that originate from administrative or operational activity are increasingly used for statistical analysis, as they often contain very precise information and a large number of observations. However, there is evidence that some variables can be subject to severe misclassification or contain missing values. Given the size of the data, a flexible semiparametric misclassification model would be a good choice, but such models are rarely used in practice. To close this gap, a semiparametric model for the probability of observing labour market transitions is estimated using a sample of 20 million observations from Germany. It is shown that the estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate to be interacted with a nonparametric function of a continuous covariate.

KW - Semiparametric regression

KW - Measurement error

KW - Side information

UR - https://sfx-45cbs.hosted.exlibrisgroup.com/45cbs?url_ver=Z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&ctx_enc=info:ofi/enc:UTF-8&ctx_ver=Z39.88-2004&rfr_id=info:sid/sfxit.com:azlist&sfx.ignore_date_threshold=1&rft.object_id=954926232411

U2 - 10.1016/j.csda.2017.01.003

DO - 10.1016/j.csda.2017.01.003

M3 - Journal article

VL - 110

SP - 145

EP - 159

JO - Computational Statistics & Data Analysis

JF - Computational Statistics & Data Analysis

SN - 0167-9473

ER -