Generalized Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions

Stephan Dlugosz, Enno Mammen, Ralf Wilke

Publication: Contribution to journal, Journal article, Research, Peer-reviewed

Abstract

Large data sets that originate from administrative or operational activity are increasingly used for statistical analysis, as they often contain very precise information and a large number of observations. But there is evidence that some variables can be subject to severe misclassification or contain missing values. Given the size of the data, a flexible semiparametric misclassification model would be a good choice, but such models are rarely used in practice. To close this gap, a semiparametric model for the probability of observing labour market transitions is estimated using a sample of 20 million observations from Germany. It is shown that the estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate to be interacted with a nonparametric function of a continuous covariate.
Original language: English
Journal: Computational Statistics & Data Analysis
Volume: 110
Pages (from-to): 145-159
ISSN: 0167-9473
DOI: 10.1016/j.csda.2017.01.003
Status: Published - Jun. 2017

Keywords

  • Semiparametric regression
  • Measurement error
  • Side information

Cite this

@article{6ff6ee24fc044ead9da254276f591f38,
title = "Generalized Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions",
abstract = "Large data sets that originate from administrative or operational activity are increasingly used for statistical analysis, as they often contain very precise information and a large number of observations. But there is evidence that some variables can be subject to severe misclassification or contain missing values. Given the size of the data, a flexible semiparametric misclassification model would be a good choice, but such models are rarely used in practice. To close this gap, a semiparametric model for the probability of observing labour market transitions is estimated using a sample of 20 million observations from Germany. It is shown that the estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate to be interacted with a nonparametric function of a continuous covariate.",
keywords = "Semiparametric regression, Measurement error, Side information",
author = "Stephan Dlugosz and Enno Mammen and Ralf Wilke",
year = "2017",
month = "6",
doi = "10.1016/j.csda.2017.01.003",
language = "English",
volume = "110",
pages = "145--159",
journal = "Computational Statistics \& Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",

}

Generalized Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions. / Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf.

In: Computational Statistics & Data Analysis, Vol. 110, 06.2017, pp. 145-159.

TY - JOUR

T1 - Generalized Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions

AU - Dlugosz, Stephan

AU - Mammen, Enno

AU - Wilke, Ralf

PY - 2017/6

Y1 - 2017/6

N2 - Large data sets that originate from administrative or operational activity are increasingly used for statistical analysis, as they often contain very precise information and a large number of observations. But there is evidence that some variables can be subject to severe misclassification or contain missing values. Given the size of the data, a flexible semiparametric misclassification model would be a good choice, but such models are rarely used in practice. To close this gap, a semiparametric model for the probability of observing labour market transitions is estimated using a sample of 20 million observations from Germany. It is shown that the estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate to be interacted with a nonparametric function of a continuous covariate.

KW - Semiparametric regression

KW - Measurement error

KW - Side information

UR - https://sfx-45cbs.hosted.exlibrisgroup.com/45cbs?url_ver=Z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&ctx_enc=info:ofi/enc:UTF-8&ctx_ver=Z39.88-2004&rfr_id=info:sid/sfxit.com:azlist&sfx.ignore_date_threshold=1&rft.object_id=954926232411

U2 - 10.1016/j.csda.2017.01.003

DO - 10.1016/j.csda.2017.01.003

M3 - Journal article

VL - 110

SP - 145

EP - 159

JO - Computational Statistics & Data Analysis

JF - Computational Statistics & Data Analysis

SN - 0167-9473

ER -