open access publication

Article, 2017

Generalized partially linear regression with misclassified data and an application to labour market transitions

COMPUTATIONAL STATISTICS & DATA ANALYSIS, ISSN 0167-9473, Volume 110, Pages 145-159, DOI 10.1016/j.csda.2017.01.003

Contributors

Dlugosz, Stephan [1] Mammen, Enno [2] Wilke, Ralf A. (Corresponding author) [1] [3] [4] [5] [6]

Affiliations

  [1] ZEW Mannheim, L7 1, D-68161 Mannheim, Germany
      [NORA names: Germany; Europe, EU; OECD]
  [2] Heidelberg Univ, Inst Appl Math, Neuenheimer Feld 294, D-69120 Heidelberg, Germany
      [NORA names: Germany; Europe, EU; OECD]
  [3] Copenhagen Business Sch, Dept Econ, Porcelaenshaven 16A, DK-2000 Frederiksberg, Denmark
      [NORA names: CBS Copenhagen Business School; University; Denmark; Europe, EU; Nordic; OECD]
  [4] Univ Strasbourg, Strasbourg, France
      [NORA names: France; Europe, EU; OECD]
  [5] Univ Strasbourg, Strasbourg, France
      [NORA names: France; Europe, EU; OECD]

Abstract

Large data sets that originate from administrative or operational activity are increasingly used for statistical analysis, as they often contain very precise information and a large number of observations. However, there is evidence that some variables can be subject to severe misclassification or contain missing values. Given the size of the data, a flexible semiparametric misclassification model would be a good choice, but such models are rarely used in practice. To close this gap, a semiparametric model for the probability of observing labour market transitions is estimated using a sample of 20 million observations from Germany. It is shown that the estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate to be interacted with a nonparametric function of a continuous covariate. (C) 2017 Elsevier B.V. All rights reserved.

Keywords

Measurement error, Semiparametric regression, Side information

Data Provider: Clarivate