Measurement error (ME) and missing values in covariates are often unavoidable in empirical disciplines, and both problems have separately received considerable attention over the past decades. However, while most researchers are familiar with methods for treating missing data, accounting for ME in the covariates of regression models is less common. Moreover, ME and missing data are typically treated as two separate problems, despite their practical and theoretical similarities. Here, we exploit the fact that a missing value in a continuous covariate is an extreme case of classical ME. This allows us to build on existing methodology that accounts for ME in a Bayesian framework using integrated nested Laplace approximations (INLA), and thus to account simultaneously for ME and missing data in the same covariate. As a useful by-product, we present an approach to handling missing data in INLA, which corresponds to the special case in which no ME is present. We also show how to account for Berkson ME in the same framework. In its most general form, the proposed joint Bayesian framework can thus account for Berkson ME, classical ME, and missing data, or any combination of these, in the same or in different continuous covariates, for the family of regression models that are feasible with INLA. The approach is illustrated using both simulated and real data. We provide extensive and fully reproducible Supplementary Material with thoroughly documented examples using {R-INLA} and {inlabru}.
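The claim that a missing value is an extreme case of classical ME can be made concrete with a small worked example. The notation below ($x_i$, $w_i$, $u_i$, $\mu_i$, $\sigma_u^2$, $\sigma_x^2$, $\lambda$) is assumed for illustration and is not taken from the text above. For simplicity, the sketch conditions only on the error-prone observation and ignores the information contributed by the response, so it is a simplified illustration rather than the full joint model.

```latex
% Classical ME: the observed value w_i scatters around the true value x_i.
\begin{align*}
  w_i &= x_i + u_i, & u_i &\sim \mathcal{N}(0, \sigma_u^2), \\
  x_i &\sim \mathcal{N}(\mu_i, \sigma_x^2)
      && \text{(exposure model; $\mu_i$ may depend on other covariates).}
\end{align*}
% Conjugate normal update for the true value given the observation:
\begin{align*}
  x_i \mid w_i &\sim \mathcal{N}\!\left(\lambda w_i + (1-\lambda)\,\mu_i,\;
      \Bigl(\tfrac{1}{\sigma_u^2} + \tfrac{1}{\sigma_x^2}\Bigr)^{-1}\right),
  & \lambda &= \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}.
\end{align*}
% Limiting cases:
\begin{align*}
  \sigma_u^2 \to \infty &: \quad \lambda \to 0, \quad
      x_i \mid w_i \to \mathcal{N}(\mu_i, \sigma_x^2)
      && \text{($w_i$ is uninformative: a missing value),} \\
  \sigma_u^2 \to 0 &: \quad \lambda \to 1, \quad
      x_i \mid w_i \to \delta_{w_i}
      && \text{(no ME: $x_i$ is observed exactly).}
\end{align*}
% Berkson ME reverses the roles: the true value scatters around the observed one.
\begin{align*}
  x_i &= w_i + u_i, & u_i &\sim \mathcal{N}(0, \sigma_u^2), \quad u_i \text{ independent of } w_i.
\end{align*}
```

Under these assumptions, a missing observation and an observation with infinite error variance lead to the same distribution for $x_i$, which is then determined by the exposure model alone. This is why a single model component can serve both purposes.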