Measurement error occurs when a covariate influencing a response variable is corrupted by noise. This can lead to misleading inference, particularly in problems where accurately estimating the relationship between covariates and response variables is crucial, such as causal effect estimation. Existing methods for dealing with measurement error often rely on strong assumptions, such as knowledge of the error distribution or its variance, and the availability of replicated measurements of the covariates. We propose a Bayesian Nonparametric Learning framework that is robust to mismeasured covariates, does not require these assumptions, and can incorporate prior beliefs about the error distribution. The framework accommodates both Classical and Berkson error models through an appropriate specification of the prior centering measure of a Dirichlet Process (DP). Moreover, it offers flexibility in the choice of loss function depending on the type of regression model. We provide bounds on the generalization error based on the Maximum Mean Discrepancy (MMD) loss, which allows for generalization to non-Gaussian distributed errors and nonlinear covariate-response relationships. We showcase the effectiveness of the proposed framework versus prior art in real-world problems containing either Berkson or Classical measurement errors.
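To make the distinction between the two error models concrete, the following is a minimal simulation sketch, not the proposed framework. It assumes the standard textbook definitions: under Classical error the observed covariate is W = X + U with U independent of the true X, while under Berkson error the true covariate is X = W + U with U independent of the observed W. The function names, the linear response, the Gaussian noise, and all parameter values are illustrative assumptions.

```python
import numpy as np


def ols_slope(w, y):
    """Slope of the least-squares line of y regressed on w."""
    w_centered = w - w.mean()
    return float(w_centered @ (y - y.mean()) / (w_centered @ w_centered))


def simulate_slopes(n=100_000, beta=2.0, sigma_u=1.0, seed=0):
    """Return naive OLS slopes under Classical and Berkson error.

    All settings here (linear response, Gaussian noise, unit variances)
    are assumptions chosen for illustration only.
    """
    rng = np.random.default_rng(seed)

    # Classical error: we observe W = X + U, with U independent of the true X.
    x = rng.normal(0.0, 1.0, n)
    w_classical = x + rng.normal(0.0, sigma_u, n)
    y_classical = beta * x + rng.normal(0.0, 0.5, n)

    # Berkson error: the true X = W + U, with U independent of the observed W.
    w_berkson = rng.normal(0.0, 1.0, n)
    x_berkson = w_berkson + rng.normal(0.0, sigma_u, n)
    y_berkson = beta * x_berkson + rng.normal(0.0, 0.5, n)

    return ols_slope(w_classical, y_classical), ols_slope(w_berkson, y_berkson)


if __name__ == "__main__":
    classical, berkson = simulate_slopes()
    print(f"true slope: 2.0, classical: {classical:.3f}, berkson: {berkson:.3f}")
```

In this linear Gaussian setting, regressing the response on the noisy covariate shrinks the Classical slope toward beta * Var(X) / (Var(X) + Var(U)), while the Berkson slope stays close to beta at the cost of larger residual variance. Nonlinear relationships and non-Gaussian errors do not behave this simply, which is the regime the abstract targets.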