Measurement error occurs when a covariate influencing a response variable is corrupted by noise. This can lead to misleading inference, particularly in problems where accurately estimating the relationship between covariates and response variables is crucial, such as causal effect estimation. Existing methods for dealing with measurement error often rely on strong assumptions, such as knowledge of the error distribution or its variance, and the availability of replicate measurements of the covariates. We propose a Bayesian Nonparametric Learning framework that is robust to misspecification of these assumptions and does not require replicate measurements. The resulting framework accommodates both Classical and Berkson error models through appropriate specification of the prior centering measure of a Dirichlet Process (DP). Moreover, it offers flexibility in the choice of loss function, depending on the type of regression model. We provide bounds on the generalisation error based on the Maximum Mean Discrepancy (MMD) loss, which extends the analysis to non-Gaussian errors and nonlinear covariate-response relationships. We showcase the effectiveness of the proposed framework against existing methods on real-world problems containing either Berkson or Classical measurement errors.
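To make the distinction between the two error models concrete, the following is a minimal simulation sketch, not the proposed method. It assumes a linear response with Gaussian noise, and the function names (`ols_slope`, `simulate`) and all parameter values are hypothetical choices for illustration. Under Classical error the observed covariate is the true one plus independent noise, and a naive least-squares slope is attenuated towards zero; under Berkson error the true covariate is the observed one plus independent noise, and in this linear setting the naive slope remains approximately unbiased.

```python
import numpy as np


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def simulate(n=200_000, beta=2.0, sigma_u=1.0, seed=0):
    """Naive regression slopes under Classical and Berkson error.

    All settings here are illustrative assumptions: a linear model
    y = beta * x + noise, unit-variance Gaussian covariates, and
    Gaussian measurement error with standard deviation sigma_u.
    """
    rng = np.random.default_rng(seed)

    # Classical error: we observe W = X + U, with U independent of the true X.
    x = rng.normal(0.0, 1.0, n)
    w_classical = x + rng.normal(0.0, sigma_u, n)
    y_classical = beta * x + rng.normal(0.0, 0.5, n)

    # Berkson error: the true X = W + U, with U independent of the observed W.
    w_berkson = rng.normal(0.0, 1.0, n)
    x_berkson = w_berkson + rng.normal(0.0, sigma_u, n)
    y_berkson = beta * x_berkson + rng.normal(0.0, 0.5, n)

    # In both cases the analyst regresses the response on the observed W.
    return {
        "classical": ols_slope(w_classical, y_classical),
        "berkson": ols_slope(w_berkson, y_berkson),
    }


slopes = simulate()
print(slopes)
```

With these settings the Classical slope should land near `beta * 1 / (1 + sigma_u**2)`, i.e. about half the true value, while the Berkson slope should stay close to `beta`. This simple picture relies on linearity and Gaussian errors, which are exactly the restrictions the abstract describes relaxing.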