Measurement error occurs when the covariates influencing a response variable are corrupted by noise. This can lead to misleading inference, particularly in problems where accurately estimating the relationship between covariates and response is crucial, such as causal effect estimation. Existing methods for dealing with measurement error often rely on strong assumptions, such as knowledge of the error distribution or its variance, and the availability of replicated measurements of the covariates. We propose a Bayesian Nonparametric Learning framework that is robust to mismeasured covariates, does not require the preceding assumptions, and can incorporate prior beliefs about the true error distribution. Our approach gives rise to two methods that are robust to measurement error via different loss functions: one based on the Total Least Squares objective and the other based on the Maximum Mean Discrepancy (MMD). The latter generalises to non-Gaussian error distributions and non-linear covariate-response relationships. We provide generalisation error bounds for the MMD-based loss and demonstrate the effectiveness of the proposed framework against prior art on real-world mental health and dietary datasets that contain significant measurement error.
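To make the two loss functions named above concrete, the following is a minimal sketch in Python with NumPy. It is not the proposed Bayesian nonparametric framework, only an illustration of the two building blocks under simple assumptions: a single covariate, Gaussian noise of equal variance on the covariate and the response (the setting in which Total Least Squares is appropriate), and a Gaussian kernel with a fixed bandwidth for the MMD. All function names, parameters and simulated values are hypothetical and chosen for illustration.

```python
import numpy as np


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (ignores noise in x)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))


def tls_slope(x, y):
    """Total least squares slope via the SVD of the centred data matrix.

    The right singular vector with the smallest singular value is normal
    to the fitted line, so the line satisfies a * x + b * y = 0.
    """
    z = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    a, b = vt[-1]
    return float(-a / b)


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD, Gaussian kernel.

    xs and ys are arrays of shape (n, d) and (m, d).
    """
    def gram(p, q):
        d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))

    return float(gram(xs, xs).mean() + gram(ys, ys).mean()
                 - 2.0 * gram(xs, ys).mean())


# Simulated errors-in-variables data (assumed values, illustration only).
rng = np.random.default_rng(0)
n = 2000
true_slope = 2.0
x_true = rng.normal(0.0, 1.0, n)
y = true_slope * x_true + rng.normal(0.0, 0.5, n)  # noisy response
x_obs = x_true + rng.normal(0.0, 0.5, n)           # mismeasured covariate

slope_ols = ols_slope(x_obs, y)  # attenuated towards zero by the noise in x
slope_tls = tls_slope(x_obs, y)  # accounts for noise in both variables

# MMD between two samples from the same law versus a shifted law.
sample_a = rng.normal(0.0, 1.0, (300, 1))
sample_b = rng.normal(0.0, 1.0, (300, 1))
sample_c = rng.normal(3.0, 1.0, (300, 1))
mmd_same = mmd2_biased(sample_a, sample_b)
mmd_shifted = mmd2_biased(sample_a, sample_c)

print("OLS slope:", slope_ols)
print("TLS slope:", slope_tls)
print("MMD^2 (same law):", mmd_same)
print("MMD^2 (shifted law):", mmd_shifted)
```

In this simulation the OLS slope is biased towards zero because the noise in the observed covariate is ignored, whereas the TLS slope stays closer to the true value when the two noise variances are equal. The MMD estimate depends only on kernel evaluations between samples, which is why an MMD-based loss does not need a Gaussian error model or a linear covariate-response relationship.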