Conditional independence (CI) testing is a fundamental and challenging problem in modern statistics and machine learning. Many modern CI tests rely on powerful supervised learning methods to learn regression functions or Bayes predictors as an intermediate step. Although these tests are guaranteed to control Type-I error when the supervised learning methods accurately estimate the regression functions or Bayes predictors, their behavior is less understood when estimation fails due to model misspecification. In a broad sense, model misspecification can arise even when universal approximators (e.g., deep neural nets) are employed. We therefore study the performance of regression-based CI tests under model misspecification. Specifically, we propose new approximations or upper bounds, expressed in terms of misspecification errors, for the testing errors of three regression-based tests. Moreover, we introduce the Rao-Blackwellized Predictor Test (RBPT), a novel regression-based CI test that is robust to model misspecification. Finally, we conduct experiments with artificial and real data, showcasing the usefulness of our theory and methods.