Conditional independence (CI) testing is a fundamental and challenging task in modern statistics and machine learning. Many modern CI tests rely on powerful supervised learning methods to learn regression functions or Bayes predictors as an intermediate step; we refer to this class as regression-based tests. These tests are guaranteed to control Type-I error when the supervised learning methods accurately estimate the regression functions or Bayes predictors of interest, but their behavior is less well understood when estimation fails because of misspecified inductive biases, that is, when the employed models are not flexible enough or the training algorithm does not induce the desired predictors. In this work, we study the performance of regression-based CI tests under misspecified inductive biases. Specifically, we propose new approximations or upper bounds for the testing errors of three regression-based tests, expressed in terms of misspecification errors. Moreover, we introduce the Rao-Blackwellized Predictor Test (RBPT), a regression-based CI test that is robust against misspecified inductive biases. Finally, we conduct experiments with artificial and real data, showcasing the usefulness of our theory and methods.
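To make the notion of a misspecified inductive bias concrete, the following is a minimal sketch of a generic residual-product statistic in the style of regression-based CI tests. It is an illustration only: it is not the RBPT, and the abstract does not say whether it matches any of the three tests analyzed. The function names `linear_residuals` and `residual_product_test`, the simulated data, and the choice of ordinary least squares as the learner are all assumptions made for this example. In the simulation, X and Y are conditionally independent given Z, but both depend on Z through Z squared, so a regression that is linear in Z is not flexible enough.

```python
import math
import numpy as np


def linear_residuals(target, z):
    """Residuals of an ordinary least-squares fit of target on z (with intercept)."""
    design = np.column_stack([np.ones(len(target)), z])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def residual_product_test(x, y, z):
    """Hypothetical regression-based CI test of X independent of Y given Z.

    Regresses x on z and y on z, then checks whether the product of the
    two residual vectors has mean zero using a normal approximation.
    Returns the standardized statistic and a two-sided p-value.
    """
    products = linear_residuals(x, z) * linear_residuals(y, z)
    n = len(products)
    stat = math.sqrt(n) * products.mean() / products.std()
    # Two-sided normal p-value: 2 * (1 - Phi(|stat|)) = erfc(|stat| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


# Simulated data (an assumption of this sketch): X and Y are conditionally
# independent given Z, and both depend on Z only through Z**2.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
x = z**2 + rng.normal(size=n)
y = z**2 + rng.normal(size=n)

# Misspecified learner: linear in Z, so Z**2 leaks into both residuals.
stat_mis, p_mis = residual_product_test(x, y, z)

# Well-specified learner: the regression is given the feature Z**2.
stat_ok, p_ok = residual_product_test(x, y, np.column_stack([z, z**2]))

print(f"linear in Z:      stat={stat_mis:.2f}, p={p_mis:.3g}")
print(f"with Z**2 added:  stat={stat_ok:.2f}, p={p_ok:.3g}")
```

Under these assumptions the linear-in-Z fit leaves the shared Z squared signal in both residuals, so the statistic is pushed far from zero even though the null hypothesis holds. This is the kind of Type-I error inflation that misspecification can cause; adding the missing feature removes the shared signal.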