We consider the problem of testing whether the conditional distribution of a response variable given a vector of covariates is the same in two populations. This hypothesis testing problem arises in various machine learning and statistical inference scenarios, including transfer learning and causal predictive inference. We develop a nonparametric test procedure inspired by the conformal prediction framework. The construction of our test statistic combines recent developments in conformal prediction with a novel choice of conformity score, resulting in a weighted rank-sum test statistic that is valid and powerful under general settings. To our knowledge, this is the first successful attempt to use conformal prediction for testing statistical hypotheses beyond exchangeability. Our method is suitable for modern machine learning scenarios where the data are high-dimensional and the sample sizes are large, and it can be effectively combined with existing classification algorithms to find good conformity score functions. The performance of the proposed method is demonstrated in several numerical examples.
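To make the idea of a weighted rank-sum statistic concrete, the following is a minimal sketch of one plausible form, not necessarily the exact statistic used in this work. It assumes conformity scores have already been computed for both samples and that each point in the second sample carries a nonnegative weight (for instance, an estimated covariate density ratio obtained from a classifier). The function name `weighted_rank_sum` and its arguments are hypothetical and introduced only for illustration.

```python
import numpy as np

def weighted_rank_sum(scores_1, scores_2, weights_2):
    """Illustrative weighted rank-sum statistic (assumed form, for exposition).

    scores_1  : conformity scores of the sample from population 1
    scores_2  : conformity scores of the sample from population 2
    weights_2 : nonnegative weights for the population-2 points
    """
    s1 = np.asarray(scores_1, dtype=float)
    s2 = np.asarray(scores_2, dtype=float)
    w2 = np.asarray(weights_2, dtype=float)
    # Pairwise comparison: 1 if s1[i] < s2[j], 0.5 on ties, 0 otherwise.
    less = (s1[:, None] < s2[None, :]).astype(float)
    ties = (s1[:, None] == s2[None, :]).astype(float)
    comparisons = less + 0.5 * ties
    # Weight each population-2 point, then average over all pairs.
    return float((comparisons * w2[None, :]).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=500)
    b = rng.normal(size=500)
    # With unit weights this is the usual (normalized) Mann-Whitney statistic.
    print(weighted_rank_sum(a, b, np.ones_like(b)))
```

With unit weights and two samples drawn from the same continuous distribution, the value should be close to 1/2; a value far from 1/2 would suggest that the two score distributions differ. Turning this into a calibrated test requires a variance estimate or a resampling scheme, which this sketch deliberately omits.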