This paper introduces a new method for conditional independence testing in high-dimensional data that automatically discovers significant associations within distinct subgroups of a population while controlling the false discovery rate. The method extends the model-X knockoff filter to provide more informative inferences. These inferences can help explain sample heterogeneity and uncover interactions, making better use of the capabilities of modern machine learning models. Specifically, our method can leverage any model to identify data-driven hypotheses about interesting population subgroups, and it then rigorously tests these hypotheses without succumbing to selection bias. Importantly, our approach is efficient and does not require sample splitting. We demonstrate its effectiveness through simulations and numerical experiments using data from a randomized experiment with multiple treatment variables.
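For readers unfamiliar with the model-X knockoff filter that this work extends, the following is a minimal sketch of its standard selection step (the knockoff+ threshold), not of the subgroup-level procedure proposed in the paper. It assumes that feature statistics W_j have already been computed from the data and valid knockoff copies, with large positive values indicating evidence against the null. The function names and the example statistics are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Return the knockoff+ threshold for statistics w at nominal FDR level q.

    Returns float('inf') if no candidate threshold meets the bound,
    in which case nothing is selected.
    """
    # Candidate thresholds are the distinct nonzero magnitudes of the statistics.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        # Conservative estimate of the false discovery proportion at threshold t;
        # the "+1" in the numerator is the knockoff+ correction.
        fdp_hat = (1 + negatives) / max(1, positives)
        if fdp_hat <= q:
            return t
    return float("inf")


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Return the indices of variables whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Illustrative (made-up) statistics for ten variables.
example_w = [5.0, 4.0, 3.5, 3.0, -0.5, 2.5, 0.2, -0.1, 2.0, 1.5]
selected = knockoff_select(example_w, q=0.2)
print(selected)  # [0, 1, 2, 3, 5, 8, 9]
```

In this sketch the threshold is the smallest magnitude at which the estimated proportion of false discoveries drops to q or below, so variables with small or negative statistics are excluded. How the statistics are defined for subgroup-specific hypotheses is the contribution of the paper and is not captured here.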