This paper develops a model-free sequential test for conditional independence. The proposed test allows researchers to analyze an incoming i.i.d. data stream whose variables may have an arbitrary dependency structure, and to safely conclude whether a feature is conditionally associated with the response under study. Data points are processed online, as soon as they arrive, and data acquisition stops once a significant result is detected, while the type-I error rate is rigorously controlled. The test can be combined with any machine learning algorithm, however sophisticated, to improve data efficiency. The method is inspired by two statistical frameworks. The first is the model-X conditional randomization test, a test for conditional independence that is valid in offline settings where the sample size is fixed in advance. The second is testing by betting, a "game-theoretic" approach to sequential hypothesis testing. We conduct synthetic experiments to demonstrate the advantage of our test over out-of-the-box sequential tests that account for the multiplicity of tests over the time horizon, and we demonstrate the practicality of our proposal by applying it to real-world tasks.
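To make the combination of the two frameworks concrete, the following is a minimal sketch of a betting-style sequential test under the model-X assumption. It is an illustration, not the paper's exact procedure. The function names `sequential_ci_test` and `simulate`, the online linear predictor, the `tanh` payoff, the fixed betting fraction `bet`, and the toy Gaussian data are all assumptions introduced here. The idea is that under the null hypothesis, the observed feature and a dummy drawn from the (assumed known) conditional distribution of X given Z are exchangeable, so an antisymmetric payoff has mean zero and the bettor's wealth is a nonnegative martingale. By Ville's inequality, rejecting when the wealth reaches 1/alpha keeps the type-I error rate at most alpha at any stopping time.

```python
import math
import random


def sequential_ci_test(stream, sample_x_given_z, alpha=0.05, bet=0.5, lr=0.05):
    """Hypothetical sketch of a betting test of H0: X independent of Y given Z.

    stream yields (x, y, z) triples one at a time.
    sample_x_given_z(z) draws a dummy feature from P(X | Z=z), assumed known.
    Returns (stopping_time, wealth); stopping_time is None if H0 is never rejected.
    """
    wealth = 1.0
    w_x, w_z = 0.0, 0.0  # online linear predictor of y, a stand-in for any ML model
    for t, (x, y, z) in enumerate(stream, start=1):
        x_dummy = sample_x_given_z(z)
        # Squared errors use weights fitted on PAST data only.
        err_real = (y - (w_x * x + w_z * z)) ** 2
        err_dummy = (y - (w_x * x_dummy + w_z * z)) ** 2
        # Payoff lies in [-1, 1] and flips sign when x and x_dummy are swapped,
        # so its conditional mean is zero under H0.
        payoff = math.tanh(err_dummy - err_real)
        wealth *= 1.0 + bet * payoff  # factor stays positive because 0 < bet < 1
        if wealth >= 1.0 / alpha:
            return t, wealth  # reject H0 and stop collecting data
        # Update the predictor only after the bet has been settled.
        resid = y - (w_x * x + w_z * z)
        w_x += lr * resid * x
        w_z += lr * resid * z
    return None, wealth


def simulate(n, effect, rng):
    """Toy stream (an assumption): Z ~ N(0,1), X | Z ~ N(Z,1), Y = effect*X + Z + noise."""
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = rng.gauss(z, 1.0)
        y = effect * x + z + rng.gauss(0.0, 1.0)
        yield x, y, z


if __name__ == "__main__":
    rng = random.Random(0)
    sampler = lambda z: rng.gauss(z, 1.0)  # matches the toy P(X | Z)
    print(sequential_ci_test(simulate(2000, 2.0, rng), sampler))
```

In this sketch the predictor is updated only after each bet, which is what keeps the payoff fair under the null. A stronger predictive model would typically grow the wealth faster under the alternative without affecting validity, and an adaptive betting fraction could replace the fixed `bet`.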