This paper is concerned with the problem of conditional independence testing for discrete data. In recent years, researchers have shed new light on this fundamental problem, emphasizing finite-sample optimality. The non-asymptotic viewpoint adopted in these works has led to novel conditional independence tests that enjoy optimality guarantees in various regimes. Despite their attractive theoretical properties, these tests are not necessarily practical, as they rely on a Poissonization trick and on unspecified constants in their critical values. In this work, we attempt to bridge the gap between theory and practice by re-proving optimality without Poissonization and by calibrating the tests using Monte Carlo permutations. Along the way, we also prove that the classical asymptotic $\chi^2$- and $G$-tests are notably suboptimal in a high-dimensional regime, which justifies the demand for new tools. Our theoretical results are complemented by experiments on both simulated and real-world datasets. Accompanying this paper is an R package, UCI, that implements the proposed tests.
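To make the idea of calibrating by Monte Carlo permutations concrete, the following is a minimal Python sketch under stated assumptions. It is not the API of the UCI package and it does not use the test statistics proposed in the paper: a stratified Pearson $\chi^2$ statistic is swapped in as a simple stand-in, and the function names `stratified_chi2` and `permutation_pvalue` are hypothetical. The sketch shuffles $Y$ within each level of $Z$, which preserves the conditional marginals while breaking any conditional dependence between $X$ and $Y$, and then compares the observed statistic with the permuted ones.

```python
import random
from collections import Counter, defaultdict


def stratified_chi2(x, y, z):
    """Sum, over the levels of Z, of the Pearson chi-square statistic for X vs Y.

    Stand-in statistic for illustration only (an assumption of this sketch).
    """
    strata = defaultdict(list)
    for xi, yi, zi in zip(x, y, z):
        strata[zi].append((xi, yi))
    total = 0.0
    for pairs in strata.values():
        n = len(pairs)
        joint = Counter(pairs)
        row = Counter(p[0] for p in pairs)
        col = Counter(p[1] for p in pairs)
        for a in row:
            for b in col:
                expected = row[a] * col[b] / n  # positive: a and b both occur
                total += (joint[(a, b)] - expected) ** 2 / expected
    return total


def permutation_pvalue(x, y, z, num_perms=999, seed=0):
    """Monte Carlo permutation p-value for H0: X independent of Y given Z."""
    rng = random.Random(seed)
    observed = stratified_chi2(x, y, z)

    # Group sample indices by the value of Z so Y is shuffled within strata.
    groups = defaultdict(list)
    for i, zi in enumerate(z):
        groups[zi].append(i)

    exceed = 0
    y_perm = list(y)
    for _ in range(num_perms):
        for idx in groups.values():
            vals = [y[i] for i in idx]
            rng.shuffle(vals)
            for i, v in zip(idx, vals):
                y_perm[i] = v
        # Small tolerance so floating-point ties count as exceedances.
        if stratified_chi2(x, y_perm, z) >= observed - 1e-12:
            exceed += 1

    # The "+1" terms count the observed statistic as one of the permutations.
    return (1 + exceed) / (1 + num_perms)
```

Rejecting when the returned value is at most the nominal level gives a critical value that is computed from the data rather than from unspecified constants, which is the practical point of permutation calibration.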