Learning the dependence structure among variables in complex systems is a central problem across the medical, natural, and social sciences. These structures can be naturally represented by graphs, and the task of inferring such graphs from data is known as graph learning, or as causal discovery if the graphs are given a causal interpretation. Existing approaches typically rely on restrictive assumptions about the data-generating process, employ greedy oracle algorithms, or solve approximate formulations of the graph learning problem. As a result, they are either sensitive to violations of central assumptions or fail to guarantee globally optimal solutions. We address these limitations by introducing a graph learning framework based on nonparametric conditional independence testing and integer programming. We reformulate the graph learning problem as an integer-programming problem and prove that solving it yields a globally optimal solution to the original graph learning problem. Our method leverages efficient encodings of graphical separation criteria, enabling the exact recovery of larger graphs than was previously feasible. We provide an implementation in the openly available R package 'glip', which supports learning (acyclic) directed (mixed) graphs and chain graphs. From the resulting output, one can compute representations of the corresponding Markov equivalence classes or weak equivalence classes. Empirically, we demonstrate that our approach is faster than existing exact graph learning procedures on a large fraction of instances and across graphs of various sizes. GLIP also achieves state-of-the-art performance on simulated data and benchmark datasets across all aforementioned classes of graphs.