Learning the dependence structure among variables in complex systems is a central problem across the medical, natural, and social sciences. These structures can be naturally represented by graphs, and the task of inferring such graphs from data is known as graph learning or causal discovery. Existing approaches typically rely on restrictive assumptions about the data-generating process, employ greedy oracle algorithms, or solve approximate formulations of the graph learning problem. As a result, they are either sensitive to violations of central assumptions or fail to guarantee globally optimal solutions. We address these limitations by introducing a nonparametric graph learning framework based on conditional independence testing and integer programming. We reformulate the graph learning problem as a mixed-integer program and prove that solving this program yields a globally optimal solution to the original graph learning problem. Our method leverages efficient encodings of graphical separation criteria, enabling the exact recovery of larger graphs than was previously feasible. We provide an open-source R package, 'glip', that supports learning (acyclic) directed (mixed) graphs and chain graphs. We demonstrate that our approach is often faster than existing exact graph learning procedures and achieves state-of-the-art performance on simulated and benchmark data across all aforementioned classes of graphs.
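To make the notion of a globally optimal graph concrete, the following is a minimal sketch and not the 'glip' implementation. It assumes one plausible objective, the number of disagreements between a graph's d-separations and the outcomes of conditional independence tests, and it replaces the mixed-integer program with brute-force enumeration of all DAGs on three variables. The test outcomes in `ci_results` and every function name are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product

def is_acyclic(nodes, edges):
    """Return True if the directed graph has no directed cycle."""
    remaining = set(nodes)
    while remaining:
        # Nodes with no incoming edge from another remaining node.
        sources = {v for v in remaining
                   if not any(b == v and a in remaining for a, b in edges)}
        if not sources:
            return False
        remaining -= sources
    return True

def ancestors(edges, targets):
    """Return the targets together with all of their ancestors."""
    result = set(targets)
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            if b in result and a not in result:
                result.add(a)
                changed = True
    return result

def d_separated(edges, x, y, z):
    """Check d-separation of x and y given z via the moralized ancestral graph."""
    keep = ancestors(edges, {x, y} | set(z))
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    undirected = {frozenset(e) for e in sub}
    for child in keep:  # "marry" all pairs of parents of each node
        parents = [a for a, b in sub if b == child]
        for p, q in combinations(parents, 2):
            undirected.add(frozenset((p, q)))
    frontier, seen = [x], {x}
    while frontier:  # reachability search that never enters z
        v = frontier.pop()
        for e in undirected:
            if v in e:
                (w,) = e - {v}
                if w not in z and w not in seen:
                    seen.add(w)
                    frontier.append(w)
    return y not in seen

def all_dags(nodes):
    """Enumerate every DAG on the given nodes as a tuple of directed edges."""
    pairs = list(combinations(nodes, 2))
    for states in product((None, 0, 1), repeat=len(pairs)):
        edges = []
        for (a, b), s in zip(pairs, states):
            if s == 0:
                edges.append((a, b))
            elif s == 1:
                edges.append((b, a))
        if is_acyclic(nodes, edges):
            yield tuple(edges)

nodes = ("A", "B", "C")
# Hypothetical test outcomes: True means independence was not rejected.
ci_results = {
    ("A", "B", ()): True,
    ("A", "B", ("C",)): False,
    ("A", "C", ()): False,
    ("A", "C", ("B",)): False,
    ("B", "C", ()): False,
    ("B", "C", ("A",)): False,
}

def cost(edges):
    """Count test outcomes that the graph's d-separations contradict."""
    return sum(d_separated(edges, x, y, set(z)) != indep
               for (x, y, z), indep in ci_results.items())

scored = [(cost(g), g) for g in all_dags(nodes)]
best_cost = min(c for c, _ in scored)
optimal = [g for c, g in scored if c == best_cost]
print(best_cost, optimal)
```

For these outcomes the only DAG with zero disagreements is the collider A → C ← B. The number of DAGs grows super-exponentially with the number of variables, so enumeration of this kind becomes infeasible very quickly, which is why an exact method needs a formulation such as the integer program described above.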