Learning graphical conditional independence structures is an important machine learning problem and a cornerstone of causal discovery. However, learning algorithms generally struggle to remain both accurate and fast on problems with hundreds of highly connected variables, such as recovering brain networks from fMRI data. We introduce the best order score search (BOSS) and grow-shrink trees (GSTs) for learning directed acyclic graphs (DAGs) in this setting. BOSS greedily searches over permutations of variables, using GSTs to construct and score DAGs from permutations. GSTs cache scores efficiently to eliminate redundant calculations. BOSS achieves state-of-the-art accuracy and execution time, comparing favorably to a variety of combinatorial and gradient-based learning algorithms under a broad range of conditions. To demonstrate its practicality, we apply BOSS to two sets of resting-state fMRI data: simulated data with pseudo-empirical noise distributions derived from randomized empirical fMRI cortical signals, and clinical data from 3T fMRI scans processed into cortical parcels. BOSS is available within the TETRAD project, which includes Python and R wrappers.
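To make the idea of a permutation-based search concrete, the following is a minimal sketch and not the BOSS implementation distributed with TETRAD. It assumes linear Gaussian data and a BIC-style local score with a hypothetical `penalty` multiplier, and it uses a plain memoization cache as a stand-in for GSTs, which are a more refined tree-structured cache. All function names (`make_local_bic`, `grow_shrink`, `score_order`, `greedy_order_search`) are illustrative assumptions.

```python
from functools import lru_cache

import numpy as np


def make_local_bic(data, penalty=2.0):
    """Return a memoized Gaussian BIC-style local score (higher is better)."""
    n = data.shape[0]
    cov = np.cov(data, rowvar=False)

    @lru_cache(maxsize=None)  # simple cache; a stand-in for GSTs, not a GST
    def local_bic(child, parents):
        resid_var = cov[child, child]
        if parents:
            p = list(parents)
            s_pc = cov[p, child]
            resid_var -= s_pc @ np.linalg.solve(cov[np.ix_(p, p)], s_pc)
        return -n * np.log(resid_var) - penalty * (len(parents) + 1) * np.log(n)

    return local_bic


def grow_shrink(child, prefix, score):
    """Pick parents for `child` from `prefix` by greedy grow, then shrink."""
    parents, best = set(), score(child, ())
    for grow in (True, False):
        while True:
            pool = [v for v in prefix if (v not in parents) == grow]
            move = None
            for v in pool:
                trial = parents | {v} if grow else parents - {v}
                s = score(child, tuple(sorted(trial)))
                if s > best:
                    best, move = s, trial
            if move is None:
                break
            parents = move
    return best, frozenset(parents)


def score_order(order, score):
    """Build a DAG from a permutation and return (total score, parent map)."""
    total, dag = 0.0, {}
    for i, v in enumerate(order):
        s, pa = grow_shrink(v, order[:i], score)
        total += s
        dag[v] = pa
    return total, dag


def greedy_order_search(score, n_vars):
    """Move one variable at a time to its best position until no move helps."""
    order = list(range(n_vars))
    best, dag = score_order(order, score)
    improved = True
    while improved:
        improved = False
        for v in list(order):
            rest = [u for u in order if u != v]
            for pos in range(n_vars):
                cand = rest[:pos] + [v] + rest[pos:]
                s, d = score_order(cand, score)
                if s > best + 1e-9:
                    best, dag, order, improved = s, d, cand, True
    return order, dag, best


# Usage on synthetic chain data X0 -> X1 -> X2 (illustrative only).
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = x0 + rng.normal(size=2000)
x2 = x1 + rng.normal(size=2000)
score = make_local_bic(np.column_stack([x0, x1, x2]))
order, dag, total = greedy_order_search(score, 3)
print(order, {child: sorted(pa) for child, pa in dag.items()})
```

Because many candidate permutations share the same (variable, parent set) queries, even this naive cache avoids recomputing most local scores; the edge orientations returned are only identified up to Markov equivalence.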