Directed acyclic graphs represent the dependence structure among variables. When learning these graphs from data, different amounts of information may be available for different edges. Although many methods have been developed to learn the topology of these graphs, most of them do not provide a measure of uncertainty in the inference. We propose a Bayesian method, baycn (BAYesian Causal Network), to estimate the posterior probability of three states for each edge: present with one direction ($X \rightarrow Y$), present with the opposite direction ($X \leftarrow Y$), and absent. Unlike existing Bayesian methods, our method requires that the prior probabilities of these states be specified, and therefore provides a benchmark for interpreting the posterior probabilities. We develop a fast Metropolis-Hastings Markov chain Monte Carlo algorithm for the inference. Our algorithm takes as input the edges of a candidate graph, which may be the output of another graph inference method and may contain false edges. In simulation studies, our method achieves high accuracy with small variation across different scenarios and is comparable to or better than existing Bayesian methods. We apply baycn to genomic data to distinguish the direct and indirect targets of genetic variants.
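To make the idea of sampling three edge states concrete, the following is a minimal sketch of a generic Metropolis-Hastings sampler over the edges of a candidate graph. It is not the baycn implementation: the function names (`is_acyclic`, `sample_edge_states`), the single-edge symmetric proposal, and the user-supplied `log_lik` callback are all assumptions made for illustration, and the abstract does not specify the actual proposal or likelihood. The sketch only shows how specified prior probabilities for the three states and an acyclicity constraint can enter the acceptance step, and how per-edge posterior probabilities can be read off as state frequencies.

```python
import math
import random

# Edge states (illustrative encoding): 0 = u -> v, 1 = u <- v, 2 = absent.

def is_acyclic(edges, states, n_nodes):
    """Return True if the directed graph implied by `states` has no cycle."""
    children = {i: [] for i in range(n_nodes)}
    indegree = [0] * n_nodes
    for (u, v), s in zip(edges, states):
        if s == 0:
            children[u].append(v)
            indegree[v] += 1
        elif s == 1:
            children[v].append(u)
            indegree[u] += 1
    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    queue = [i for i in range(n_nodes) if indegree[i] == 0]
    removed = 0
    while queue:
        node = queue.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == n_nodes


def sample_edge_states(edges, n_nodes, log_lik, prior,
                       n_iter=20000, burn_in=2000, seed=0):
    """Estimate per-edge posterior probabilities of the three states.

    `log_lik(states)` is a hypothetical user-supplied log-likelihood of the
    data given the graph; `prior` holds the (positive) prior probabilities of
    the three states, applied independently to every candidate edge.
    """
    rng = random.Random(seed)

    def log_target(states):
        return log_lik(states) + sum(math.log(prior[s]) for s in states)

    states = [2] * len(edges)  # the empty graph is always acyclic
    current = log_target(states)
    counts = [[0, 0, 0] for _ in edges]

    for it in range(n_iter):
        # Symmetric proposal: pick one edge, move it to one of the other states.
        k = rng.randrange(len(edges))
        proposal = list(states)
        proposal[k] = rng.choice([s for s in (0, 1, 2) if s != states[k]])
        # Proposals that create a cycle are rejected outright.
        if is_acyclic(edges, proposal, n_nodes):
            candidate = log_target(proposal)
            if rng.random() < math.exp(min(0.0, candidate - current)):
                states, current = proposal, candidate
        if it >= burn_in:
            for j, s in enumerate(states):
                counts[j][s] += 1

    kept = n_iter - burn_in
    return [[c / kept for c in row] for row in counts]
```

With a flat likelihood (`log_lik = lambda states: 0.0`) and a single candidate edge, the estimated probabilities should stay close to the specified prior, which illustrates how the prior can serve as a benchmark: a posterior that moves away from it reflects information in the data.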