The advent of high-throughput sequencing technologies has led to vast amounts of comparative genomic sequence data. Constructing gene-gene interaction networks, or dependence graphs, on the genome scale is vital for understanding the regulation of biological processes. Different dependence graphs can provide different information. Some existing methods build dependence graphs from high-order partial correlations. These graphs are sparse and uninformative when latent variables explain much of the dependence within groups of genes. Other methods, based on correlations and first-order partial correlations, can produce dense graphs. For settings where genes can be divided into groups with stronger within-group than between-group dependence in gene expression, we present a dependence graph based on truncated vines with latent variables that makes use of group information and low-order partial correlations. The resulting graphs are not dense, and genes that might be more central have more neighbors in the vine dependency graph. We demonstrate our dependence graph construction on two RNA-seq data sets, one from yeast and one from prostate cancer. There is some biological evidence to support the relationships between genes in the resulting dependence graphs. The approach provides a flexible framework for building dependence graphs via low-order partial correlations and the formation of groups, leading to graphs that are neither too sparse nor too dense. We anticipate that this approach will help to identify groups that might be central to different biological functions.
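To illustrate why conditioning on a group latent variable can thin out a dense correlation graph, the sketch below compares raw correlations with partial correlations given a latent factor. It is a minimal example under assumptions, not the authors' truncated-vine construction. It assumes a Gaussian 1-factor model within a single gene group, with known loadings `lam[j] = corr(X_j, V)`. The helper names (`partial_corr_given_one`, `partial_corr_given_latent`, `edges_above`) and the absolute-value threshold rule for drawing edges are hypothetical choices made only for illustration.

```python
import math
from itertools import combinations


def partial_corr_given_one(r_jk, r_jl, r_kl):
    """First-order partial correlation of variables j and k given one variable l."""
    return (r_jk - r_jl * r_kl) / math.sqrt((1 - r_jl ** 2) * (1 - r_kl ** 2))


def partial_corr_given_latent(r_jk, lam_j, lam_k):
    """Partial correlation of genes j and k given a latent group factor V.

    Assumes a Gaussian model where lam_j = corr(X_j, V), so the usual
    first-order formula applies with V as the conditioning variable.
    """
    return partial_corr_given_one(r_jk, lam_j, lam_k)


def edges_above(corr, labels, lam, threshold):
    """Hypothetical edge rule: link genes whose partial correlation given the
    latent factor exceeds `threshold` in absolute value."""
    edges = []
    for j, k in combinations(range(len(labels)), 2):
        pc = partial_corr_given_latent(corr[j][k], lam[j], lam[k])
        if abs(pc) > threshold:
            edges.append((labels[j], labels[k], round(pc, 3)))
    return edges


if __name__ == "__main__":
    # Made-up correlations for three genes in one group: A-C and B-C are fully
    # explained by the factor (r = lam_j * lam_k); A-B has extra dependence.
    corr = [[1.0, 0.80, 0.63],
            [0.80, 1.0, 0.56],
            [0.63, 0.56, 1.0]]
    lam = [0.9, 0.8, 0.7]
    labels = ["A", "B", "C"]

    # A raw-correlation graph at the same threshold links every pair.
    raw = [(labels[j], labels[k]) for j, k in combinations(range(3), 2)
           if abs(corr[j][k]) > 0.2]
    print(len(raw))                                   # 3
    print(edges_above(corr, labels, lam, 0.2))        # [('A', 'B', 0.306)]
```

All three raw correlations exceed the threshold, so the correlation graph is complete. After conditioning on the latent factor, only the A-B pair, whose dependence the factor does not fully explain, keeps an edge. In practice the loadings would have to be estimated, and the paper's method additionally uses truncated vines across groups, which this toy example does not attempt.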