Synthetic lethality (SL) prediction aims to determine whether the co-mutation of two genes results in cell death. The prevalent strategy frames SL prediction as an edge classification task over gene nodes in SL data and solves it with graph neural networks (GNNs). However, the message-passing mechanism of GNNs has known limitations, including over-smoothing and over-squashing. Moreover, exploiting non-SL gene relationships in large-scale multi-omics data to aid SL prediction is a non-trivial challenge. To address these issues, we propose MSGT-SL, a multi-omics sampling-based graph transformer for SL prediction. Concretely, we introduce a shallow multi-view GNN to capture local structural patterns from both SL and multi-omics data. We then feed the resulting gene features, which encode multi-view information, into standard self-attention to capture long-range dependencies. Notably, starting from the batch genes in the SL data, we run parallel random walk sampling across the multiple omics gene graphs that contain them. This sampling effectively incorporates a modest number of omics genes in a structure-aware manner before self-attention is applied. We evaluate MSGT-SL on real-world SL tasks and demonstrate the empirical benefits of both the graph transformer and the multi-omics data.
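The sampling-then-attention idea described above can be illustrated with a minimal sketch. It is not the MSGT-SL implementation: the function names (`random_walk_sample`, `sample_across_omics`, `self_attention`), the toy graphs, the walk length, and the random feature matrix standing in for the multi-view GNN output are all assumptions made for illustration. The walks over the individual omics graphs are independent and could run in parallel; here they run in a simple loop.

```python
import numpy as np

def random_walk_sample(adj_lists, start_genes, walk_length, rng):
    """Collect genes visited by one fixed-length random walk per start gene."""
    visited = set(start_genes)
    for gene in start_genes:
        current = gene
        for _ in range(walk_length):
            neighbors = adj_lists.get(current, [])
            if not neighbors:
                break  # dead end or gene absent from this graph: stop the walk
            current = neighbors[rng.integers(len(neighbors))]
            visited.add(current)
    return visited

def sample_across_omics(omics_graphs, batch_genes, walk_length=2, seed=0):
    """Union of the batch genes and the genes reached in every omics graph."""
    rng = np.random.default_rng(seed)
    sampled = set(batch_genes)
    for adj_lists in omics_graphs.values():  # independent per graph
        sampled |= random_walk_sample(adj_lists, batch_genes, walk_length, rng)
    return sorted(sampled)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

# Hypothetical toy data: two omics graphs over 7 genes, given as adjacency lists.
omics_graphs = {
    "expression": {0: [2, 3], 1: [3], 2: [0], 3: [0, 1, 4], 4: [3]},
    "ppi": {0: [5], 1: [5, 6], 5: [0, 1], 6: [1]},
}
batch_genes = [0, 1]  # genes appearing in the current SL batch
tokens = sample_across_omics(omics_graphs, batch_genes)

rng = np.random.default_rng(1)
features = rng.normal(size=(7, 8))  # placeholder for multi-view GNN features
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(features[tokens], w_q, w_k, w_v)
```

Because attention runs only over the batch genes plus the genes reached by short walks, the token set stays small while still including genes that are structurally close in the omics graphs.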