Heterogeneous molecular entities and their interactions, commonly represented as networks, are crucial for advancing our systems-level understanding of biology. With recent advances in high-throughput data generation and significant improvements in computational power, graph neural networks (GNNs) have proven effective at predicting biomedical interactions. Because GNNs follow a neighborhood aggregation scheme, the number of graph convolution (GC) layers (i.e., the depth) determines the neighborhood orders from which they can aggregate information, and it therefore has a significant impact on model performance. However, choosing an appropriate GNN depth for a given biomedical network often relies on heuristics or extensive experimentation, which can be unreliable or computationally expensive. Moreover, GNNs with more GC layers tend to be poorly calibrated, producing highly confident incorrect predictions. To address these challenges, we propose a Bayesian model selection framework that jointly infers the most plausible number of GC layers supported by the data, applies dropout regularization, and learns the network parameters. Experiments on four biomedical interaction datasets demonstrate that our method outperforms competing methods and provides well-calibrated predictions by allowing GNNs to adapt their depth to the interaction information in different biomedical networks. Source code and data are available at: https://github.com/kckishan/BBGCN-LP/tree/master
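The link between depth and neighborhood order can be made concrete with a toy example. The sketch below is an illustration only, not the method proposed here. It assumes a weight-free GC layer in which each node averages its own feature with its neighbors' features. The names `gc_layer`, `gnn`, and `receptive_field` are hypothetical. With `depth` stacked layers, a node can only be influenced by nodes within `depth` hops, so a signal placed three hops away is invisible to a two-layer model.

```python
from typing import Dict, List, Set

# Adjacency list: node -> list of neighbouring nodes (undirected graph).
Graph = Dict[int, List[int]]


def gc_layer(graph: Graph, h: Dict[int, float]) -> Dict[int, float]:
    """One weight-free graph convolution (assumed form): each node averages
    its own feature with the features of its immediate neighbours."""
    return {
        v: (h[v] + sum(h[u] for u in graph[v])) / (1 + len(graph[v]))
        for v in graph
    }


def gnn(graph: Graph, x: Dict[int, float], depth: int) -> Dict[int, float]:
    """Stack `depth` GC layers; depth bounds how far information can travel."""
    h = dict(x)
    for _ in range(depth):
        h = gc_layer(graph, h)
    return h


def receptive_field(graph: Graph, node: int, depth: int) -> Set[int]:
    """Nodes whose input features can reach `node` after `depth` layers,
    i.e. its neighbourhood up to order `depth`."""
    field = {node}
    for _ in range(depth):
        field |= {u for v in field for u in graph[v]}
    return field


# Usage: a path graph 0-1-2-3-4 with a signal placed on node 3.
path: Graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
x = {v: 0.0 for v in path}
x[3] = 1.0  # three hops away from node 0

print(sorted(receptive_field(path, 0, 2)))  # [0, 1, 2]
print(gnn(path, x, depth=2)[0])             # 0.0: node 3 is out of reach
print(gnn(path, x, depth=3)[0] > 0)         # True: a third layer reaches it
```

Too few layers cannot see informative distant neighbors. Each additional layer also widens the neighborhood every node mixes over. That trade-off is why depth matters.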