Biomedical research has revealed the crucial role of miRNAs in the progression of many diseases, and computational prediction methods are increasingly proposed to guide the biological experiments that verify miRNA-disease associations (MDAs). However, generalizability and explainability remain underemphasized. It is important to extend effective predictions to entities with few or no known MDAs and to reveal how prediction scores are derived. In this study, we contribute to data, model, and result analysis. First, to better formulate the MDA prediction problem, we integrate multi-source data into a heterogeneous graph with a broader learning and prediction scope, and we split a large collection of verified MDAs into independent training, validation, and test sets to serve as a benchmark. Second, we construct an end-to-end data-driven model that sequentially performs node feature encoding, graph structure learning, and binary prediction, with a heterogeneous graph transformer as the central module. Finally, computational experiments show that our method outperforms existing state-of-the-art methods on the evaluation metrics and effectively alleviates the neglect of unknown miRNAs and diseases. Case studies further demonstrate that the model makes reliable MDA predictions for diseases without MDA records, and that the predictions can be explained both in general and case by case.
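The idea of evaluating on diseases without MDA records can be illustrated with a disease-held-out split. The following is only a minimal sketch under stated assumptions, not the benchmark construction used in the study: the function name `split_mdas_by_disease`, the fractions, and the toy identifiers (`miR-0`, `D0`) are all hypothetical, and the abstract does not specify how the actual split was made.

```python
import random


def split_mdas_by_disease(pairs, val_frac=0.1, test_frac=0.1, seed=0):
    """Split (miRNA, disease) pairs so that validation and test diseases
    never appear in training (an assumed cold-start protocol)."""
    # Sort first so the shuffle is reproducible for a given seed.
    diseases = sorted({disease for _, disease in pairs})
    rng = random.Random(seed)
    rng.shuffle(diseases)

    # Hold out at least one disease for each of the test and validation sets.
    n_test = max(1, round(len(diseases) * test_frac))
    n_val = max(1, round(len(diseases) * val_frac))
    test_diseases = set(diseases[:n_test])
    val_diseases = set(diseases[n_test:n_test + n_val])

    splits = {"train": [], "val": [], "test": []}
    for mirna, disease in pairs:
        if disease in test_diseases:
            splits["test"].append((mirna, disease))
        elif disease in val_diseases:
            splits["val"].append((mirna, disease))
        else:
            splits["train"].append((mirna, disease))
    return splits


if __name__ == "__main__":
    # Toy data: 5 hypothetical miRNAs, each associated with 10 hypothetical diseases.
    toy_pairs = [(f"miR-{i}", f"D{j}") for i in range(5) for j in range(10)]
    result = split_mdas_by_disease(toy_pairs, seed=1)
    for name, subset in result.items():
        print(name, len(subset))
```

Because every association of a held-out disease is moved out of the training set, a model scored on the test split sees those diseases only through whatever other information the graph carries about them, which is the situation the case studies describe.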