Immunogenicity prediction is central to reverse vaccinology, where it is used to identify vaccine candidates that can trigger protective immune responses. Existing approaches typically rely on highly compressed features and simple model architectures, which limits their prediction accuracy and generalizability. To address these challenges, we introduce ProVaccine, a novel deep learning model whose dual attention mechanism integrates pre-trained latent vector representations of protein sequences and structures. We also compile the most comprehensive immunogenicity dataset to date, comprising over 9,500 antigens from bacteria, viruses, and tumors, each with its sequence, structure, and immunogenicity label. Extensive experiments demonstrate that ProVaccine outperforms existing methods across a wide range of evaluation metrics. Furthermore, we establish a post-hoc validation protocol to assess the practical significance of deep learning models in tackling vaccine design challenges. Our work provides an effective tool for vaccine design and sets valuable benchmarks for future research.
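The abstract does not spell out how the dual attention mechanism combines the two modalities. As a rough illustration only, the sketch below shows one plausible reading: per-residue sequence embeddings attend over structure embeddings and vice versa, the two attended views are mean-pooled and concatenated, and a linear head maps the result to an immunogenicity probability. The function names (`cross_attend`, `fuse_and_score`), the embedding dimension, the random toy inputs, and the absence of learned query/key/value projections are all assumptions made for this example, not details of ProVaccine.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention without learned projections.

    Each row of `queries` attends over the rows of `keys_values`, which
    serve as both keys and values (a simplification for this sketch).
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return attended


def mean_pool(rows):
    """Average per-residue vectors into one fixed-size vector."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fuse_and_score(seq_emb, struct_emb, head_weights, head_bias):
    """Hypothetical dual-attention fusion followed by a logistic head."""
    seq_ctx = cross_attend(seq_emb, struct_emb)     # sequence queries structure
    struct_ctx = cross_attend(struct_emb, seq_emb)  # structure queries sequence
    fused = mean_pool(seq_ctx) + mean_pool(struct_ctx)  # list concatenation
    logit = sum(w * x for w, x in zip(head_weights, fused)) + head_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs standing in for pre-trained embeddings (assumed shapes:
# 6 residues, 4 dimensions per modality).
rng = random.Random(0)
dim, length = 4, 6
seq_emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(length)]
struct_emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(length)]
head_weights = [rng.uniform(-0.5, 0.5) for _ in range(2 * dim)]

score = fuse_and_score(seq_emb, struct_emb, head_weights, head_bias=0.0)
print(f"toy immunogenicity score: {score:.3f}")
```

In a trained model the embeddings would come from pre-trained protein sequence and structure encoders, and the attention and head parameters would be learned from the labeled antigens; the random values here only exercise the data flow.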