Identifying and characterizing B-cell epitopes, the portions of antigens recognized by antibodies, is important for understanding the immune system and for many applications, including vaccine development, therapeutics, and diagnostics. Computational epitope prediction is challenging yet rewarding, as it significantly reduces the time and cost of laboratory work. Most existing tools perform unsatisfactorily and only discriminate epitopes from non-epitopes. This paper presents a new deep learning-based multi-task framework for linear B-cell epitope prediction as well as antibody type-specific epitope classification. Specifically, we develop a sequence-based neural network model that uses recurrent layers and Transformer blocks. We propose an amino acid encoding method based on eigendecomposition to help the model learn representations of epitopes. To cope with class imbalance, we modify the standard cross-entropy loss functions by extending a logit adjustment technique. Experimental results on data curated from the largest public epitope database demonstrate the validity of the proposed methods and their superior performance compared to competing ones.
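To make the class-imbalance idea concrete, the following is a minimal sketch of the standard (unextended) logit adjustment applied to cross-entropy: each logit is shifted by the scaled log of its class prior before the softmax, so a rare class such as epitopes is not drowned out by the majority class. This is not the paper's extended loss, whose details are not given here. The function names `class_priors` and `logit_adjusted_cross_entropy` and the scaling parameter `tau` are assumptions made for illustration.

```python
import math


def class_priors(labels, num_classes):
    """Estimate class priors as empirical label frequencies (hypothetical helper)."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    return [c / total for c in counts]


def logit_adjusted_cross_entropy(logits, label, priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior).

    Assumes every prior is strictly positive. With tau = 0 this reduces
    to the ordinary softmax cross-entropy.
    """
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, priors)]
    # Numerically stable log-sum-exp over the adjusted logits.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]


# Illustrative imbalanced labels: 0 = non-epitope, 1 = epitope (made-up data).
priors = class_priors([0, 0, 0, 1], num_classes=2)
loss_rare = logit_adjusted_cross_entropy([0.0, 0.0], 1, priors)
loss_common = logit_adjusted_cross_entropy([0.0, 0.0], 0, priors)
```

With identical raw logits, the rare class incurs the larger loss (`loss_rare > loss_common`), which pushes the model to assign the rare class a larger margin during training.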