T cell receptor (TCR) recognition of peptide-MHC (pMHC) complexes is a central component of adaptive immunity, with implications for vaccine design, cancer immunotherapy, and autoimmune disease. While recent advances in machine learning have improved prediction of TCR-pMHC binding, the most effective approaches are black-box transformer models that cannot provide a rationale for their predictions. Post-hoc explanation methods can provide insight into which parts of the input drive a prediction, but they do not explicitly model the biochemical mechanisms that govern TCR-pMHC binding (e.g., known binding regions). ``Explain-by-design'' models (i.e., models with architectural components that can be examined directly after training) have been explored in other domains but have not been applied to TCR-pMHC binding. We propose explainable model layers (TCR-EML) that can be incorporated into protein-language model backbones for TCR-pMHC modeling. Our approach uses prototype layers for amino acid residue contacts drawn from known TCR-pMHC binding mechanisms, enabling high-quality explanations for predicted TCR-pMHC binding. Experiments on large-scale datasets demonstrate competitive predictive accuracy and generalization, and evaluation on the TCR-XAI benchmark demonstrates improved explainability compared with existing approaches.