Predicting T-cell receptor (TCR)--peptide-MHC (pMHC) binding is central to vaccine design and T-cell therapy, yet deployed models frequently encounter epitopes unseen during training, which leads to silent overconfidence and unreliable candidate prioritization. We address this by framing TCR--pMHC prediction as a \emph{selective prediction} problem: a calibrated model should either output a trustworthy confidence score or explicitly abstain. Concretely, we (1) introduce a dual-encoder architecture that encodes both the CDR3$\alpha$/CDR3$\beta$ and peptide sequences with a pre-trained protein language model; (2) apply temperature scaling to correct systematic probability miscalibration; and (3) impose a conformal abstention rule that provides finite-sample coverage guarantees at a user-specified target error rate. We evaluate under three split strategies (random, epitope-held-out, and distance-aware). Under the challenging epitope-held-out protocol, our method achieves an AUROC of 0.813 and an expected calibration error (ECE) of 0.043, a 69.7\% reduction in ECE relative to an uncalibrated baseline. At 80\% coverage, the selective model further reduces the error rate from 18.7\% to 10.9\%, demonstrating that calibrated abstention enables principled coverage--risk trade-offs aligned with practical screening budgets.
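
For orientation, steps (2) and (3) can be written in their standard textbook forms. This is a sketch under assumptions, not necessarily the exact formulation used here: the symbols $z$, $T$, $s_i$, $\hat{q}$, and $\mathcal{C}$ are illustrative, and the singleton-set abstention criterion is one common choice among several. Temperature scaling rescales the binding logit $z(x)$ by a single scalar $T > 0$ fitted on a held-out calibration set $\{(x_i, y_i)\}_{i=1}^{n}$ by minimizing the negative log-likelihood:
\[
  \hat{p}(x) = \sigma\!\left(\frac{z(x)}{T}\right),
  \qquad
  T^{\star} = \arg\min_{T > 0} \; -\sum_{i=1}^{n} \Bigl[\, y_i \log \hat{p}(x_i) + (1 - y_i) \log\bigl(1 - \hat{p}(x_i)\bigr) \Bigr].
\]
Calibration quality is then summarized by the expected calibration error over confidence bins $B_1, \dots, B_M$:
\[
  \mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \,\bigl|\, \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \,\bigr|.
\]
A split-conformal abstention rule, assuming the nonconformity score $s_i = 1 - \hat{p}_{y_i}(x_i)$ (one minus the calibrated probability assigned to the true label) and a target error rate $\alpha$, takes the threshold $\hat{q}$ to be the $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score and forms the prediction set
\[
  \mathcal{C}(x) = \bigl\{\, y \in \{0, 1\} : 1 - \hat{p}_{y}(x) \le \hat{q} \,\bigr\},
  \qquad
  \Pr\bigl( y \in \mathcal{C}(x) \bigr) \ge 1 - \alpha,
\]
where the guarantee holds when calibration and test points are exchangeable. Under this sketch the model abstains whenever $\mathcal{C}(x)$ is not a singleton.

As a consistency check on the reported figures (treating them as exact, so rounding may shift the last digit), the ECE reduction implies an uncalibrated ECE of roughly $0.043 / (1 - 0.697) \approx 0.142$, and the error rates at 80\% coverage correspond to a relative reduction of $(18.7 - 10.9) / 18.7 \approx 41.7\%$.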