Current protein language models (pLMs) predominantly focus on single-chain protein sequences and often do not account for the constraints that protein-protein interactions impose on generative design. To address this gap, we present paired Antibody T5 (pAbT5), an encoder-decoder model that generates the complementary heavy or light chain from its pairing partner. We show that our model respects conservation in framework regions and variability in hypervariable domains, as demonstrated by agreement with sequence alignments and by variable-length CDR loops. We also show that our model captures chain pairing preferences through the recovery of ground-truth chain types and gene families. Our results showcase the potential of pAbT5 for generative antibody design that incorporates biological constraints from chain pairing preferences.
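To make the pairing task concrete, the sketch below shows one way a paired heavy/light record could be framed as sequence-to-sequence examples, one per generation direction. This is an illustration under stated assumptions, not the authors' data pipeline: the function name `make_pairing_examples` is hypothetical, and the input strings are short toy fragments rather than full antibody sequences.

```python
from typing import List, Tuple


def make_pairing_examples(heavy: str, light: str) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for both generation directions.

    Hypothetical helper: assumes each record holds one heavy and one
    light chain, given as plain amino-acid strings.
    """
    heavy, light = heavy.strip().upper(), light.strip().upper()
    if not heavy or not light:
        raise ValueError("both chains must be non-empty")
    # Direction 1: encode the heavy chain, decode the light chain.
    # Direction 2: encode the light chain, decode the heavy chain.
    return [(heavy, light), (light, heavy)]


# Toy fragments for illustration only, not real paired antibody data.
examples = make_pairing_examples("EVQLVES", "DIQMTQS")
for source, target in examples:
    print(f"{source} -> {target}")
```

Each tuple would then be tokenized and passed to an encoder-decoder model as an input/label pair; how the two directions are distinguished (for example, with a prefix token) is left open here because the abstract does not specify it.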