Cross-Gate MLP with Protein Complex Invariant Embedding is A One-Shot Antibody Designer

Antibodies are crucial proteins produced by the immune system in response to foreign substances or antigens. The specificity of an antibody is determined by its complementarity-determining regions (CDRs), which are located in the variable domains of the antibody chains and form the antigen-binding site. Previous studies have utilized complex techniques to generate CDRs, but they suffer from inadequate geometric modeling. Moreover, the common iterative refinement strategies lead to an inefficient inference. In this paper, we propose a \textit{simple yet effective} model that can co-design 1D sequences and 3D structures of CDRs in a one-shot manner. To achieve this, we decouple the antibody CDR design problem into two stages: (i) geometric modeling of protein complex structures and (ii) sequence-structure co-learning. We develop a novel macromolecular structure invariant embedding, typically for protein complexes, that captures both intra- and inter-component interactions among the backbone atoms, including C$\alpha$, N, C, and O atoms, to achieve comprehensive geometric modeling. Then, we introduce a simple cross-gate MLP for sequence-structure co-learning, allowing sequence and structure representations to implicitly refine each other. This enables our model to design desired sequences and structures in a one-shot manner. Extensive experiments are conducted to evaluate our results at both the sequence and structure levels, which demonstrate that our model achieves superior performance compared to the state-of-the-art antibody CDR design methods.

翻译：抗体是免疫系统针对外来物质或抗原产生的关键蛋白质。抗体的特异性由其互补决定区（CDR）决定，这些区域位于抗体链的可变结构域中，构成抗原结合位点。以往研究利用复杂技术生成CDR，但存在几何建模不充分的问题。此外，常见的迭代优化策略导致推理效率低下。本文提出一种**简单而高效**的模型，能够一次性协同设计CDR的一维序列与三维结构。为实现这一目标，我们将抗体CDR设计问题分解为两个阶段：（i）蛋白质复合物结构的几何建模和（ii）序列-结构协同学习。我们开发了一种新型大分子结构不变嵌入方法，特别针对蛋白质复合物，通过捕获主链原子（包括Cα、N、C和O）的组件内与组件间相互作用，实现全面的几何建模。随后，我们引入一种简单的跨门控MLP进行序列-结构协同学习，使序列和结构表示能够隐式地相互优化。这使我们的模型能够一次性设计所需的序列与结构。我们在序列和结构层面进行了大量实验评估，结果表明，与最先进的抗体CDR设计方法相比，我们的模型实现了更优的性能。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

生成性对抗网络:理论模型、评估指标和最近发展的概述，Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments

专知会员服务

42+阅读 · 2020年5月30日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日