The dynamic nature of proteins is crucial for determining their biological functions and properties, and Monte Carlo (MC) and molecular dynamics (MD) simulations are the predominant tools for studying it. Using empirically derived force fields, MC and MD simulations explore the conformational space by numerically evolving the system via a Markov chain or Newtonian mechanics, respectively. However, high energy barriers in the force field make transitions between states rare events, which hampers exploration by both methods and yields an inadequately sampled ensemble unless the simulation is run exhaustively. Existing learning-based approaches perform direct sampling yet rely heavily on target-specific simulation data for training, and therefore suffer from high data acquisition cost and poor generalizability. Inspired by simulated annealing, we propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling with roto-translation equivariance. Our method leverages an amortized denoising score matching objective trained on general crystal structures and does not rely on simulation data during either training or inference. Experimental results across several benchmark protein systems demonstrate that Str2Str outperforms previous state-of-the-art generative structure prediction models and can be orders of magnitude faster than long MD simulations. Our open-source implementation is available at https://github.com/lujiarui/Str2Str
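The rare-event problem described above can be illustrated with a minimal sketch. This is not the Str2Str method and is not taken from its repository. It is a toy Metropolis Monte Carlo sampler on a one-dimensional double-well energy, assumed here purely for illustration. The names `double_well` and `metropolis`, the energy function, and all parameter values are hypothetical.

```python
import math
import random


def double_well(x, barrier=1.0):
    """Toy 1D energy (assumed for illustration): minima at x = -1 and x = +1,
    separated by a barrier of height `barrier` at x = 0."""
    return barrier * (x * x - 1.0) ** 2


def metropolis(n_steps, barrier, kT=1.0, step=0.2, x0=-1.0, seed=0):
    """Run Metropolis MC on the toy energy.

    Returns the visited positions and the number of accepted moves
    that crossed from one well to the other (a sign change in x).
    """
    rng = random.Random(seed)
    x = x0
    energy = double_well(x, barrier)
    crossings = 0
    samples = []
    for _ in range(n_steps):
        # Propose a small symmetric random displacement.
        x_new = x + rng.uniform(-step, step)
        energy_new = double_well(x_new, barrier)
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-dE / kT).
        delta = energy_new - energy
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            if (x < 0.0) != (x_new < 0.0):
                crossings += 1
            x, energy = x_new, energy_new
        samples.append(x)
    return samples, crossings


if __name__ == "__main__":
    for barrier in (1.0, 50.0):
        _, crossings = metropolis(20000, barrier)
        print(f"barrier = {barrier:5.1f} kT -> well-to-well crossings: {crossings}")
```

With a barrier comparable to kT the chain moves between the two wells repeatedly. With a barrier of tens of kT it stays in the starting well for the whole run, so the resulting ensemble misses the second state entirely unless the simulation is made far longer.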