The function of biomolecules such as proteins depends on their ability to interconvert between a wide range of structures or "conformations." Researchers have endeavored for decades to develop computational methods to predict the distribution of conformations, which is far harder to determine experimentally than a static folded structure. We present ConforMix, an inference-time algorithm that enhances sampling of conformational distributions using a combination of classifier guidance, filtering, and free energy estimation. Our approach upgrades diffusion models (whether trained for static structure prediction or conformational generation) to enable more efficient discovery of conformational variability without requiring prior knowledge of major degrees of freedom. ConforMix is orthogonal to improvements in model pretraining and would benefit even a hypothetical model that perfectly reproduced the Boltzmann distribution. Remarkably, when applied to a diffusion model trained for static structure prediction, ConforMix captures structural changes including domain motion, cryptic pocket flexibility, and transporter cycling, while avoiding unphysical states. Case studies of biologically critical proteins demonstrate the scalability, accuracy, and utility of this method.