Computationally generating novel, synthetically accessible compounds with high affinity and low toxicity is a great challenge in drug design. Machine-learning models that go beyond conventional pharmacophoric methods have shown promise in generating novel small-molecule compounds, but they require significant tuning for a specific protein target. Here, we introduce a method called selective iterative latent variable refinement (SILVR) for conditioning an existing diffusion-based equivariant generative model without retraining. The conditioned model generates new molecules that fit into the binding site of a protein based on fragment hits. We use the SARS-CoV-2 main protease fragments from Diamond X-Chem, which form part of the COVID Moonshot project, as a reference dataset for conditioning the molecule generation. The SILVR rate controls the extent of conditioning, and we show that moderate SILVR rates make it possible to generate new molecules of similar shape to the original fragments, meaning that the new molecules fit the binding site without knowledge of the protein. We can also merge up to 3 fragments into a new molecule without affecting the quality of the molecules generated by the underlying generative model. Our method is generalizable to any protein target with known fragments and to any diffusion-based model for molecule generation.
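To make the role of the SILVR rate concrete, the sketch below shows one way a selective, iterative latent refinement step could look. It assumes an ILVR-style update: at each reverse diffusion step, the reference fragment latent is noised to the current noise level, and only the masked (fragment-linked) entries of the model's proposal are pulled toward it by a fraction equal to the rate. The function names `noise_reference` and `silvr_step`, the flat-list representation of the latent, and the exact form of the update are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List

Vector = List[float]


def noise_reference(z_ref: Vector, alpha_bar: float, rng: random.Random) -> Vector:
    """Diffuse the reference latent to the current noise level.

    Assumes a DDPM-style forward process: sqrt(alpha_bar) * z_ref + sqrt(1 - alpha_bar) * eps.
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * rng.gauss(0.0, 1.0) for x in z_ref]


def silvr_step(z_prime: Vector, z_ref_t: Vector, mask: List[int], rate: float) -> Vector:
    """Hypothetical selective refinement of one reverse-step proposal.

    Only entries with mask == 1 are pulled toward the noised reference.
    rate = 0 leaves the model's proposal untouched; rate = 1 replaces the
    masked entries with the noised reference.
    """
    return [zp - rate * m * (zp - zr) for zp, zr, m in zip(z_prime, z_ref_t, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    z_ref = [1.0, -0.5, 0.25, 0.0]  # toy reference fragment latent
    mask = [1, 1, 1, 0]             # last entry: a "free" atom not tied to any fragment
    z_prime = [0.3, 0.3, 0.3, 0.3]  # toy proposal from one reverse diffusion step
    z_ref_t = noise_reference(z_ref, alpha_bar=0.9, rng=rng)
    print(silvr_step(z_prime, z_ref_t, mask, rate=0.01))
```

Because the pull is applied at every reverse step, even a small rate accumulates over the trajectory, which is one way to read the abstract's observation that moderate rates already reproduce the fragment shapes while unmasked atoms remain free to form new structure.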