With the emergence of diffusion models at the frontier of generative modeling, many researchers have proposed molecule generation techniques based on conditional diffusion models. However, the inherent discreteness of molecular data makes it difficult for a diffusion model to connect raw data with highly complex conditions such as natural language. To address this, we present LDMol, a novel latent diffusion model for text-conditioned molecule generation. LDMol comprises a molecule autoencoder that produces a learnable and structurally informative feature space, and a natural-language-conditioned latent diffusion model. In particular, recognizing that multiple SMILES notations can represent the same molecule, we employ a contrastive learning strategy to extract a feature space that is aware of the unique characteristics of the molecular structure. LDMol outperforms existing baselines on the text-to-molecule generation benchmark, suggesting that diffusion models can outperform autoregressive models in text data generation given a better choice of latent domain. Furthermore, we show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing, demonstrating its versatility as a diffusion model.
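The contrastive idea above — treating different SMILES notations of the same molecule as positive pairs so the encoder becomes invariant to notation — can be sketched with a standard InfoNCE objective. The following is a minimal illustrative sketch, not LDMol's actual training code: the toy bigram-hashing `encode` function stands in for the learned molecule encoder, and the example SMILES pairs are assumed alternative notations of the same molecules.

```python
import numpy as np

def encode(smiles, dim=64, seed=0):
    """Toy stand-in encoder: hashes character bigrams of a SMILES string
    into a fixed random projection and L2-normalizes the result.
    (LDMol's real encoder is a learned network; this is illustrative only.)"""
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(256, dim))   # fixed projection per bigram bucket
    v = np.zeros(dim)
    for a, b in zip(smiles, smiles[1:]):
        v += table[(ord(a) * 31 + ord(b)) % 256]
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def info_nce(z1, z2, tau=0.1):
    """InfoNCE contrastive loss: row i of z1 should match row i of z2
    (two SMILES notations of the same molecule); all other rows in the
    batch act as negatives."""
    sims = z1 @ z2.T / tau                         # pairwise similarities
    sims -= sims.max(axis=1, keepdims=True)        # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Each column pairs two notations assumed to denote the same molecule
# (e.g. ethanol written as "CCO" or "OCC").
views_a = ["CCO", "c1ccccc1O"]
views_b = ["OCC", "Oc1ccccc1"]
z_a = np.stack([encode(s) for s in views_a])
z_b = np.stack([encode(s) for s in views_b])
loss = info_nce(z_a, z_b)
```

Minimizing this loss pulls embeddings of equivalent notations together while pushing apart embeddings of different molecules, which is one common way to obtain the notation-invariant, structure-aware feature space the abstract describes. In practice, randomized SMILES for the positive pairs can be generated with a cheminformatics toolkit such as RDKit.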