Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. Graph diffusion models have been widely adopted because 2D molecular graphs are discrete, yet compared with 1D modeling, existing models suffer from low chemical validity and struggle to meet desired properties. In this work, we introduce MolHIT, a molecular graph generation framework that overcomes these long-standing performance limitations. MolHIT combines two components: the Hierarchical Discrete Diffusion Model, which generalizes discrete diffusion to additional categories that encode chemical priors, and decoupled atom encoding, which splits atom types according to their chemical roles. MolHIT achieves new state-of-the-art performance on the MOSES dataset, reaching near-perfect validity for the first time in graph diffusion and surpassing strong 1D baselines across multiple metrics. We further demonstrate strong performance in downstream tasks, including multi-property guided generation and scaffold extension.