Molecular de novo design is a critical yet challenging task in scientific fields, aiming to design novel molecular structures with desired property profiles. Significant progress has been made with generative models for graphs. However, limited attention has been paid to hierarchical generative models, which can exploit the inherent hierarchical structure of molecular graphs (which carries rich semantic information) and generate larger, more complex molecules that, as we demonstrate, are difficult for most existing models. The primary challenge in hierarchical generation is the non-differentiability caused by generating intermediate discrete coarsened graph structures. To sidestep this issue, we cast hierarchical generation over discrete spaces as the reverse process of hierarchical representation learning and propose MolHF, a new hierarchical flow-based model that generates molecular graphs in a coarse-to-fine manner. Specifically, MolHF first generates bonds through a multi-scale architecture, then generates atoms based on the coarsened graph structure at each scale. We demonstrate that MolHF achieves state-of-the-art performance in random generation and property optimization, indicating a high capacity to model the data distribution. Furthermore, MolHF is the first flow-based model that can be applied to larger molecules (polymers) with more than 100 heavy atoms. The code and models are available at https://github.com/violet-sto/MolHF.
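
The idea of treating generation as the reverse of hierarchical representation learning can be illustrated with a toy multi-scale flow. The sketch below is not MolHF's architecture: it works on a plain list of floats rather than a molecular graph, and the conditioner inside the coupling layer, the function names (`coupling_forward`, `coupling_inverse`, `encode`, `decode`), and the `scale` and `shift` parameters are all hypothetical choices made for illustration. It only shows that an invertible encoder which factors out fine detail at each scale can be run backwards to rebuild the input from coarse to fine.

```python
import math


def coupling_forward(x, scale=0.5, shift=1.0):
    """Affine coupling: the first half of x conditions an invertible map of the second half."""
    h = len(x) // 2
    a, b = x[:h], x[h:]
    # Toy conditioner (an assumption, not from the paper): depends only on `a`.
    s = [math.tanh(scale * v) for v in a]
    t = [shift * v for v in a]
    y_b = [bv * math.exp(sv) + tv for bv, sv, tv in zip(b, s, t)]
    log_det = sum(s)  # log |det Jacobian| of the transform
    return a + y_b, log_det


def coupling_inverse(y, scale=0.5, shift=1.0):
    """Exact inverse of coupling_forward; `a` is unchanged, so s and t can be recomputed."""
    h = len(y) // 2
    a, y_b = y[:h], y[h:]
    s = [math.tanh(scale * v) for v in a]
    t = [shift * v for v in a]
    b = [(yv - tv) * math.exp(-sv) for yv, sv, tv in zip(y_b, s, t)]
    return a + b


def encode(x, levels):
    """Fine-to-coarse: transform, then factor out half of the variables at each scale."""
    latents, total_log_det = [], 0.0
    h = list(x)
    for _ in range(levels):
        h, log_det = coupling_forward(h)
        total_log_det += log_det
        mid = len(h) // 2
        latents.append(h[mid:])  # fine detail factored out at this scale
        h = h[:mid]              # coarser representation passed to the next scale
    latents.append(h)            # coarsest representation
    return latents, total_log_det


def decode(latents):
    """Coarse-to-fine: start from the coarsest latent and invert one scale at a time."""
    h = list(latents[-1])
    for z in reversed(latents[:-1]):
        h = coupling_inverse(h + z)
    return h


if __name__ == "__main__":
    x = [0.3, -1.2, 0.7, 2.0, -0.5, 1.1, 0.0, 0.9]  # length must be divisible by 2**levels
    latents, log_det = encode(x, levels=2)
    x_rec = decode(latents)
    print([len(z) for z in latents])
    print(max(abs(u - v) for u, v in zip(x, x_rec)))
```

Because every step is invertible, no gradient has to pass through a discrete sampling decision during training in this toy setting; sampling the latents and calling `decode` plays the role of coarse-to-fine generation.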