Generative deep learning has become pivotal in molecular design for drug discovery, materials science, and chemical engineering. A widely used paradigm is to pretrain neural networks on string representations of molecules and fine-tune them using reinforcement learning on specific objectives. However, string-based models face challenges in ensuring chemical validity and enforcing structural constraints like the presence of specific substructures. We propose to instead combine graph-based molecular representations, which can naturally ensure chemical validity, with transformer architectures, which are highly expressive and capable of modeling long-range dependencies between atoms. Our approach iteratively modifies a molecular graph by adding atoms and bonds, which ensures chemical validity and facilitates the incorporation of structural constraints. We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned using a new training algorithm that combines elements of the deep cross-entropy method and self-improvement learning. We evaluate GraphXForm on various drug design tasks, demonstrating superior objective scores compared to state-of-the-art molecular design approaches. Furthermore, we apply GraphXForm to two solvent design tasks for liquid-liquid extraction, again outperforming alternative methods while flexibly enforcing structural constraints or initiating design from existing molecular structures.
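The idea of building a molecule by iteratively adding atoms and bonds, with validity guaranteed at every step, can be illustrated with a minimal sketch. This is not GraphXForm's implementation: the `MolGraph` class, the `MAX_VALENCE` table, and the method names are hypothetical, hydrogens are treated as implicit, and "chemical validity" is reduced here to a simple valence check. Enforcing a structural constraint or starting from an existing molecule would correspond to beginning with a non-empty graph rather than an empty one.

```python
from dataclasses import dataclass, field

# Typical valences for a few elements (assumption, for illustration only).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


@dataclass
class MolGraph:
    """Hypothetical molecular graph with implicit hydrogens."""
    atoms: list = field(default_factory=list)  # element symbols, indexed by position
    bonds: dict = field(default_factory=dict)  # (i, j) with i < j -> bond order

    def free_valence(self, i):
        """Remaining bonding capacity of atom i."""
        used = sum(order for pair, order in self.bonds.items() if i in pair)
        return MAX_VALENCE[self.atoms[i]] - used

    def can_bond(self, i, j, order=1):
        """True if a new bond i-j of the given order keeps both valences valid."""
        if i == j or (min(i, j), max(i, j)) in self.bonds:
            return False
        return self.free_valence(i) >= order and self.free_valence(j) >= order

    def add_atom(self, element, attach_to=None, order=1):
        """Add an atom bonded to an existing one; reject invalid actions."""
        if element not in MAX_VALENCE:
            raise ValueError(f"unsupported element: {element}")
        if attach_to is None:
            # Only the very first atom may be placed without a bond.
            if self.atoms:
                raise ValueError("new atoms must attach to the existing graph")
            self.atoms.append(element)
            return 0
        if self.free_valence(attach_to) < order or MAX_VALENCE[element] < order:
            raise ValueError("action would violate a valence constraint")
        self.atoms.append(element)
        new_index = len(self.atoms) - 1
        self.bonds[(attach_to, new_index)] = order
        return new_index

    def add_bond(self, i, j, order=1):
        """Add a bond between two existing atoms (for example, a ring closure)."""
        if not self.can_bond(i, j, order):
            raise ValueError("action would violate a valence constraint")
        self.bonds[(min(i, j), max(i, j))] = order
```

In this sketch, a generative policy would only ever choose among actions that pass these checks, so every intermediate graph respects the valence table by construction. A string-based generator, by contrast, can emit sequences that do not parse into a valid molecule.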