Generating novel molecules with higher property values than those in the training data, namely out-of-distribution (OOD) generation, is important for de novo drug design. However, this challenge is hard for distribution-learning-based models, such as diffusion models, because these methods are designed to fit the distribution of the training data as closely as possible. In this paper, we show that Bayesian flow networks (BFNs), especially the ChemBFN model, are intrinsically capable of generating high-quality OOD samples across several scenarios. We add a reinforcement learning strategy to ChemBFN and employ a controllable, ordinary differential equation (ODE) solver-like generation process that accelerates sampling. Most importantly, we introduce a semi-autoregressive strategy during training and inference that enhances model performance, allowing it to surpass state-of-the-art models. A theoretical analysis of OOD generation in ChemBFN with the semi-autoregressive approach is also included.