Generative Flow Networks (GFlowNets) are probabilistic models built on Markov flows that use amortized training algorithms to learn stochastic policies for generating compositional objects such as biomolecules and chemical materials. GFlowNets have proven effective at generating high-performing biochemical molecules and can accelerate scientific discovery, avoiding the time-consuming, labor-intensive, and costly procedures of conventional material discovery. However, previous approaches often struggle to accumulate exploratory experience and tend to lose direction in large sampling spaces. Attempts to address this issue, such as LS-GFN, are limited to local greedy search and lack broader global adjustment. This paper introduces a novel GFlowNet variant, the Dynamic Backtracking GFN (DB-GFN), which makes decision steps more adaptable through a reward-based dynamic backtracking mechanism. DB-GFN permits backtracking during the construction process according to the reward of the current state, so that disadvantageous decisions can be corrected and alternative pathways explored. Applied to generation tasks for biochemical molecules and genetic sequences, DB-GFN outperforms existing GFlowNet models and traditional reinforcement learning methods in sample quality, number of samples explored, and training convergence speed. Furthermore, DB-GFN is orthogonal to other GFlowNet improvements, which suggests it can be combined with other strategies to achieve more efficient search.
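To make the idea of reward-based backtracking concrete, the following is a minimal toy sketch, not the DB-GFN algorithm itself. The abstract does not state how the backtracking depth is chosen or how candidates are accepted, so everything here is an assumption made for illustration: the toy `reward` (fraction of `A` symbols), the uniform random `complete` function standing in for a learned forward policy, the linear rule in `backtrack_steps`, and the greedy acceptance test in `refine`.

```python
import math
import random

ALPHABET = "ACGT"
LENGTH = 8  # fixed sequence length for this toy task


def reward(seq):
    # Hypothetical toy reward: fraction of 'A' symbols in the sequence.
    return seq.count("A") / len(seq)


def complete(prefix, rng):
    # Stand-in for a learned forward policy: extend the prefix
    # with uniformly random symbols until it reaches LENGTH.
    seq = list(prefix)
    while len(seq) < LENGTH:
        seq.append(rng.choice(ALPHABET))
    return "".join(seq)


def backtrack_steps(r, r_max=1.0):
    # Assumed rule (not taken from the paper): the lower the reward,
    # the more construction steps are undone. Clamped to [1, LENGTH].
    frac = 1.0 - r / r_max
    return max(1, min(LENGTH, math.ceil(frac * LENGTH)))


def refine(seq, rng, rounds=20):
    # Repeatedly undo the last k steps, rebuild the sequence, and keep
    # the candidate only if its reward is at least as high (assumed
    # greedy acceptance).
    for _ in range(rounds):
        r = reward(seq)
        k = backtrack_steps(r)
        candidate = complete(seq[:LENGTH - k], rng)
        if reward(candidate) >= r:
            seq = candidate
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    start = complete("", rng)
    final = refine(start, rng)
    print(start, reward(start))
    print(final, reward(final))
```

In this sketch a low-reward sample is rebuilt almost from scratch, while a high-reward sample only has its last step revised, which is one simple way to contrast reward-dependent backtracking with a fixed-depth local search.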