Generative Flow Networks (GFlowNets) are probabilistic models based on Markov flows that use amortization algorithms to learn stochastic policies for generating compositional objects such as biomolecules and chemical materials. Because they can generate high-performing biochemical molecules, GFlowNets accelerate scientific discovery and avoid much of the time, labor, and cost inherent in conventional material discovery. However, previous work often struggles to accumulate exploratory experience and tends to lose direction in large sampling spaces. Attempts to address this issue, such as LS-GFN, are limited to local greedy search and lack broader global adjustment. This paper introduces a new GFlowNet variant, the Dynamic Backtracking GFN (DB-GFN), which makes decision steps more adaptable through a reward-based dynamic backtracking mechanism. DB-GFN permits backtracking during the network construction process according to the reward value of the current state, so that it can correct disadvantageous decisions and explore alternative pathways. Applied to generation tasks for biochemical molecules and genetic material sequences, DB-GFN outperforms existing GFlowNet models and traditional reinforcement learning methods in sample quality, quantity of explored samples, and training convergence speed. Furthermore, DB-GFN is orthogonal to other improvements to GFlowNets, which suggests that it can be integrated with other strategies to achieve more efficient search.
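To make the idea of reward-based backtracking concrete, the following is a minimal toy sketch and not the DB-GFN algorithm itself. The abstract does not specify how the reward determines the backtracking depth, so everything here is an assumption made for illustration: the reward function `toy_reward`, the rule in `backtrack_steps` (lower reward undoes more steps), the uniform random `extend` that stands in for a learned forward policy, and the greedy keep-if-not-worse acceptance in `refine`.

```python
import random

ALPHABET = "ACGT"   # hypothetical token set for a toy sequence task
SEQ_LEN = 8         # hypothetical fixed sequence length


def toy_reward(seq):
    """Hypothetical reward: fraction of positions equal to 'A'."""
    return seq.count("A") / len(seq)


def backtrack_steps(reward, max_steps):
    """Assumed rule: the lower the reward, the more steps are undone.

    Always undoes at least one step so that refinement can make progress.
    """
    return max(1, round((1.0 - reward) * max_steps))


def extend(prefix, rng):
    """Stand-in for a forward policy: append uniform random tokens."""
    seq = list(prefix)
    while len(seq) < SEQ_LEN:
        seq.append(rng.choice(ALPHABET))
    return "".join(seq)


def refine(seq, rng, rounds=20):
    """Backtrack from a complete sequence and regenerate its tail.

    The acceptance rule (keep the candidate if it is not worse) is an
    assumption chosen for simplicity.
    """
    best = seq
    for _ in range(rounds):
        k = backtrack_steps(toy_reward(best), SEQ_LEN)
        candidate = extend(best[:SEQ_LEN - k], rng)
        if toy_reward(candidate) >= toy_reward(best):
            best = candidate
    return best


rng = random.Random(0)
start = extend("", rng)
final = refine(start, rng)
print(start, toy_reward(start))
print(final, toy_reward(final))
```

In this sketch a low-reward sample is rebuilt almost from scratch, while a high-reward sample only has its last step or two resampled. That is one simple way to read "backtracking according to the current state's reward value"; the actual method may differ in how depth, policy, and acceptance are defined.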