Dynamic Backtracking in GFlowNet: Enhancing Decision Steps with Reward-Dependent Adjustment Mechanisms

Generative Flow Networks (GFlowNets) are probabilistic models built on Markovian flows. They use amortized training algorithms to learn stochastic policies that generate compositional objects such as biomolecules and chemical materials. GFlowNets have shown strong performance in generating high-performance biochemical molecules, which accelerates scientific discovery and avoids the time, labor, and cost inherent in conventional material discovery. However, previous work often struggles to accumulate exploratory experience and tends to get lost in large sampling spaces. Attempts to address this issue, such as LS-GFN, are limited to local greedy search and lack broader global adjustment. This paper introduces a novel GFlowNet variant, the Dynamic Backtracking GFN (DB-GFN), which makes decision steps more adaptive through a reward-based dynamic backtracking mechanism. DB-GFN permits backtracking during the construction process according to the reward value of the current state, so that it can correct unfavorable decisions and explore alternative paths during exploration. On generative tasks for biochemical molecules and genetic sequences, DB-GFN outperforms existing GFlowNet models and traditional reinforcement learning methods in sample quality, number of explored samples, and training convergence speed. Furthermore, because DB-GFN is orthogonal to other GFlowNet improvements, it could be combined with other strategies to achieve more efficient search.
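To make the idea of reward-dependent backtracking concrete, here is a minimal toy sketch. It is not the authors' DB-GFN algorithm. Everything in it is a hypothetical stand-in: the fixed-length binary "object", the `reward` function (fraction of 1s), the uniform random `sample_suffix` policy in place of a learned GFlowNet forward policy, the linear `backtrack_steps` rule (lower reward means more steps undone), and the accept-if-not-worse criterion.

```python
import math
import random

# Hypothetical toy setting (not from the paper): an object is a binary
# string built one token per step; reward is the fraction of 1s.
LENGTH = 8
VOCAB = (0, 1)


def reward(seq):
    """Toy reward in [0, 1]: fraction of tokens equal to 1."""
    return sum(seq) / len(seq)


def sample_suffix(prefix, rng):
    """Extend a prefix to full length with a uniform random policy
    (a placeholder for a learned GFlowNet forward policy)."""
    seq = list(prefix)
    while len(seq) < LENGTH:
        seq.append(rng.choice(VOCAB))
    return seq


def backtrack_steps(r, max_frac=0.75):
    """Hypothetical reward-dependent rule: the lower the reward, the more
    construction steps are undone (between 0 and max_frac * LENGTH)."""
    return math.ceil((1.0 - r) * max_frac * LENGTH)


def dynamic_backtracking(rng, rounds=5):
    """Sample one object, then repeatedly undo a reward-dependent number of
    steps and regenerate the suffix. The candidate replaces the current
    object only if its reward does not decrease (an assumed criterion)."""
    best = sample_suffix([], rng)
    for _ in range(rounds):
        k = backtrack_steps(reward(best))
        candidate = sample_suffix(best[:LENGTH - k], rng)
        if reward(candidate) >= reward(best):
            best = candidate
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    seq = dynamic_backtracking(rng)
    print(seq, reward(seq))
```

The design choice being illustrated is that a poor reward triggers a deep backtrack, which revisits early decisions, while a good reward only perturbs the last few steps. A fixed-depth local search, by contrast, applies the same depth regardless of how good the current object is.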

