Dynamic Backtracking in GFlowNets: Enhancing Decision Steps with Reward-Dependent Adjustment Mechanisms

Generative Flow Networks (GFlowNets) are probabilistic models built on Markov flows. They use amortized training algorithms to learn stochastic policies that generate compositional objects such as biomolecules and chemical materials. GFlowNets have proven effective at generating high-performing biochemical molecules, which speeds up scientific discovery and avoids much of the time, labor, and cost of conventional material discovery. However, previous work often struggles to accumulate exploratory experience and tends to lose direction in large sampling spaces. Attempts to address this issue, such as LS-GFN, are limited to local greedy search and lack broader global adjustment. This paper introduces a new GFlowNets variant, the Dynamic Backtracking GFN (DB-GFN), which makes decision steps more adaptable through a reward-based dynamic backtracking mechanism. DB-GFN permits backtracking during the network construction process according to the reward value of the current state, so that it can correct disadvantageous decisions and explore alternative pathways. Applied to generation tasks for biochemical molecules and genetic material sequences, DB-GFN surpasses existing GFlowNets models and traditional reinforcement learning methods in sample quality, number of samples explored, and training convergence speed. Furthermore, DB-GFN is orthogonal to other GFlowNets improvements, which suggests it can be combined with other strategies to achieve more efficient search.
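To make the idea of reward-dependent backtracking concrete, here is a minimal toy sketch in Python. It is not the DB-GFN algorithm from the paper: the abstract does not give the backtracking rule, so the linear mapping from reward to step count, the uniform random forward policy, the accept-if-not-worse rule, and every name (`reward`, `backtrack_steps`, `complete`, `refine`) are assumptions made for illustration only. The sketch undoes more construction steps when the current sequence scores poorly and fewer when it scores well, then rebuilds the removed suffix.

```python
import random

ALPHABET = "ACGT"   # hypothetical token set for a toy sequence task
LENGTH = 8          # fixed sequence length for this illustration


def reward(seq):
    """Toy reward (assumption): 1 plus the number of G/C tokens."""
    return 1 + sum(ch in "GC" for ch in seq)


def backtrack_steps(r, r_max, length):
    """Map a reward to a number of steps to undo.

    Assumed rule: low reward -> undo many steps, high reward -> undo few.
    The result is clamped to the range [1, length].
    """
    frac = 1.0 - r / r_max
    return max(1, min(length, round(frac * length)))


def complete(prefix, length, rng):
    """Stand-in forward policy: extend the prefix with uniform random tokens."""
    seq = list(prefix)
    while len(seq) < length:
        seq.append(rng.choice(ALPHABET))
    return "".join(seq)


def refine(seq, rng, rounds=20):
    """Repeatedly backtrack by a reward-dependent amount and rebuild."""
    r_max = 1 + LENGTH
    best, best_r = seq, reward(seq)
    for _ in range(rounds):
        k = backtrack_steps(best_r, r_max, LENGTH)
        candidate = complete(best[:LENGTH - k], LENGTH, rng)
        cand_r = reward(candidate)
        # Simplified acceptance (assumption): keep the candidate if not worse.
        if cand_r >= best_r:
            best, best_r = candidate, cand_r
    return best, best_r


if __name__ == "__main__":
    rng = random.Random(0)
    start = complete("", LENGTH, rng)
    final, final_r = refine(start, rng)
    print(start, reward(start), "->", final, final_r)
```

In a real GFlowNet the forward step would sample from the learned policy and the resulting trajectories would feed the training objective; the uniform sampler here only stands in for that so the backtracking loop can run on its own.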

