Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. We argue that these alternatives represent opposite ends of a gradient bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate its harmful effects. Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance, or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths. We show that SubTB($\lambda$) accelerates sampler convergence in both previously studied and new environments and enables training GFlowNets in environments with longer action sequences and sparser reward landscapes than was previously possible. We also perform a comparative analysis of stochastic gradient dynamics, shedding light on the bias-variance tradeoff in GFlowNet training and the advantages of subtrajectory balance.
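The abstract does not spell out the objective, so the following is only a minimal sketch of how a SubTB($\lambda$)-style loss could be computed for a single trajectory. It assumes the commonly stated form of subtrajectory balance: for every pair of states $s_i, s_j$ with $i < j$ on the trajectory, the log state flow at $s_i$ plus the forward log-probabilities along the segment should match the log state flow at $s_j$ plus the backward log-probabilities, and the squared mismatches are averaged with weights $\lambda^{j-i}$. The function name `subtb_lambda_loss` and its arguments are hypothetical, and the convention that the last flow entry holds the log reward of the terminal state is an assumption of this sketch.

```python
def subtb_lambda_loss(log_flows, log_pf, log_pb, lam=0.9):
    """Sketch of a SubTB(lambda)-style loss for one trajectory s_0 -> ... -> s_n.

    log_flows: n+1 values, log F(s_0), ..., log F(s_n).
               Assumption: log_flows[n] is the log reward of the terminal state.
    log_pf:    n values, log P_F(s_{k+1} | s_k) for k = 0..n-1.
    log_pb:    n values, log P_B(s_k | s_{k+1}) for k = 0..n-1.
    lam:       positive weight base; a subtrajectory of length L gets weight lam**L.
    """
    n = len(log_pf)
    if n == 0 or len(log_pb) != n or len(log_flows) != n + 1:
        raise ValueError("need n >= 1 transitions and n + 1 state flows")

    # Prefix sums so that any segment sum is a difference of two entries.
    cum_pf, cum_pb = [0.0], [0.0]
    for f, b in zip(log_pf, log_pb):
        cum_pf.append(cum_pf[-1] + f)
        cum_pb.append(cum_pb[-1] + b)

    weighted_sum, weight_total = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Log-ratio mismatch of the balance condition on segment s_i..s_j.
            forward = log_flows[i] + (cum_pf[j] - cum_pf[i])
            backward = log_flows[j] + (cum_pb[j] - cum_pb[i])
            weight = lam ** (j - i)
            weighted_sum += weight * (forward - backward) ** 2
            weight_total += weight
    return weighted_sum / weight_total


# Usage: a two-step trajectory whose intermediate flow estimate is off.
loss = subtb_lambda_loss([0.0, 1.0, 0.0], [0.0, 0.0], [0.0, 0.0], lam=1.0)
```

In this sketch, a small `lam` concentrates the weight on short segments, which resemble the local, transition-level objectives mentioned above, while a large `lam` shifts weight toward the full trajectory. This is the bias-variance dial the abstract refers to.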