The recently proposed generative flow networks (GFlowNets) are a method for training a policy to sample compositional discrete objects, built through a sequence of actions, with probabilities proportional to a given reward. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to the general case. We demonstrate how the task of learning a generative flow network can be efficiently redefined as an entropy-regularized RL problem with a specific reward and regularizer structure. Furthermore, we illustrate the practical efficiency of this reformulation by applying standard soft RL algorithms to GFlowNet training across several probabilistic modeling tasks. Contrary to previously reported results, we show that entropic RL approaches can be competitive with established GFlowNet training methods. This perspective opens a direct path for integrating reinforcement learning principles into the realm of generative flow networks.
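The reformulation can be made concrete with a small sketch. This is not the authors' code. It is a sketch of one such construction, built under our own assumptions:

- **Toy problem.** A hypothetical DAG whose states are subsets of {0, 1, 2}. An object x is complete once it holds two items, and each object has a hypothetical reward R(x).
- **Backward policy.** P_B is fixed and uniform.
- **Rewards.** Every intermediate transition s → s' gets the reward log P_B(s | s'). The terminating transition from x gets log R(x).
- **Solver settings.** Entropy coefficient 1 and no discounting.

Exact soft value iteration on this MDP should give V(s_0) = log Z. The resulting soft-optimal policy should then sample each x with probability R(x)/Z.

```python
import math
from itertools import combinations

# Hypothetical toy problem: states are subsets of {0, 1, 2}; an object x is complete at size 2.
ITEMS = (0, 1, 2)
TERMINAL_SIZE = 2
# Hypothetical positive rewards R(x); the target distribution is R(x) / Z with Z = 8.
REWARD = {frozenset({0, 1}): 1.0, frozenset({0, 2}): 2.0, frozenset({1, 2}): 5.0}


def children(s):
    """Forward actions: add one item that is not yet in the set."""
    return [s | {i} for i in ITEMS if i not in s]


def log_backward(parent, child):
    """Assumed fixed uniform backward policy: log P_B(parent | child) = -log(#parents of child)."""
    return -math.log(len(child))


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_values():
    """Exact soft Bellman backup, entropy coefficient 1, no discounting:
    V(s) = log sum_{s'} exp(r(s, s') + V(s')), with r(s, s') = log P_B(s | s')."""
    V = {}
    for size in range(TERMINAL_SIZE, -1, -1):
        for combo in combinations(ITEMS, size):
            s = frozenset(combo)
            if size == TERMINAL_SIZE:
                # The only action is "stop", with reward log R(x); the sink has value 0.
                V[s] = math.log(REWARD[s])
            else:
                V[s] = logsumexp([log_backward(s, c) + V[c] for c in children(s)])
    return V


def terminal_distribution(V):
    """Propagate the soft-optimal policy pi(s' | s) = exp(r(s, s') + V(s') - V(s)) exactly.
    At a complete object, "stop" has probability exp(log R(x) - V(x)) = 1."""
    layer = {frozenset(): 1.0}
    for _ in range(TERMINAL_SIZE):
        nxt = {}
        for s, p in layer.items():
            for c in children(s):
                pi = math.exp(log_backward(s, c) + V[c] - V[s])
                nxt[c] = nxt.get(c, 0.0) + p * pi
        layer = nxt
    return layer


if __name__ == "__main__":
    V = soft_values()
    dist = terminal_distribution(V)
    Z = sum(REWARD.values())
    print("exp(V(s0)) =", round(math.exp(V[frozenset()]), 6), "Z =", Z)
    for x, p in dist.items():
        print(sorted(x), round(p, 4), REWARD[x] / Z)
```

Running the sketch should show exp(V(s_0)) equal to Z = 8. The three objects should receive probabilities 0.125, 0.25 and 0.625, which match R(x)/Z.

The log P_B term is the key design choice. Each object here is reachable by two trajectories. Weighting each trajectory by P_B(τ | x), whose values sum to 1 over the trajectories ending in x, prevents objects with many paths from being oversampled. Without that term, plain soft RL would instead sample x in proportion to R(x) times its number of paths.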