The recently proposed generative flow networks (GFlowNets) are a method for training a policy to sample compositional discrete objects, constructed through a sequence of actions, with probabilities proportional to a given reward. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to the general case. We demonstrate how the task of learning a generative flow network can be efficiently recast as an entropy-regularized RL problem with a specific reward and regularizer structure. Furthermore, we illustrate the practical efficiency of this reformulation by applying standard soft RL algorithms to GFlowNet training across several probabilistic modeling tasks. Contrary to previously reported results, we show that entropy-regularized RL approaches can be competitive with established GFlowNet training methods. This perspective opens a direct path for integrating RL principles into the realm of generative flow networks.
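To make the "specific reward and regularizer structure" concrete, the following is a minimal sketch of one such entropy-regularized objective, stated under the assumption of a fixed backward policy; the symbols $P_B$ (backward policy), $R$ (reward over terminal objects $x$), $\pi$ (forward sampling policy), and $\mathcal{H}$ (policy entropy) are standard GFlowNet/soft-RL notation introduced here for illustration rather than taken from this abstract:
\[
\max_{\pi}\; \mathbb{E}_{\tau = (s_0, \dots, s_T) \sim \pi}\Bigg[\log R(x_\tau) \;+\; \sum_{t=0}^{T-1} \log P_B(s_t \mid s_{t+1}) \;+\; \sum_{t=0}^{T-1} \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Bigg].
\]
Here the intermediate rewards $\log P_B(s_t \mid s_{t+1})$ and the terminal reward $\log R(x_\tau)$ together form the reward structure, while the entropy terms supply the regularizer; a policy optimal for this objective samples terminal objects with probability proportional to $R$, which is the defining property of a trained GFlowNet.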