The recently proposed generative flow networks (GFlowNets) are a method for training a policy that builds compositional discrete objects through a sequence of actions and samples them with probabilities proportional to a given reward. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to the general case. We demonstrate how the task of learning a generative flow network can be efficiently redefined as an entropy-regularized RL problem with a specific reward and regularizer structure. Furthermore, we illustrate the practical efficiency of this reformulation by applying standard soft RL algorithms to GFlowNet training across several probabilistic modeling tasks. Contrary to previously reported results, we show that entropic RL approaches can be competitive with established GFlowNet training methods. This perspective opens a direct path for integrating reinforcement learning principles into the realm of generative flow networks.
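The reduction of GFlowNet learning to entropy-regularized RL can be checked on a toy problem. The sketch below is a minimal illustration under our own assumptions, not the paper's implementation. It uses a hypothetical DAG whose states are subsets of {0, 1, 2}, where each action adds one element and size-2 subsets are the terminal objects. It also assumes a uniform backward policy P_B, rewards r(s, s') = log P_B(s | s') on ordinary transitions and log R(x) on the terminating transition, and an entropy coefficient of 1. Instead of training a soft RL agent from samples, it solves the soft Bellman equations exactly by dynamic programming. It then checks that the resulting soft-optimal policy reaches each terminal object x with probability R(x)/Z.

```python
import math
from itertools import combinations

# Hypothetical toy DAG: states are subsets of {0, 1, 2}; an action adds one element.
# Size-2 subsets are the only terminating states, and each one is reachable by two
# different trajectories, so the state graph is a DAG rather than a tree.
ITEMS = (0, 1, 2)
MAX_SIZE = 2

# Hypothetical positive rewards on the terminal objects.
REWARD = {
    frozenset({0, 1}): 1.0,
    frozenset({0, 2}): 2.0,
    frozenset({1, 2}): 5.0,
}


def children(s):
    """Child states reachable from s by adding one element (none for terminal states)."""
    if len(s) == MAX_SIZE:
        return []
    return [s | {i} for i in ITEMS if i not in s]


def log_pb(parent, child):
    """Assumed uniform backward policy: each of the |child| parents is equally likely."""
    return -math.log(len(child))


def soft_values():
    """Exact soft Bellman backup (entropy coefficient 1, no discount), leaves to root.

    Assumed reward construction: r(s, s') = log P_B(s | s') on ordinary edges, and
    log R(x) on the terminating transition x -> s_f, with V(s_f) = 0.
    """
    V = {}
    for size in range(MAX_SIZE, -1, -1):
        for s in map(frozenset, combinations(ITEMS, size)):
            if len(s) == MAX_SIZE:
                V[s] = math.log(REWARD[s])  # the only action is to terminate
            else:
                V[s] = math.log(sum(math.exp(log_pb(s, c) + V[c]) for c in children(s)))
    return V


def soft_policy(V, s):
    """Soft-optimal policy pi(s' | s) = exp(Q(s, s') - V(s)), with Q(s, s') = r(s, s') + V(s')."""
    return {c: math.exp(log_pb(s, c) + V[c] - V[s]) for c in children(s)}


def terminal_distribution(V):
    """Push probability mass forward from the empty set under the soft-optimal policy."""
    mass = {frozenset(): 1.0}
    for size in range(MAX_SIZE):
        for s in map(frozenset, combinations(ITEMS, size)):
            for c, p in soft_policy(V, s).items():
                mass[c] = mass.get(c, 0.0) + mass.get(s, 0.0) * p
    return {x: mass[x] for x in REWARD}


V = soft_values()
dist = terminal_distribution(V)
Z = sum(REWARD.values())
for x in sorted(dist, key=sorted):
    print(sorted(x), round(dist[x], 6), round(REWARD[x] / Z, 6))
print("V(s0) =", round(V[frozenset()], 6), "log Z =", round(math.log(Z), 6))
```

With these rewards Z = 8, and the soft-optimal policy reaches {0, 1}, {0, 2}, and {1, 2} with probabilities 1/8, 2/8, and 5/8. The soft value of the initial state equals log Z. The example thus reflects the flow-matching view in which V(s) plays the role of log F(s), though only for this assumed construction.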