Generative Flow Networks (GFlowNets; GFNs) are a family of energy-based generative methods for combinatorial objects, capable of generating diverse and high-utility samples. However, consistently biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate, $Q$, to create greedier sampling policies which can be controlled by a mixing parameter. We show that several variants of the proposed method, QGFN, are able to increase the number of high-reward samples generated across a variety of tasks without sacrificing diversity.
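As a rough illustration of the idea, here is a minimal sketch of one way a $Q$-guided mixed sampling policy could look. This is an assumption-laden sketch, not the paper's implementation: the function and argument names (`p_greedy_action`, `pi_logits`, `q_values`) are illustrative, and the specific rule shown (act greedily w.r.t. $Q$ with probability `p`, otherwise sample from the GFN policy) is just one simple instance of mixing controlled by a parameter.

```python
import math
import random


def p_greedy_action(pi_logits, q_values, p, rng=random):
    """Sample an action index from a mixed policy (illustrative sketch):
    with probability p, act greedily w.r.t. the action-value estimate Q;
    otherwise, sample from the GFlowNet policy softmax(pi_logits)."""
    if rng.random() < p:
        # Greedy branch: pick the action with the highest estimated Q-value.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # GFN branch: sample from the categorical distribution softmax(pi_logits).
    m = max(pi_logits)
    exps = [math.exp(x - m) for x in pi_logits]  # numerically stable softmax
    z = sum(exps)
    r, cum = rng.random() * z, 0.0
    for a, e in enumerate(exps):
        cum += e
        if r <= cum:
            return a
    return len(exps) - 1
```

Setting `p = 0` recovers the plain GFN sampling policy, while `p = 1` is fully greedy with respect to $Q$; intermediate values trade off exploitation against the diversity of the GFN sampler.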