Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers in which an agent learns a stochastic policy for generating complex combinatorial structures through a series of decision-making steps. Despite being inspired by reinforcement learning, the current GFlowNet framework is relatively limited in its applicability and cannot handle stochasticity in the reward function. In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution and thus providing more informative learning signals during training. By parameterizing each edge flow through its quantile function, our proposed \textit{quantile matching} GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty. Moreover, we find that the distributional approach can achieve substantial improvement on existing benchmarks compared to prior methods due to our enhanced training algorithm, even in settings with deterministic rewards.
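
To make the idea of describing a random quantity through its quantile function concrete, the following is a minimal sketch of generic quantile regression with the pinball loss. It is not the paper's quantile matching algorithm: the helper names (`pinball_loss`, `fit_quantiles`), the Gaussian toy signal, and the grid search are all assumptions chosen for illustration. It only shows that minimizing the pinball loss at level `tau` recovers the `tau`-quantile of a noisy signal, which is the kind of statistic a quantile-parameterized flow would have to track.

```python
import numpy as np


def pinball_loss(pred, samples, tau):
    """Mean pinball (quantile regression) loss of a scalar prediction at level tau."""
    diff = samples - pred
    # Under-estimates are weighted by tau, over-estimates by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


def fit_quantiles(samples, taus, grid):
    """For each level tau, return the grid value with the smallest pinball loss."""
    fitted = []
    for tau in taus:
        losses = [pinball_loss(g, samples, tau) for g in grid]
        fitted.append(grid[int(np.argmin(losses))])
    return np.array(fitted)


# Hypothetical noisy signal standing in for a stochastic reward (assumption).
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.5, size=5000)

taus = np.array([0.1, 0.5, 0.9])
grid = np.linspace(-1.0, 3.0, 401)  # candidate quantile values, spacing 0.01

fitted = fit_quantiles(samples, taus, grid)
empirical = np.quantile(samples, taus)
print("fitted quantiles:   ", fitted)
print("empirical quantiles:", empirical)
```

A risk-averse decision rule could then emphasize the low-`tau` estimates and a risk-seeking one the high-`tau` estimates. How the paper's method actually does this is not specified in the abstract.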