Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers in which an agent learns a stochastic policy for generating complex combinatorial structures through a series of decision-making steps. Despite being inspired by reinforcement learning, the current GFlowNet framework is limited in its applicability: it cannot handle stochasticity in the reward function. In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution and thus providing more informative learning signals during training. By parameterizing each edge flow through its quantile function, our proposed \textit{quantile matching} GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty. Moreover, we find that the distributional approach achieves substantial improvement over prior methods on existing benchmarks, owing to our enhanced training algorithm, even in settings with deterministic rewards.
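To give some intuition for what representing a quantity by its quantile function can look like, the following is a minimal sketch, not the quantile matching algorithm itself. It assumes a scalar random quantity (here, hypothetical Gaussian samples) and fits one value per quantile level with the standard quantile regression (pinball) loss; the function names `pinball_loss` and `fit_quantiles`, the learning rate, and the choice of levels are all illustrative assumptions and do not come from this work.

```python
import random


def pinball_loss(pred, target, tau):
    """Quantile regression (pinball) loss for one prediction at level tau in (0, 1)."""
    diff = target - pred
    return max(tau * diff, (tau - 1.0) * diff)


def fit_quantiles(samples, taus, lr=0.01, epochs=20):
    """Fit one scalar per quantile level by subgradient descent on the pinball loss."""
    preds = [0.0 for _ in taus]
    for _ in range(epochs):
        for y in samples:
            for i, tau in enumerate(taus):
                # Subgradient of the pinball loss with respect to the prediction:
                # -tau when the sample lies above it, (1 - tau) otherwise.
                grad = -tau if y > preds[i] else (1.0 - tau)
                preds[i] -= lr * grad
    return preds


# Hypothetical stand-in for a stochastic quantity: standard normal samples.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]
taus = [0.1, 0.5, 0.9]
q10, q50, q90 = fit_quantiles(samples, taus)
```

The fitted values give a coarse description of the whole distribution rather than only its mean. A risk-averse decision rule could, for instance, weight the lower quantiles more heavily, which is the kind of information a single scalar estimate does not carry.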