Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an unnormalized probability distribution, called the reward function. This framework learns a policy on a constructed graph, which enables sampling from an approximation of the target probability distribution through successive steps of sampling from the learned policy. To achieve this, GFlowNets can be trained with various objectives, each of which can lead to the model's ultimate goal. The aspirational strength of GFlowNets lies in their potential to discern intricate patterns within the reward function and their capacity to generalize effectively to novel, unseen parts of the reward function. This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and to design experiments that assess the capacity of these models to uncover unseen parts of the reward function. The experiments focus on length generalization, meaning generalization to states that can be reached only through trajectories longer than those seen during training.