Generative Flow Networks (GFlowNets) are a class of generative models that sample objects in proportion to a specified reward function through a learned policy. They can be trained either on-policy or off-policy, requiring a balance between exploration and exploitation to converge quickly to the target distribution. While exploration strategies for discrete GFlowNets have been studied, exploration in the continuous case remains to be investigated, despite the potential for novel exploration algorithms enabled by the local connectedness of continuous domains. Here, we introduce Adapted Metadynamics, a variant of metadynamics that can be applied to arbitrary black-box reward functions on continuous domains, and use it as an exploration strategy for continuous GFlowNets. On three continuous domains, we show that the resulting algorithm, MetaGFN, accelerates convergence to the target distribution and discovers more distant reward modes than previous off-policy exploration strategies used for GFlowNets.
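For intuition, the sketch below illustrates the core metadynamics idea the abstract alludes to: Gaussian bias "hills" are deposited at visited states so a sampler on a black-box reward is pushed toward unexplored regions. This is a minimal, hypothetical illustration (all names and the acceptance rule are our own), not the paper's Adapted Metadynamics algorithm.

```python
import numpy as np

def biased_reward(x, reward_fn, centers, height=1.0, width=0.5):
    """Black-box reward minus a sum of Gaussian bias hills deposited at
    previously visited points, discouraging revisits (metadynamics idea)."""
    bias = sum(height * np.exp(-np.sum((x - c) ** 2) / (2 * width ** 2))
               for c in centers)
    return reward_fn(x) - bias

# Exploration loop: after each step, deposit a hill at the current state
# so subsequent proposals are steered toward unvisited regions.
rng = np.random.default_rng(0)
reward_fn = lambda x: np.exp(-np.sum((x - 2.0) ** 2))  # toy black-box reward
centers = []
x = np.zeros(2)
for _ in range(100):
    # random-walk proposal, accepted if it improves the *biased* reward
    proposal = x + 0.3 * rng.normal(size=2)
    if biased_reward(proposal, reward_fn, centers) >= biased_reward(x, reward_fn, centers):
        x = proposal
    centers.append(x.copy())
```

In this toy setting the accumulating bias eventually flattens already-visited basins, so the walker escapes local reward modes it would otherwise remain trapped in; the paper's contribution is adapting this mechanism to drive off-policy exploration for continuous GFlowNets.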