GFlowNets are probabilistic models that learn a stochastic policy for sequentially generating compositional structures, such as molecular graphs. They are trained to sample such objects with probability proportional to each object's reward. Temperature-conditional GFlowNets represent a family of policies indexed by temperature, each associated with the correspondingly tempered reward function. Their major benefit is that the balance between exploration and exploitation can be controlled by adjusting the temperature. We propose Learning to Scale Logits for temperature-conditional GFlowNets (LSL-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is based on the observation that previously proposed temperature-conditioning approaches introduce numerical challenges in training the deep network, because different temperatures may give rise to very different gradient profiles and ideal scales of the policy's logits. We find that this challenge is greatly reduced when a learned function of the temperature is used to scale the policy's logits directly. We empirically show that our strategy dramatically improves the performance of GFlowNets, outperforming other baselines, including reinforcement learning and sampling methods, in discovering diverse modes across multiple biochemical tasks.
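The logit-scaling idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an inverse-temperature parameter `beta` that tempers the reward as R(x)^beta, a temperature-independent vector of raw logits (here a fixed array standing in for a policy network's output), and a hypothetical scalar function `logit_scale` with parameters `w` and `b` that would be learned in practice. All of these names are illustrative.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array of logits.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def tempered_log_reward(log_reward, beta):
    # Assumed tempering convention: log(R(x)^beta) = beta * log R(x).
    return beta * log_reward

def logit_scale(beta, w, b):
    # Hypothetical learned scalar function of the temperature.
    # Softplus keeps the scale positive; w and b stand in for trained weights.
    return np.log1p(np.exp(w * beta + b))

def policy(raw_logits, beta, w, b):
    # Temperature enters only through a scalar multiplier on the logits,
    # so the raw logits themselves do not need to change scale with beta.
    return softmax(logit_scale(beta, w, b) * raw_logits)

raw_logits = np.array([2.0, 1.0, 0.0])  # stand-in for a network's output
w, b = 1.0, 0.0                         # illustrative, untrained values

p_low = policy(raw_logits, beta=0.5, w=w, b=b)   # flatter: more exploration
p_high = policy(raw_logits, beta=4.0, w=w, b=b)  # sharper: more exploitation
```

With these illustrative values, a larger `beta` gives a larger scale and therefore a more peaked action distribution, while the underlying raw logits stay the same. In an actual model, both the raw logits and the scaling function would be trained jointly against the tempered reward.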