GFlowNets are probabilistic models that sequentially generate compositional structures through a stochastic policy. Among GFlowNets, temperature-conditional GFlowNets introduce temperature-based controllability over the exploration–exploitation trade-off. We propose \textit{Logit-scaling GFlowNets} (Logit-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is motivated by the observation that previously proposed approaches introduce numerical challenges into deep network training, since different temperatures can give rise to very different gradient profiles and very different magnitudes of the policy's logits. We find that these challenges are greatly reduced if a learned function of the temperature is used to scale the policy's logits directly. Logit-GFN also improves GFlowNets in two ways: better generalization in offline learning and better mode discovery in online learning, which we verify empirically on various biological and chemical tasks. Our code is available at \url{https://github.com/dbsxodud-11/logit-gfn}
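The core mechanism can be illustrated with a minimal sketch. In the snippet below, the names \texttt{g}, \texttt{softmax}, and \texttt{temperature\_conditional\_policy} are illustrative placeholders, not the paper's implementation: in Logit-GFN the scaling function $g(\beta)$ would be a small trainable network, whereas here it is a fixed linear function standing in for the learned map. The point is only that the temperature enters by rescaling the policy's logits before the softmax, rather than being fed to the network as an extra input.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def g(beta, w=1.0, b=0.0):
    """Placeholder for the learned temperature-to-scale function.
    In Logit-GFN this would be a trainable network; a linear map is
    used here purely for illustration."""
    return w * beta + b

def temperature_conditional_policy(logits, beta):
    """Scale the base policy logits by g(beta) before the softmax,
    instead of conditioning the network on beta directly."""
    scale = g(beta)
    return softmax([scale * x for x in logits])

base_logits = [2.0, 1.0, 0.5]
# Small g(beta) flattens the policy (exploration); large g(beta)
# sharpens it toward the highest-logit action (exploitation).
p_explore = temperature_conditional_policy(base_logits, beta=0.5)
p_exploit = temperature_conditional_policy(base_logits, beta=4.0)
```

Because the temperature only rescales the logits, the network's hidden activations and gradients stay in a consistent range across temperatures, which is the numerical benefit the abstract describes.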