Recent research has made significant progress in optimizing diffusion models for specific downstream objectives, an important pursuit in fields such as graph generation for drug design. However, directly applying these methods to graph diffusion presents challenges and results in suboptimal performance. This paper introduces graph diffusion policy optimization (GDPO), a novel approach that optimizes graph diffusion models for arbitrary (e.g., non-differentiable) objectives using reinforcement learning. GDPO is based on an eager policy gradient tailored to graph diffusion models, developed through careful analysis and shown to deliver improved performance. Experimental results demonstrate that GDPO achieves state-of-the-art performance on various graph generation tasks with complex and diverse objectives. Code is available at https://github.com/sail-sg/GDPO.