Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates with probabilities proportional to a given reward. However, GFlowNets can only be used with a predefined scalar reward, which can be either computationally expensive or not directly accessible, as in multi-objective optimization (MOO) tasks. Moreover, to prioritize identifying high-reward candidates, the conventional practice is to raise the reward to a higher power, and the optimal exponent may vary across environments. To address these issues, we propose Order-Preserving GFlowNets (OP-GFNs), which sample with probabilities proportional to a learned reward function that is consistent with a provided (partial) order on the candidates, thus eliminating the need for an explicit formulation of the reward function. We theoretically prove that the training process of OP-GFNs gradually sparsifies the learned reward landscape in single-objective maximization tasks. The sparsification concentrates on candidates that rank higher in the ordering, ensuring exploration at the beginning of training and exploitation towards the end. We demonstrate OP-GFNs' state-of-the-art performance in single-objective maximization (totally ordered) and multi-objective Pareto front approximation (partially ordered) tasks, including synthetic datasets, molecule generation, and neural architecture search.
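As a small illustration of the reward-exponent practice mentioned above, the sketch below computes a target distribution proportional to R(x)^beta for a toy candidate set. It is not the OP-GFN method or any GFlowNet training code; the function name `sampling_probs`, the parameter `beta`, and the three reward values are hypothetical and chosen only for illustration.

```python
def sampling_probs(rewards, beta=1.0):
    """Return the distribution p(x) proportional to R(x) ** beta."""
    # Raise each (positive) reward to the exponent beta, then normalize.
    powered = [r ** beta for r in rewards]
    total = sum(powered)
    return [p / total for p in powered]


# Hypothetical rewards for three candidates (assumed values, not from the paper).
rewards = [1.0, 2.0, 4.0]

probs_beta1 = sampling_probs(rewards, beta=1.0)  # proportional to the raw reward
probs_beta4 = sampling_probs(rewards, beta=4.0)  # sharpened towards the best candidate

print(probs_beta1)
print(probs_beta4)
```

With `beta=1.0` the best candidate receives 4/7 of the probability mass, while with `beta=4.0` it receives 256/273. A larger exponent therefore favors high-reward candidates at the cost of diversity, which is why a suitable exponent can differ between environments.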