We prove that Wasserstein inverse reinforcement learning enables the learner's reward values to imitate the expert's reward values within a finite number of iterations for multi-objective optimization. Moreover, we prove that Wasserstein inverse reinforcement learning enables the learner's optimal solutions to imitate the expert's optimal solutions for multi-objective optimization with lexicographic order.
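To make "multi-objective optimization with lexicographic order" concrete, here is a minimal Python sketch. It is an illustration of lexicographic optimality only, not the authors' Wasserstein inverse reinforcement learning procedure. It assumes a finite feasible set, maximization, and objectives listed in priority order. The function name `lexicographic_argmax` and the toy objectives are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")


def lexicographic_argmax(
    solutions: Sequence[S], objectives: Sequence[Callable[[S], float]]
) -> S:
    """Return a solution that is maximal in lexicographic order.

    Hypothetical helper for illustration. objectives[0] has the highest
    priority; each later objective only breaks ties among solutions that
    are equal on all earlier objectives.
    """
    if not solutions:
        raise ValueError("need at least one feasible solution")
    # Python compares tuples lexicographically, so taking max() over the
    # tuple of objective values selects a lexicographic optimum.
    return max(solutions, key=lambda x: tuple(f(x) for f in objectives))


# Toy finite feasible set (hypothetical data).
feasible = [(0, 2), (1, 1), (2, 0), (0, 1)]

# Primary objective x1 + x2 ties at 2 for the first three points;
# the secondary objective x1 breaks the tie in favor of (2, 0).
best = lexicographic_argmax(feasible, [lambda s: s[0] + s[1], lambda s: s[0]])
```

Changing the priority order of the objectives generally changes the optimum, which is why the lexicographic setting is treated separately from the scalar-reward setting.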