In this paper, we build on advances introduced by the Deep Q-Networks (DQN) approach to extend the multi-objective tabular Reinforcement Learning (RL) algorithm W-learning to large state spaces. The W-learning algorithm can naturally resolve the competition between multiple single policies in multi-objective environments. However, the tabular version does not scale well to environments with large state spaces. To address this issue, we replace the underlying Q-tables with DQNs and propose the addition of W-Networks as a replacement for the tabular weight (W) representations. We evaluate the resulting Deep W-Networks (DWN) approach on two widely accepted multi-objective RL benchmarks: deep sea treasure and multi-objective mountain car. We show that DWN resolves the competition between multiple policies while outperforming a DQN baseline. Additionally, we demonstrate that the proposed algorithm can find the Pareto front in both tested environments.
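For readers unfamiliar with the tabular starting point, the following is a minimal sketch of W-learning as it is commonly formulated, not the implementation evaluated in this paper. Each policy keeps its own Q-table and one W-value per state, the policy with the largest W-value chooses the action, and every other policy moves its W-value toward the loss it incurred by not being obeyed. The function names, the nested-list table layout, and the values of `alpha` and `gamma` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

QTable = List[List[float]]  # q[state][action], one table per policy (assumed layout)
WTable = List[float]        # w[state], one W-value per state per policy (assumed layout)


def select_action(
    q_tables: Sequence[QTable], w_tables: Sequence[WTable], state: int
) -> Tuple[int, int, List[int]]:
    """Return (winning policy index, executed action, all nominated actions)."""
    # Each policy nominates the greedy action under its own Q-table.
    proposals = [
        max(range(len(q[state])), key=lambda a: q[state][a]) for q in q_tables
    ]
    # The policy with the highest W-value in this state wins the competition.
    winner = max(range(len(w_tables)), key=lambda i: w_tables[i][state])
    return winner, proposals[winner], proposals


def update_w(
    q_tables: Sequence[QTable],
    w_tables: Sequence[WTable],
    state: int,
    next_state: int,
    rewards: Sequence[float],
    proposals: Sequence[int],
    winner: int,
    alpha: float = 0.1,   # assumed learning rate
    gamma: float = 0.9,   # assumed discount factor
) -> None:
    """Update the W-values of every policy except the winner, in place."""
    for i, (q, w) in enumerate(zip(q_tables, w_tables)):
        if i == winner:
            continue  # the winner's W-value is left unchanged
        expected = q[state][proposals[i]]                    # value had policy i been obeyed
        received = rewards[i] + gamma * max(q[next_state])   # value actually obtained
        w[state] = (1 - alpha) * w[state] + alpha * (expected - received)
```

In the deep variant described above, the Q-tables and W-tables in this sketch would each be replaced by a neural network that maps a state to Q-values or to a W-value; the selection rule itself stays the same.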