Multi-objective reinforcement learning (MORL) is increasingly relevant because many real-world problems require trade-offs between multiple objectives. The challenges of traditional reinforcement learning are amplified in MORL, where a policy must additionally cater to diverse user preferences. To address the difficulty of training policies from scratch in MORL, we introduce demonstration-guided multi-objective reinforcement learning (DG-MORL). This novel approach utilizes prior demonstrations, aligns them with user preferences via corner weight support, and incorporates a self-evolving mechanism to refine suboptimal demonstrations. Our empirical studies demonstrate DG-MORL's superiority over existing MORL algorithms, establishing its robustness and efficacy, particularly under challenging conditions. We also provide an upper bound on the algorithm's sample complexity.
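To make the corner-weight notion concrete (this is an illustrative sketch, not the paper's implementation): under linear scalarization, each candidate policy has a value vector, and corner weights are the preference weights where the maximizing policy of the piecewise-linear upper envelope changes. The minimal Python below computes them for the two-objective case; the function name, the restriction to two objectives, and the example value vectors are all assumptions for illustration.

```python
import numpy as np

def corner_weights_2d(value_vectors):
    """Corner weights of the scalarized upper envelope for 2 objectives.

    Under weight w = (a, 1 - a), a policy with value vector V scores
    a * V[0] + (1 - a) * V[1], which is linear in a. Corner weights are
    the values of a where the best-scoring policy changes.
    """
    # Each policy's score as a line in a: slope = V[0] - V[1], intercept = V[1].
    lines = [(v[0] - v[1], v[1]) for v in value_vectors]
    corners = {0.0, 1.0}  # the simplex endpoints are always corners
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            s1, b1 = lines[i]
            s2, b2 = lines[j]
            if s1 == s2:
                continue  # parallel lines never cross
            a = (b2 - b1) / (s1 - s2)  # intersection of the two lines
            if 0.0 < a < 1.0:
                # Keep the intersection only if it lies on the upper envelope,
                # i.e., no third policy scores strictly higher at this weight.
                val = s1 * a + b1
                best = max(s * a + b for s, b in lines)
                if np.isclose(val, best):
                    corners.add(round(a, 10))
    return sorted(corners)

# Hypothetical example: three policies with value vectors (objective 1, objective 2).
print(corner_weights_2d([np.array([3.0, 0.0]),
                         np.array([2.0, 2.0]),
                         np.array([0.0, 3.0])]))
# -> [0.0, 0.3333333333, 0.6666666667, 1.0]
```

Evaluating candidate demonstrations at these corner weights is sufficient to cover all linear preferences, since the maximizing policy is constant between consecutive corners.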