Recent studies on online reinforcement learning (RL) have demonstrated the advantages of learning multiple behaviors from a single task, for example in few-shot adaptation to a new environment. Although this approach is expected to yield similar benefits in offline RL, methods for learning multiple solutions in that setting have not been fully investigated. In this study, we therefore address the problem of finding multiple solutions from a single task in offline RL. We propose algorithms that can learn multiple solutions in offline RL and empirically investigate their performance. Our experimental results show that the proposed algorithm learns multiple solutions that are qualitatively and quantitatively distinct.