Most research on exploration in reinforcement learning (RL) has focused on 'how to explore', that is, the manner of exploration. The complementary question, 'when to explore', has not been a main focus of RL exploration research. In monolithic exploration, the usual RL exploration behaviour, the issue of 'when' is that an agent's exploratory actions are bound to its exploitative actions. Recently, research on non-monolithic exploration has emerged to examine the mode-switching exploration behaviour of humans and animals. The ultimate purpose of our research is to enable an agent to decide autonomously when to explore and when to exploit. We describe initial research on autonomous multi-mode exploration with non-monolithic behaviour in an options framework. Comparative experimental results show that our method outperforms the existing non-monolithic exploration method.