Chaos-based reinforcement learning (CBRL) is a method in which the agent's internal chaotic dynamics drive exploration. It offers a model for how the biological brain can generate variability in its behavior and learn in an exploratory manner. It is also a learning model that can switch automatically between exploration and exploitation modes and has the potential to realize higher-level exploration that reflects what the agent has learned so far. However, the learning algorithms used in CBRL have not been well established in previous studies and have yet to incorporate recent advances in reinforcement learning. This study introduces Twin Delayed Deep Deterministic Policy Gradients (TD3), a state-of-the-art deep reinforcement learning algorithm that learns deterministic policies over continuous action spaces, to CBRL. The validation results provide several insights. First, TD3 works as a learning algorithm for CBRL in a simple goal-reaching task. Second, CBRL agents trained with TD3 can autonomously suppress their exploratory behavior as learning progresses and resume exploration when the environment changes. Finally, an examination of how the agent's chaoticity affects learning shows that extremely strong chaos impairs the flexible switching between exploration and exploitation.
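For readers unfamiliar with TD3, the following is a minimal sketch of the critic target used in standard TD3 (clipped double-Q learning with target policy smoothing). It is not taken from this study and may differ from the variant used for CBRL. The function name `td3_target`, its parameters and default values, and the toy critics in the usage lines are assumptions made for illustration only.

```python
import random


def td3_target(reward, done, next_action, q1_next, q2_next,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Critic target for one transition in standard TD3 (illustrative sketch).

    next_action: list of floats, the target actor's action for the next state.
    q1_next, q2_next: hypothetical callables mapping an action (list of floats)
        to the two target critics' values at the next state.
    """
    # Target policy smoothing: add clipped Gaussian noise to each action
    # component, then clip the result to the valid action range.
    smoothed = []
    for a in next_action:
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(action_low, min(action_high, a + eps)))

    # Clipped double-Q: take the smaller of the two target critic values
    # to reduce overestimation bias.
    q_min = min(q1_next(smoothed), q2_next(smoothed))

    # Bootstrapped target; no bootstrapping on terminal transitions.
    return reward + gamma * (1.0 - float(done)) * q_min


# Usage with toy critics (assumed for illustration, not from the study).
toy_q1 = lambda a: sum(a)
toy_q2 = lambda a: sum(a) + 1.0
y = td3_target(1.0, False, [0.5, 0.25], toy_q1, toy_q2,
               gamma=0.9, noise_std=0.0)
```

Setting `noise_std=0.0` in the usage lines disables the smoothing noise so the result is deterministic. In full TD3 the actor and the target networks are additionally updated less often than the critics, which this single-transition sketch does not show.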