Reinforcement learning (RL) has revolutionized decision-making in dynamic environments, yet it often struggles to autonomously detect and achieve goals in the absence of clear feedback signals. For example, in a Source Term Estimation problem, the lack of precise environmental information makes it difficult both to provide clear feedback signals and to define and evaluate when the source's location has been determined. To address this challenge, we developed the Autonomous Goal Detection and Cessation (AGDC) module, which enhances a variety of RL algorithms by incorporating a self-feedback mechanism for autonomous goal detection and cessation upon task completion. By approximating the agent's belief, our method effectively identifies undefined goals and ceases the search once they are reached, significantly extending the capabilities of RL algorithms in environments with limited feedback. To validate the effectiveness of our approach, we integrated AGDC with the deep Q-network, proximal policy optimization, and deep deterministic policy gradient algorithms, and evaluated its performance on the Source Term Estimation problem. The experimental results showed that AGDC-enhanced RL algorithms significantly outperformed traditional statistical methods such as infotaxis, entrotaxis, and dual control for exploitation and exploration, as well as a non-statistical random action selection method. These improvements were evident in success rate, mean traveled distance, and search time, highlighting AGDC's effectiveness and efficiency in complex, real-world scenarios.
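The abstract describes goal cessation driven by an approximation of the agent's belief. As a minimal illustrative sketch (not the paper's actual implementation), one way such a check could work is to maintain a probabilistic belief over candidate source locations and trigger cessation once the belief mass concentrates on a single location; the grid size and the `concentration_threshold` value below are assumptions chosen for illustration.

```python
import numpy as np

def agdc_should_stop(belief, concentration_threshold=0.95):
    """Hypothetical AGDC-style cessation check: signal task completion
    once the approximated belief mass concentrates on one candidate
    source location above the given threshold."""
    return float(np.max(belief)) >= concentration_threshold

# Toy belief over a 5x5 grid of candidate source locations.
belief = np.full((5, 5), 0.04)   # uniform belief: max mass is 0.04
print(agdc_should_stop(belief))  # diffuse belief -> no cessation (False)

belief = np.zeros((5, 5))
belief[2, 3] = 0.97                       # mass concentrated on one cell
belief += (1 - belief.sum()) / belief.size  # spread the remainder uniformly
print(agdc_should_stop(belief))  # concentrated belief -> cessation (True)
```

In a full RL loop, such a check would run after each belief update, replacing the external "goal reached" signal that limited-feedback environments cannot provide.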