Recent advances in deep reinforcement learning (DRL) have sparked multifaceted applications in the automation sector. The ability of DRL to manage complex decision-making problems encourages its use in the nuclear industry for tasks such as optimizing personnel radiation exposure during normal operating conditions and potential accident scenarios. However, the lack of an efficient reward function and an effective exploration strategy has thwarted its use in developing a radiation-aware autonomous unmanned aerial vehicle (UAV) that achieves maximum radiation protection. In this article, we address these issues and introduce a deep Q-learning based architecture (RadDQN) that operates on a radiation-aware reward function to provide a time-efficient, minimum-radiation-exposure pathway through a radiation zone. We propose a set of unique exploration strategies that fine-tune the balance between exploration and exploitation based on the state-wise variation in radiation exposure during training. Further, we benchmark the predicted path against a grid-based deterministic method. We demonstrate that the formulated reward function, in conjunction with an adequate exploration strategy, is effective in handling several scenarios with drastically different radiation field distributions. Compared to vanilla DQN, our model achieves a superior convergence rate and higher training stability.
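To make the idea of a radiation-aware reward concrete, the following is a minimal sketch and not the RadDQN reward itself, which the abstract does not specify. It assumes a 2D grid, point sources whose dose rate follows an inverse-square law, and a reward that penalizes both the exposure received at the next state and the time step taken. The names `SOURCES`, `GOAL`, `dose_rate`, `reward`, and all numeric weights are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical point sources as (x, y, strength); not taken from the paper.
SOURCES = [(2.0, 2.0, 100.0)]
# Hypothetical goal cell on the grid.
GOAL = (4, 4)


def dose_rate(pos, sources=SOURCES, eps=0.5):
    """Inverse-square dose rate at pos, summed over all sources.

    eps is an assumed softening term that avoids division by zero
    when the agent sits exactly on a source.
    """
    x, y = pos
    return sum(s / ((x - sx) ** 2 + (y - sy) ** 2 + eps) for sx, sy, s in sources)


def reward(next_pos, goal=GOAL, step_cost=1.0, dose_weight=1.0, goal_bonus=100.0):
    """Penalize exposure and elapsed time; add a bonus on reaching the goal."""
    r = -dose_weight * dose_rate(next_pos) - step_cost
    if tuple(next_pos) == tuple(goal):
        r += goal_bonus
    return r
```

Under these assumptions, a step onto the source cell is penalized far more heavily than a step into a distant cell, and the per-step cost discourages long detours, which is the trade-off between exposure and travel time that the abstract describes.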