Deep reinforcement learning (DRL) has shown success in diverse domains such as robotics, computer games, and recommendation systems. However, like any other software system, DRL-based software systems are susceptible to faults, and these faults pose unique challenges for debugging and diagnosis. They often result in unexpected behavior without explicit failures or error messages, which makes debugging difficult and time-consuming. Automating the monitoring and diagnosis of DRL systems is therefore crucial to alleviate the burden on developers. In this paper, we propose RLExplorer, the first fault diagnosis approach for DRL-based software systems. RLExplorer automatically monitors training traces and runs diagnosis routines based on properties of the DRL learning dynamics to detect the occurrence of DRL-specific faults. It then logs the results of these diagnoses as warnings that cover theoretical concepts, recommended practices, and potential solutions to the identified faults. We conducted two evaluations to assess RLExplorer. The first, on faulty DRL samples from Stack Overflow, showed that our approach can effectively diagnose real faults in 83% of the cases. The second, a study with 15 DRL experts/developers, showed that (1) RLExplorer could identify 3.6 times more defects than manual debugging and (2) RLExplorer can be easily integrated into DRL applications.
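To make the monitor-diagnose-warn workflow described above more concrete, the following is a minimal illustrative sketch in Python. It is not RLExplorer's implementation or API: the names `StepRecord`, `TraceMonitor`, `check_finite_loss`, and `check_epsilon_decay`, the two checks themselves, and the default `window` and `floor` values are all assumptions chosen for illustration. The sketch assumes an epsilon-greedy agent whose training loop reports a loss and an exploration rate at each step.

```python
import logging
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

logger = logging.getLogger("drl_monitor")  # hypothetical logger name


@dataclass
class StepRecord:
    """One entry of the training trace (fields are illustrative assumptions)."""
    step: int
    loss: float
    epsilon: float  # exploration rate of an epsilon-greedy agent


# A check inspects the trace so far and returns a warning message or None.
Check = Callable[[List[StepRecord]], Optional[str]]


def check_finite_loss(trace: List[StepRecord]) -> Optional[str]:
    """Warn when the most recent loss is NaN or infinite."""
    last = trace[-1]
    if not math.isfinite(last.loss):
        return (f"step {last.step}: loss is {last.loss}; training may have "
                "diverged. Consider a lower learning rate or gradient clipping.")
    return None


def check_epsilon_decay(trace: List[StepRecord], window: int = 100,
                        floor: float = 0.1) -> Optional[str]:
    """Warn when the exploration rate has not decreased over the last
    `window` steps while still above `floor` (both values are assumptions)."""
    if len(trace) < window:
        return None
    recent = trace[-window:]
    if recent[-1].epsilon > floor and recent[-1].epsilon >= recent[0].epsilon:
        return (f"step {recent[-1].step}: exploration rate has not decayed "
                f"over the last {window} steps; check the decay schedule.")
    return None


@dataclass
class TraceMonitor:
    """Collects the trace and runs every check after each recorded step."""
    checks: List[Check]
    trace: List[StepRecord] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    _fired: Set[str] = field(default_factory=set)

    def record(self, rec: StepRecord) -> None:
        self.trace.append(rec)
        for check in self.checks:
            if check.__name__ in self._fired:
                continue  # report each kind of warning only once
            msg = check(self.trace)
            if msg is not None:
                self._fired.add(check.__name__)
                self.warnings.append(msg)
                logger.warning(msg)


# Usage: a synthetic faulty run with a constant epsilon and a final NaN loss.
monitor = TraceMonitor(checks=[check_finite_loss, check_epsilon_decay])
for t in range(100):
    monitor.record(StepRecord(step=t, loss=1.0 / (t + 1), epsilon=1.0))
monitor.record(StepRecord(step=100, loss=float("nan"), epsilon=1.0))
```

In this sketch each check is an independent function over the trace, so new diagnosis routines can be added without changing the monitor, and each warning pairs the observed symptom with a suggested remedy, in the spirit of the warnings described above.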