Reinforcement Learning (RL) has gained significant attention across various domains. However, the increasing complexity of RL programs poses testing challenges, chief among them the oracle problem: defining what constitutes correct behavior for an RL program. Conventional human oracles struggle to cope with this complexity, leading to inefficiency and potential unreliability in RL testing. To alleviate this problem, we propose an automated oracle approach that leverages RL properties using fuzzy logic. Our oracle quantifies an agent's behavioral compliance with reward policies and tracks the trend of that compliance across training episodes. It labels an RL program as "Buggy" if the compliance trend violates expectations derived from RL characteristics. We evaluate our oracle on RL programs of varying complexity and compare it with human oracles. Results show that while human oracles perform well in simpler testing scenarios, our fuzzy oracle demonstrates superior performance in complex environments. The proposed approach shows promise in addressing the oracle problem for RL testing, particularly in complex cases where manual testing falls short, and offers a potential route to more efficient, reliable, and scalable RL program testing. This research takes a step towards automated testing of RL programs and highlights the potential of fuzzy logic-based oracles in tackling the oracle problem.
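The core idea above — scoring per-episode compliance, extracting its trend, and applying a fuzzy (graded rather than binary) verdict — can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the compliance scores, the linear-trend summary, the ramp-shaped membership function, and the parameter names (`slope_cut`, `width`, `accept`) are all assumptions introduced here for clarity.

```python
import numpy as np

def compliance_trend_verdict(compliance, slope_cut=0.0, width=0.01, accept=0.5):
    """Fuzzy-oracle sketch (hypothetical): label a training run 'Buggy'
    when per-episode compliance scores lack the expected upward trend.

    compliance : sequence of scores in [0, 1], one per training episode,
                 measuring how well the agent's behavior matched the
                 reward policy in that episode (assumed given).
    """
    episodes = np.arange(len(compliance))
    # Summarize the trend as the least-squares slope of compliance
    # over training episodes.
    slope = np.polyfit(episodes, compliance, 1)[0]
    # Fuzzy membership in "trend is improving": a smooth linear ramp
    # around slope_cut instead of a hard crisp threshold, so borderline
    # trends receive intermediate degrees of membership.
    improving = float(np.clip((slope - slope_cut) / width + 0.5, 0.0, 1.0))
    verdict = "Buggy" if improving < accept else "Non-buggy"
    return verdict, improving
```

A run whose compliance rises over episodes (e.g. `[0.1, 0.3, 0.5, 0.7]`) is accepted, while a flat or declining trend is flagged; the `width` parameter controls how gradually the verdict softens near the cutoff.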