In a recent study, Reinforcement Learning (RL) used in combination with many-objective search has been shown to outperform alternative techniques (random search and many-objective search) for online testing of Deep Neural Network-enabled systems. The empirical evaluation of these techniques was conducted on a state-of-the-art Autonomous Driving System (ADS). This work is a replication and extension of that empirical study. Our replication shows that RL does not outperform pure random test generation when compared under the same settings as the original study, but without the confounding factor introduced by the way collisions are measured. Our extension aims to eliminate some of the possible reasons for the poor performance of RL observed in our replication: (1) reward components that provide contrasting or useless feedback to the RL agent; (2) the use of an RL algorithm (Q-learning) that requires discretizing an intrinsically continuous state space. Results show that our new RL agent converges to an effective policy that outperforms random testing. Results also highlight other possible improvements, which open avenues for further investigation into how best to leverage RL for online ADS testing.