Online reinforcement learning (RL) is increasingly used to realize adaptive systems in the presence of design-time uncertainty. Online RL learns from actual operational data and thereby leverages feedback that is only available at runtime. However, online RL requires the definition of an effective and correct reward function, which quantifies the feedback to the RL algorithm and thereby guides learning. With deep RL gaining interest, the learned knowledge is no longer represented explicitly but is encoded in a neural network. For a human, it becomes practically impossible to relate the parametrization of the neural network to concrete RL decisions. Deep RL thus essentially appears as a black box, which severely limits the debugging of adaptive systems. We previously introduced the explainable RL technique XRL-DINE, which provides visual insights into why certain decisions were made at important time points. Here, we present an empirical user study involving 54 software engineers from academia and industry to assess (1) the performance of software engineers when performing different tasks using XRL-DINE and (2) the perceived usefulness and ease of use of XRL-DINE.