Adopting Reinforcement Learning (RL) in human-centred applications gives robots autonomous decision-making capabilities and the ability to adapt based on observations of the operating environment. In such scenarios, however, the learning process can make a robot's behaviours unclear and unpredictable to humans, which prevents smooth and effective Human-Robot Interaction (HRI). It therefore becomes crucial to keep robots from performing actions that are unclear to the user. In this work, we investigate whether including human preferences in RL (concerning the actions the robot performs during learning) improves the transparency of a robot's behaviours. For this purpose, a shielding mechanism is integrated into the RL algorithm to incorporate human preferences and to monitor the learning agent's decisions. We carried out a within-subjects study involving 26 participants to evaluate the robot's transparency in terms of Legibility, Predictability, and Expectability in different settings. Results indicate that considering human preferences during learning improves Legibility compared with providing only explanations, and that combining human preferences with explanations elucidating the rationale behind the robot's decisions further amplifies transparency. Results also confirm that an increase in transparency leads to an increase in the safety, comfort, and reliability of the robot. These findings show the importance of transparency during learning and suggest a paradigm for robotic applications with a human in the loop.
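The abstract does not specify how the shield is implemented. As a rough illustration only, a shield can be sketched as a filter placed in front of the agent's action selection. The sketch below assumes tabular action values, epsilon-greedy exploration, and human preferences expressed as a set of allowed actions; the function name `shielded_action` and all parameters are hypothetical and are not taken from the paper.

```python
import random


def shielded_action(q_values, allowed, epsilon, rng):
    """Epsilon-greedy action selection restricted to shield-approved actions.

    q_values: dict mapping each action to its estimated value (assumed tabular).
    allowed:  set of actions the human preferences permit in the current state.
    epsilon:  exploration probability in [0, 1].
    rng:      a random.Random instance, passed in for reproducibility.
    """
    # The shield keeps only the actions that match the human preferences.
    # Assumption: if the preferences exclude every action, fall back to the
    # full action set so the agent can still act.
    candidates = [a for a in q_values if a in allowed] or list(q_values)

    # Explore among the permitted actions only.
    if rng.random() < epsilon:
        return rng.choice(candidates)

    # Otherwise exploit: pick the best permitted action.
    return max(candidates, key=lambda a: q_values[a])


if __name__ == "__main__":
    q = {"left": 1.0, "right": 5.0, "wait": 3.0}
    # "right" has the highest value but is not preferred, so it is skipped.
    print(shielded_action(q, {"left", "wait"}, epsilon=0.0, rng=random.Random(0)))
```

In this sketch the shield constrains exploration as well as exploitation, so a non-preferred action is never executed while any preferred action is available. Whether the paper's mechanism overrides, penalises, or merely flags such actions is not stated in the abstract.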