Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study

This is a preprint of the following chapter: Bhat et al., Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study, published in "Emerging Frontiers in Human-Robot Interaction", edited by Ramana Kumar Vinjamuri, 2024, Springer Nature, reproduced with permission of Springer Nature. The final authenticated version is available online at: [INSERT LINK HERE]

With the advent of AI technologies, humans and robots are increasingly teaming up to perform collaborative tasks. To enable smooth and effective collaboration, value alignment between the robot and the human (operationalized herein as the degree of dynamic goal alignment within a task) is gaining increasing research attention. Prior literature on value alignment implicitly assumes that aligning the robot's values with those of the human benefits the team. This assumption, however, has not been empirically verified. Moreover, prior literature does not account for the human's trust in the robot when analyzing human-robot value alignment. This leaves a research gap, which we address through two questions: How does alignment of values affect trust? Is it always beneficial to align the robot's values with those of the human? We present a simulation study and a human-subject study to answer these questions. Results from the simulation study show that alignment of values is important for trust when the overall risk level of the task is high. We also present an adaptive strategy for the robot that uses Inverse Reinforcement Learning (IRL) to match the robot's values with those of the human during interaction. Our simulations suggest that such an adaptive strategy is able to maintain trust across the full spectrum of human values. We also present results from an empirical study that validate these simulation findings. The results indicate that real-time personalized value alignment benefits the human's trust and perceived performance when the robot does not have a good prior on the human's values.
