Evaluating the Impact of Personalized Value Alignment in Human-Robot Interaction: Insights into Trust and Team Performance Outcomes

Source: arXiv. 10 pages, 9 figures; to be published in the ACM/IEEE International Conference on Human-Robot Interaction. arXiv admin note: text overlap with arXiv:2309.05179

This paper examines how real-time, personalized alignment of a robot's reward function to a human's values affects trust and team performance. We present and compare three robot interaction strategies: a non-learner strategy, in which the robot presumes that the human's reward function mirrors its own; a non-adaptive-learner strategy, in which the robot learns the human's reward function for trust estimation and human behavior modeling but still optimizes its own reward function; and an adaptive-learner strategy, in which the robot learns the human's reward function and adopts it as its own. We conducted two human-subject experiments with a total of 54 participants. In both experiments, the human-robot team searches for potential threats in a town, moving sequentially through search sites. We model the interaction between the human and the robot as a trust-aware Markov Decision Process (trust-aware MDP) and use Bayesian Inverse Reinforcement Learning (IRL) to estimate the human's reward weights as they interact with the robot. In Experiment 1, the learning algorithm starts with an informed prior over the human's values/goals. In Experiment 2, it starts with an uninformed prior. Results indicate that, when starting with a good informed prior, personalized value alignment does not seem to benefit trust or team performance. When an informed prior is unavailable, however, alignment to the human's values leads to high trust and higher perceived performance while maintaining the same objective team performance.
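To make the role of the prior and the difference between the three strategies concrete, the following is a minimal sketch of a Bayesian update over a single reward weight. It is not the paper's implementation. The two-action payoff model, the Boltzmann-rational likelihood, the discretized weight grid, and every name in it (`WEIGHTS`, `action_rewards`, `likelihood`, `update`, `weight_to_optimize`, `weight_for_human_model`, the `beta` parameter) are assumptions introduced purely for illustration, and the trust dynamics of the trust-aware MDP are left out.

```python
import math

# Hypothetical grid of candidate values for the human's weight on "health"
# (the remaining 1 - w is the weight on "time"). Assumed for illustration.
WEIGHTS = [i / 10 for i in range(11)]


def normalize(p):
    """Rescale a list of non-negative numbers so that it sums to 1."""
    total = sum(p)
    return [x / total for x in p]


def uniform_prior():
    """Uninformed prior: every candidate weight is equally likely."""
    return normalize([1.0] * len(WEIGHTS))


def informed_prior(center, width=0.15):
    """Informed prior: a Gaussian-shaped bump around an assumed estimate."""
    return normalize(
        [math.exp(-0.5 * ((w - center) / width) ** 2) for w in WEIGHTS]
    )


def action_rewards(w, threat_prob):
    """Illustrative payoffs for one search site (an assumption of this sketch).

    Using the robot costs time; skipping it risks harm with probability
    threat_prob.
    """
    return {
        "use_robot": -(1.0 - w),
        "skip_robot": -w * threat_prob,
    }


def likelihood(action, w, threat_prob, beta=5.0):
    """Boltzmann-rational probability that a human with weight w picks action."""
    rewards = action_rewards(w, threat_prob)
    z = sum(math.exp(beta * r) for r in rewards.values())
    return math.exp(beta * rewards[action]) / z


def update(belief, action, threat_prob):
    """One Bayesian step: posterior is proportional to prior times likelihood."""
    return normalize(
        [p * likelihood(action, w, threat_prob) for p, w in zip(belief, WEIGHTS)]
    )


def posterior_mean(belief):
    return sum(p * w for p, w in zip(belief, WEIGHTS))


def weight_to_optimize(strategy, robot_w, belief):
    """Weight the robot plans with: only the adaptive learner adopts the estimate."""
    if strategy == "adaptive-learner":
        return posterior_mean(belief)
    return robot_w


def weight_for_human_model(strategy, robot_w, belief):
    """Weight the robot attributes to the human when modeling their behavior."""
    if strategy == "non-learner":
        return robot_w  # presumes the human's reward mirrors its own
    return posterior_mean(belief)


if __name__ == "__main__":
    belief = uniform_prior()
    # The human repeatedly uses the robot even at a low-threat site.
    for _ in range(5):
        belief = update(belief, "use_robot", threat_prob=0.3)
    for s in ("non-learner", "non-adaptive-learner", "adaptive-learner"):
        print(s, weight_to_optimize(s, 0.2, belief),
              weight_for_human_model(s, 0.2, belief))
```

In this sketch, Experiment 1 would correspond to starting from `informed_prior(...)` and Experiment 2 to starting from `uniform_prior()`. The two selector functions show the intended contrast: the non-adaptive learner uses the estimate only to model the human, while the adaptive learner also plans with it.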
