Explainable AI Planning (XAIP) aims to develop AI agents that can effectively explain their decisions and actions to human users, fostering trust and facilitating human-AI collaboration. A key challenge in XAIP is model reconciliation, which seeks to align the mental models of AI agents and humans. While existing approaches often assume a known and deterministic human model, this simplification may not capture the complexities and uncertainties of real-world interactions. In this paper, we propose a novel framework that enables AI agents to learn and update a probabilistic human model through argumentation-based dialogues. Our approach incorporates trust-based and certainty-based update mechanisms, allowing the agent to refine its understanding of the human's mental state based on the human's expressed trust in the agent's arguments and certainty in their own arguments. We employ a probability weighting function inspired by prospect theory to capture the relationship between trust and perceived probability, and use a Bayesian approach to update the agent's probability distribution over possible human models. We conduct a human-subject study to empirically evaluate the effectiveness of our approach in an argumentation scenario, demonstrating its ability to capture the dynamics of human belief formation and adaptation.
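To make the two update ingredients mentioned above concrete, the sketch below illustrates one plausible instantiation: a prospect-theory-style probability weighting function (here the Tversky–Kahneman form; the paper's exact choice of weighting function and likelihood is not specified in this abstract, so the specific formulas, parameter `gamma`, and the candidate-model representation are illustrative assumptions) combined with a Bayesian update of a distribution over candidate human models.

```python
import numpy as np

def weight(p, gamma=0.61):
    """Prospect-theory probability weighting (Tversky-Kahneman form).

    Maps an objective probability p to a perceived probability, which here
    stands in for how expressed trust distorts the agent's evidence.
    NOTE: the specific functional form and gamma value are assumptions for
    illustration, not necessarily the ones used in the paper.
    """
    p = np.asarray(p, dtype=float)
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)

def bayesian_update(prior, likelihoods, trust):
    """Update the distribution over candidate human models.

    prior       : array of shape (n_models,), current P(model)
    likelihoods : array of shape (n_models,), P(observed human response | model)
    trust       : scalar in [0, 1], the human's expressed trust/certainty,
                  used here to weight the evidence via the weighting function.
    Returns the normalized posterior over models.
    """
    evidence = weight(trust) * likelihoods + (1.0 - weight(trust)) * np.ones_like(likelihoods)
    posterior = prior * evidence
    return posterior / posterior.sum()

# Example: three candidate human models, one dialogue turn of evidence.
prior = np.array([1/3, 1/3, 1/3])
likelihoods = np.array([0.9, 0.4, 0.1])   # how well each model predicts the human's reply
posterior = bayesian_update(prior, likelihoods, trust=0.7)
print(posterior)  # mass shifts toward the model that best explains the reply
```

In this sketch, low expressed trust flattens the evidence toward uniform, so the agent's beliefs about the human model move slowly; high trust lets the observed response reshape the distribution more strongly, mirroring the trust-based and certainty-based update mechanisms described above.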