Agents are a special kind of AI-based software in that they interact with complex environments and have increased potential for emergent behaviour. Explaining such emergent behaviour is key to deploying trustworthy AI, but the increasing complexity and opaque nature of many agent implementations make this hard. In this work, we propose a Probabilistic Graphical Model, along with a pipeline for designing such a model, by which the behaviour of an agent can be deliberated about and a robust numerical value computed for the intentions the agent holds at any moment. We contribute measurements that evaluate the interpretability and reliability of the explanations provided, and that enable explainability questions such as `what do you want to do now?' (e.g. deliver soup), `how do you plan to do it?' (e.g. returning a plan that considers its skills and the world), and `why would you take this action in this state?' (e.g. explaining how that action furthers or hinders the agent's own goals). The model can be constructed from partial observations of the agent's actions and world states, and we provide an iterative workflow for increasing the proposed measurements through better design and/or by pointing out irrational agent behaviour.
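To make the idea of computing a numerical value for intentions concrete, here is a minimal illustrative sketch, not the paper's model: a Bayesian update over hypothetical candidate goals, where each observed action shifts the posterior probability of each intention. The goal names and likelihood values are invented for illustration.

```python
def update_intentions(prior, likelihoods, action):
    """Posterior P(goal | action) proportional to P(action | goal) * P(goal)."""
    # Small floor probability avoids zeroing out goals for unseen actions.
    unnorm = {g: prior[g] * likelihoods[g].get(action, 1e-6) for g in prior}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

# Hypothetical goals and per-goal action likelihoods for a service agent.
prior = {"deliver_soup": 0.5, "clean_table": 0.5}
likelihoods = {
    "deliver_soup": {"pick_up_bowl": 0.8, "grab_cloth": 0.05},
    "clean_table": {"pick_up_bowl": 0.1, "grab_cloth": 0.9},
}

posterior = update_intentions(prior, likelihoods, "pick_up_bowl")
# After observing "pick_up_bowl", "deliver_soup" becomes the most probable
# intention, giving a numerical answer to "what do you want to do now?".
```

In the full model the posterior would be maintained over a sequence of partially observed actions and world states, but the single-step update above captures the core inference.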