This paper addresses robustness under sensing noise, ambiguous instructions, and human-robot interaction. We take a radically different tack on reliable embodied AI: rather than pursuing formal verification methods aimed at model predictability and robustness, we emphasise the dynamic, ambiguous, and subjective nature of human-robot interaction, which requires embodied AI systems to perceive, interpret, and respond to human intentions in a manner that is consistent, comprehensible, and aligned with human expectations. We argue that when embodied agents operate in human environments, which are inherently social, multimodal, and fluid, reliability is contextually determined: it has meaning only in relation to the goals and expectations of the humans involved in the interaction. This calls for a fundamentally different approach to reliable embodied AI, one centred on building and continually updating an accessible "explicit world model" that represents the common ground between human and AI and is used to align robot behaviours with human expectations.