Despite advances in embodied AI, agent reasoning systems still struggle to capture the fundamental conceptual structures that humans naturally use to understand and interact with their environment. To address this, we propose a novel framework that bridges embodied cognition theory and agent systems by leveraging a formal characterization of image schemas, the recurring patterns of sensorimotor experience that structure human cognition. By customizing LLMs to translate natural language descriptions into formal representations based on these sensorimotor patterns, we aim to create a neurosymbolic system that grounds the agent's understanding in these fundamental conceptual structures. We argue that such an approach can enhance both efficiency and interpretability while enabling more intuitive human-agent interaction through shared embodied understanding.