The Vector Grounding Problem

The remarkable performance of large language models (LLMs) on complex linguistic tasks has sparked a lively debate on the nature of their capabilities. Unlike humans, these models learn language exclusively from textual data, without direct interaction with the real world. Nevertheless, they can generate seemingly meaningful text about a wide range of topics. This impressive accomplishment has rekindled interest in the classical 'Symbol Grounding Problem,' which questioned whether the internal representations and outputs of classical symbolic AI systems could possess intrinsic meaning. Unlike these systems, modern LLMs are artificial neural networks that compute over vectors rather than symbols. However, an analogous problem arises for such systems, which we dub the Vector Grounding Problem. This paper has two primary objectives. First, we differentiate various ways in which internal representations can be grounded in biological or artificial systems, identifying five distinct notions discussed in the literature: referential, sensorimotor, relational, communicative, and epistemic grounding. Unfortunately, these notions of grounding are often conflated. We clarify the differences between them, and argue that referential grounding is the one that lies at the heart of the Vector Grounding Problem. Second, drawing on theories of representational content in philosophy and cognitive science, we propose that certain LLMs, particularly those fine-tuned with Reinforcement Learning from Human Feedback (RLHF), possess the necessary features to overcome the Vector Grounding Problem, as they stand in the requisite causal-historical relations to the world that underpin intrinsic meaning. We also argue that, perhaps unexpectedly, multimodality and embodiment are neither necessary nor sufficient conditions for referential grounding in artificial systems.
