Recent advancements in Large Language Models (LLMs) harness linguistic associations in vast natural language data for practical applications. However, their ability to understand the physical world using only language data remains an open question. After reviewing existing protocols, we explore this question using a novel and tightly controlled reasoning test (ART) and compare human norms against versions of GPT-3. Our findings highlight the categories of common-sense relations that models could learn directly from data, as well as areas of weakness. GPT-3 offers evidence for verbal reasoning on a par with human subjects for several relations, including Synonymy, Antonymy, and Default inheritance. Without reinforcement learning from human judgements, GPT-3 appears to perform at the lower end of the reference interval for Has-part and Contained-in. Weaknesses were also observed in affordance characteristics through Necessary-quality, Order-of-size, and Order-of-intensity. Combining LLMs with symbolic world grounding is a promising direction to address the limitations of associative learning.