With the advent of large language models (LLMs), the trend in NLP has been to train LLMs on vast amounts of data to solve diverse language understanding and generation tasks. The list of LLM successes is long and varied. Nevertheless, several recent papers provide empirical evidence that LLMs fail to capture important aspects of linguistic meaning. Focusing on universal quantification, we provide a theoretical foundation for these empirical findings by proving that LLMs cannot learn certain fundamental semantic properties, including semantic entailment and consistency, as they are defined in formal semantics. More generally, we show that LLMs are unable to learn concepts beyond the first level of the Borel hierarchy, which imposes severe limits on the ability of language models (LMs), both large and small, to capture many aspects of linguistic meaning. This means that LLMs will continue to operate without formal guarantees on tasks that require entailments and deep linguistic understanding.