Large language models show human-like performance in knowledge extraction, reasoning and dialogue, but it remains controversial whether this performance is best explained by memorization and pattern matching, or whether it reflects human-like inferential semantics and world knowledge. Knowledge bases such as WikiData provide large-scale, high-quality representations of inferential semantics and world knowledge. We show that large language models learn to organize concepts in ways that are strikingly similar to how concepts are organized in such knowledge bases. Knowledge bases model collective, institutional knowledge, and large language models seem to induce such knowledge from raw text. We show that bigger and better models exhibit more human-like concept organization, across four families of language models and three knowledge graph embeddings.