Knowledge comprehension capability is an important aspect of human intelligence. As Large Language Models (LLMs) are envisioned as superhuman agents, it is crucial that they be proficient at knowledge comprehension. However, existing benchmarking studies do not provide consistent, generalizable, and formal guarantees on the knowledge comprehension capabilities of LLMs. In this work, we propose the first framework to certify knowledge comprehension in LLMs with formal probabilistic guarantees. Our certificates are quantitative -- they consist of high-confidence, tight bounds on the probability that a target LLM gives the correct answer to any knowledge comprehension prompt sampled from a distribution. We design and certify novel specifications that precisely represent distributions of knowledge comprehension prompts by leveraging knowledge graphs. We certify state-of-the-art LLMs against specifications over the Wikidata5m knowledge graph, and find that knowledge comprehension capability improves significantly with model size.
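The kind of quantitative certificate described above can be illustrated with a minimal sketch. The function names and the use of a Hoeffding-style concentration bound here are our own illustrative assumptions, not the paper's actual certification procedure: given n prompts sampled i.i.d. from the specification's distribution and the count of correct answers, it returns a two-sided interval that contains the model's true success probability with the requested confidence.

```python
import math

def certify_accuracy(num_correct, n, confidence=0.95):
    """Illustrative certificate via a two-sided Hoeffding bound.

    With probability >= confidence over the n i.i.d. sampled prompts,
    the true probability p that the target LLM answers a prompt from
    the distribution correctly lies in the returned [lo, hi] interval.
    """
    p_hat = num_correct / n                      # empirical success rate
    delta = 1.0 - confidence                     # allowed failure probability
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))  # Hoeffding half-width
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)

# Hypothetical example: the model answers 850 of 1000 sampled prompts correctly.
lo, hi = certify_accuracy(850, 1000)
```

The interval tightens at a rate of O(1/sqrt(n)), which is why such certificates can be made tight simply by sampling more prompts from the specification's distribution.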