As Large Language Models (LLMs) perform, and sometimes excel at, increasingly complex cognitive tasks, a natural question is whether AI really understands. The study of understanding in LLMs is in its infancy, and the community has yet to incorporate well-trodden research from philosophy, psychology, and education. We initiate this effort, focusing specifically on understanding algorithms, and propose a hierarchy of levels of understanding. We use the hierarchy to design and conduct a study with human subjects (undergraduate and graduate students) as well as large language models (generations of GPT), revealing interesting similarities and differences. We expect that our rigorous criteria will be useful for tracking AI's progress in such cognitive domains.