Artificial Intelligence (AI) has been advancing rapidly and has demonstrated its ability to perform a wide range of cognitive tasks, including language processing, visual recognition, and decision-making. Part of this progress is due to Large Language Models (LLMs) such as those of the GPT (Generative Pre-trained Transformer) family. These models can exhibit behavior that may be perceived as intelligent. Most authors in neuropsychology consider intelligent behavior to depend on a number of overarching skills, or Executive Functions (EFs), which rely on the correct functioning of neural networks in the frontal lobes, and have developed a series of tests to evaluate them. In this work, we raise the question of whether LLMs are developing executive functions similar to those of humans as part of their learning, and we evaluate the planning function and working memory of GPT using the popular Towers of Hanoi method. Additionally, we introduce a new variant of the classical method to prevent solutions from already being present in the LLM training data (data leakage). Preliminary results show that LLMs generate near-optimal solutions in Towers of Hanoi related tasks, adhere to task constraints, and exhibit rapid planning capabilities and efficient working memory usage, indicating a potential development of executive functions. However, these abilities are quite limited, and worse than those of well-trained humans, when the tasks are unfamiliar and not part of the training data.