Working memory is a critical aspect of both human intelligence and artificial intelligence, serving as a workspace for the temporary storage and manipulation of information. In this paper, we systematically assess the working memory capacity of ChatGPT, a large language model developed by OpenAI, by examining its performance in verbal and spatial n-back tasks under various conditions. Our experiments reveal that ChatGPT has a working memory capacity limit strikingly similar to that of humans. Furthermore, we investigate the impact of different instruction strategies on ChatGPT's performance and observe that the fundamental pattern of a capacity limit persists. Based on these empirical findings, we propose that n-back tasks may serve as tools for benchmarking the working memory capacity of large language models and may help inform future efforts to enhance AI working memory.
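In an n-back task, a subject sees a stream of items and must say whether the current item matches the one shown n steps earlier. Load, and therefore the demand on working memory, rises with n. The following minimal Python sketch shows how such a stream can be generated and scored. It is not the authors' code. The helper names (`make_nback_sequence`, `is_target`, `score_responses`), the 30% target rate, and the default letter set are hypothetical choices. Scoring uses hit rate, false-alarm rate, and d′, a signal-detection measure commonly applied to n-back data. A log-linear correction keeps d′ finite when a rate would otherwise be exactly 0 or 1.

```python
import random
from statistics import NormalDist


def make_nback_sequence(length, n, items="BCDFGHJKLMNPQRSTVWXZ",
                        target_rate=0.3, seed=0):
    """Build a stream in which roughly `target_rate` of eligible positions
    (index >= n) repeat the item shown n steps earlier.
    `items` can be letters (verbal task) or grid-cell labels (spatial task)."""
    rng = random.Random(seed)
    seq = []
    for i in range(length):
        if i >= n and rng.random() < target_rate:
            seq.append(seq[i - n])  # target: same item as n steps back
        else:
            # Non-target: when i >= n, exclude the item n steps back so it cannot match by accident.
            choices = [c for c in items if i < n or c != seq[i - n]]
            seq.append(rng.choice(choices))
    return seq


def is_target(seq, i, n):
    """True when position i repeats the item presented n positions earlier."""
    return i >= n and seq[i] == seq[i - n]


def score_responses(seq, responses, n):
    """`responses[i]` is True if the subject answered 'match' at position i.
    Returns (hit rate, false-alarm rate, d')."""
    hits = misses = false_alarms = correct_rejections = 0
    for i, said_match in enumerate(responses):
        if is_target(seq, i, n):
            if said_match:
                hits += 1
            else:
                misses += 1
        elif said_match:
            false_alarms += 1
        else:
            correct_rejections += 1

    n_targets = hits + misses
    n_nontargets = false_alarms + correct_rejections
    hit_rate = hits / n_targets if n_targets else 0.0
    fa_rate = false_alarms / n_nontargets if n_nontargets else 0.0

    # Log-linear correction: add 0.5 to counts and 1 to totals before z-scoring.
    z = NormalDist().inv_cdf
    corrected_hit = (hits + 0.5) / (n_targets + 1)
    corrected_fa = (false_alarms + 0.5) / (n_nontargets + 1)
    return hit_rate, fa_rate, z(corrected_hit) - z(corrected_fa)


if __name__ == "__main__":
    seq = make_nback_sequence(30, n=2, seed=42)
    # An ideal responder answers 'match' exactly on the targets.
    ideal = [is_target(seq, i, 2) for i in range(len(seq))]
    print(score_responses(seq, ideal, 2))
```

In an experiment along the lines of the abstract, `responses` would be parsed from the model's per-trial answers. Increasing `n` would then trace how performance degrades as load grows. The `items` parameter is kept generic so the same scoring applies to verbal (letter) and spatial (grid-position) variants.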