Although large language models have made strides in natural language processing, their proficiency in complex reasoning tasks that require formal language comprehension, such as chess, remains under-investigated. This paper examines how ChatGPT, a language model developed by OpenAI, performs on such tasks, using chess as a case study. Using robust metrics that examine both the legality and the quality of moves, we assess ChatGPT's understanding of the chessboard, its adherence to the rules of chess, and its strategic decision-making. Our evaluation identifies limitations in ChatGPT's attention mechanism that affect its formal language comprehension, and it reveals that the model's self-regulation abilities are underdeveloped. The study also finds that ChatGPT tends to follow a coherent strategy in its gameplay and becomes noticeably more assertive in its decisions when it is given more natural language input or has a clearer understanding of the board state. These findings contribute to the growing exploration of language models' abilities beyond natural language processing and offer useful input for future research toward models with human-like cognitive abilities.
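The abstract refers to metrics for move legality without defining them. As a rough illustration only, one plausible legality metric is the fraction of model-proposed moves that are legal in their positions. The sketch below assumes this definition; the `MoveRecord` type, the `legal_move_rate` function, and the coordinate move notation are hypothetical and are not taken from the paper. The legal-move sets are abbreviated for illustration and would in practice come from an external rules engine.

```python
from dataclasses import dataclass


@dataclass
class MoveRecord:
    """One model-proposed move and the legal moves in that position (hypothetical format)."""
    proposed: str           # move text returned by the model, e.g. "e2e4"
    legal_moves: frozenset  # legal moves for the position, supplied by a rules engine


def legal_move_rate(records):
    """Return the fraction of proposed moves found in their position's legal-move set."""
    if not records:
        raise ValueError("need at least one move record")
    # Normalize whitespace and case so formatting noise is not counted as illegality.
    legal = sum(1 for r in records if r.proposed.strip().lower() in r.legal_moves)
    return legal / len(records)


# Abbreviated legal-move set for the starting position (illustrative, not complete).
opening = frozenset({"e2e4", "d2d4", "g1f3"})

records = [
    MoveRecord("e2e4", opening),
    MoveRecord("e2e5", opening),   # illegal: a pawn cannot advance three squares
    MoveRecord("G1F3 ", opening),  # legal once case and whitespace are normalized
    MoveRecord("a1a3", opening),   # illegal: the rook is blocked by its own pawn
]

print(legal_move_rate(records))  # 0.5
```

A move-quality metric would need an additional signal, such as an engine evaluation of each position, which this sketch does not attempt.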