Theory of Mind (ToM) refers to an agent's ability to model the internal states of others. Contributing to the debate over whether large language models (LLMs) exhibit genuine ToM capabilities, our study investigates their ToM robustness using perturbations of false-belief tasks and examines the potential of Chain-of-Thought (CoT) prompting to enhance performance and explain the LLMs' decisions. We introduce a handcrafted, richly annotated ToM dataset comprising classic and perturbed false-belief tasks, the corresponding spaces of valid reasoning chains required for correct task completion, annotations of reasoning faithfulness, and task solutions, and we propose metrics to evaluate both the correctness of reasoning chains and the extent to which final answers are faithful to the reasoning traces of the generated CoT. We show a steep drop in ToM capabilities under task perturbation for all evaluated LLMs, calling into question whether any robust form of ToM is present. While CoT prompting improves overall ToM performance in a faithful manner, it surprisingly degrades accuracy for some perturbation classes, indicating that selective application is necessary.