Large Language Models (LLMs) have shown impressive abilities in various tasks. However, fundamentally improving them depends on high-quality datasets or computationally expensive fine-tuning. By contrast, humans can improve themselves through self-thinking and memory, without external resources. In this paper, we propose a framework, MoT, that lets an LLM self-improve through Memory-of-Thought, without annotated datasets or parameter updates. Specifically, MoT has two stages: (1) before the test stage, the LLM pre-thinks on an unlabeled dataset and saves the high-confidence thoughts as external memory; (2) during the test stage, given a test question, the LLM recalls relevant memory to help itself reason and answer it. Experimental results show that MoT helps ChatGPT significantly improve its abilities in arithmetic reasoning, commonsense reasoning, factual reasoning, and natural language inference. Further analyses show that each component contributes critically to the improvements and that MoT leads to consistent improvements across various CoT methods and LLMs.
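The two stages described above can be illustrated with a minimal sketch. The abstract does not specify how confidence is measured or how relevant memory is retrieved, so the choices here are assumptions made only for illustration: confidence is taken as the agreement rate among several sampled answers, and relevance is word-overlap (Jaccard) similarity between questions. The names `Sampler`, `Thought`, `pre_think`, `recall`, and `build_prompt` are hypothetical, and `sample` stands in for any LLM call that returns a reasoning string and a final answer.

```python
from collections import Counter
from typing import Callable, List, NamedTuple, Tuple

# Hypothetical interface: an LLM call returning (reasoning, answer) for a question.
Sampler = Callable[[str], Tuple[str, str]]


class Thought(NamedTuple):
    question: str
    reasoning: str
    answer: str


def pre_think(questions: List[str], sample: Sampler,
              n_samples: int = 5, min_agreement: float = 0.6) -> List[Thought]:
    """Stage 1: answer unlabeled questions and keep only high-confidence thoughts.

    Assumption: confidence is the fraction of samples that agree with the
    most common answer; the abstract does not define the criterion.
    """
    memory = []
    for q in questions:
        samples = [sample(q) for _ in range(n_samples)]
        answer, votes = Counter(a for _, a in samples).most_common(1)[0]
        if votes / n_samples >= min_agreement:
            # Keep the first reasoning path that produced the majority answer.
            reasoning = next(r for r, a in samples if a == answer)
            memory.append(Thought(q, reasoning, answer))
    return memory


def _overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercased whitespace tokens (an assumed relevance measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def recall(memory: List[Thought], question: str, k: int = 2) -> List[Thought]:
    """Stage 2a: pick the k stored thoughts whose questions best match the test question."""
    return sorted(memory, key=lambda t: _overlap(t.question, question), reverse=True)[:k]


def build_prompt(memory: List[Thought], question: str, k: int = 2) -> str:
    """Stage 2b: prepend recalled thoughts as demonstrations before the test question."""
    demos = [f"Q: {t.question}\nA: {t.reasoning} The answer is {t.answer}."
             for t in recall(memory, question, k)]
    return "\n\n".join(demos + [f"Q: {question}\nA:"])
```

In use, `pre_think` would run once over the unlabeled questions, and `build_prompt` would then be called per test question, with its output passed to the same LLM. No parameters are updated at any point; the only state carried between stages is the list of stored thoughts.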