Large Language Models (LLMs) have shown consistent gains in task-specific performance, driven in large part by effective prompt design. While recent research on prompting has enhanced the reasoning capabilities of LLMs, a gap remains in improving their understanding abilities. In this study, we introduce Metacognitive Prompting (MP), a strategy inspired by human introspective reasoning processes. Using MP, LLMs undergo a systematic series of structured, self-aware evaluations, drawing on both their vast inherent knowledge and new insights. We conduct experiments on five prevalent LLMs (Llama2, Vicuna, PaLM, GPT-3.5, and GPT-4) across various general natural language understanding (NLU) tasks from the GLUE and SuperGLUE benchmarks. Results indicate that, although GPT-4 consistently excels in most tasks, PaLM, when equipped with MP, approaches its performance level. Furthermore, across models and datasets, MP consistently outperforms existing prompting methods, including standard and chain-of-thought prompting. This study underscores the potential to amplify the understanding abilities of LLMs and highlights the benefits of mirroring human introspective reasoning in NLU tasks.