Large Language Models (LLMs) have shown steady gains in task-specific performance, driven largely by effective prompt design. Recent research on prompting has enhanced the reasoning capabilities of LLMs, but a gap remains in improving their understanding abilities. In this study, we introduce metacognitive prompting (MP), a strategy inspired by human introspective reasoning processes. With MP, an LLM works through a systematic series of structured, self-aware evaluations, drawing on both its extensive inherent knowledge and new insights. We conduct experiments with five prevalent LLMs (Llama2, Vicuna, PaLM, GPT-3.5, and GPT-4) on a range of general natural language understanding (NLU) tasks from the GLUE and SuperGLUE benchmarks. Results indicate that, although GPT-4 consistently excels on most tasks, PaLM equipped with MP approaches its performance level. Furthermore, across models and datasets, MP consistently outperforms existing prompting methods, including standard and chain-of-thought prompting. This study underscores the potential to amplify the understanding abilities of LLMs and highlights the benefits of mirroring human introspective reasoning in NLU tasks.