Large Language Models (LLMs) have shown consistent advancements in task-specific performance, largely driven by effective prompt design. While recent research on prompting has enhanced the reasoning capabilities of LLMs, a gap remains in improving their understanding abilities. In this study, we introduce Metacognitive Prompting (MP), a strategy inspired by human introspective reasoning processes. Using MP, LLMs undergo a systematic series of structured, self-aware evaluations, drawing on both their vast inherent knowledge and new insights. We conduct experiments on five prevalent LLMs (Llama2, Vicuna, PaLM, GPT-3.5, and GPT-4) across various general natural language understanding (NLU) tasks from the GLUE and SuperGLUE benchmarks. Results indicate that, although GPT-4 excels in most tasks, PaLM, when equipped with MP, approaches GPT-4's performance level. Furthermore, across models and datasets, MP consistently outperforms existing prompting methods, including standard and chain-of-thought prompting. This study underscores the potential to amplify the understanding abilities of LLMs and highlights the benefits of mirroring human introspective reasoning in NLU tasks.
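To make the contrast between standard prompting and MP concrete, the following is a minimal sketch of how a staged, self-evaluating prompt could be assembled. The abstract does not enumerate the individual stages, so the stage wording in `STAGES`, the function names, and the prompt layout are illustrative assumptions and not the exact prompts used in the experiments.

```python
from typing import List

# Hypothetical stage wording: the abstract only says that MP is a
# "systematic series of structured, self-aware evaluations".
STAGES: List[str] = [
    "Restate the input in your own words to clarify what is being asked.",
    "Give a preliminary answer.",
    "Critically re-examine the preliminary answer and note any weaknesses.",
    "State your final answer and briefly explain the reasoning.",
    "Rate your confidence in the final answer (low, medium, or high).",
]


def build_standard_prompt(task: str, text: str) -> str:
    """Baseline: ask for the answer directly, with no intermediate steps."""
    return f"Task: {task}\nInput: {text}\nAnswer:"


def build_metacognitive_prompt(task: str, text: str,
                               stages: List[str] = STAGES) -> str:
    """Assemble a prompt that asks the model to work through each stage in order."""
    lines = [
        f"Task: {task}",
        f"Input: {text}",
        "",
        "Work through the following steps in order:",
    ]
    # Number the stages so the model's response can be matched to each step.
    lines += [f"{i}. {stage}" for i, stage in enumerate(stages, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_metacognitive_prompt(
        "Decide whether the two questions are paraphrases of each other.",
        "Q1: How do I learn Python? Q2: What is the best way to study Python?",
    ))
```

The resulting string would be sent to whichever LLM is under evaluation; the model call itself is omitted here because it depends on the provider.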