Large Language Models (LLMs) have shown steady gains in task-specific performance, driven largely by effective prompt design. Recent advances in prompting have improved LLM reasoning on logic-intensive tasks, yet the models' capacity for nuanced understanding, which is crucial for processing and interpreting complex information, remains underexplored. In this study, we introduce Metacognitive Prompting (MP), a strategy inspired by human introspective reasoning processes. With MP, an LLM works through a systematic series of structured, self-aware evaluations, drawing on both its extensive inherent knowledge and new insights. We conduct extensive experiments on four prevalent LLMs (Llama2, PaLM2, GPT-3.5, and GPT-4) across ten natural language understanding (NLU) datasets from the GLUE, SuperGLUE, BLUE, and LexGLUE benchmarks. We also compare our method with chain-of-thought prompting and its advanced variants. The results show that GPT-4 consistently excels across all tasks, while the other models make significant progress on some tasks when used with MP. Furthermore, MP consistently outperforms existing prompting methods on both general and domain-specific NLU tasks. This study underscores the potential to amplify the understanding abilities of LLMs and highlights the benefits of mirroring human introspective reasoning in NLU tasks.