Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent behaviors, including hallucination, unfaithful reasoning, and toxic content. A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. Techniques leveraging automated feedback -- produced either by the LLM itself or by some external system -- are of particular interest because they could make LLM-based solutions more practical and deployable with minimal human feedback. This paper presents a comprehensive review of this emerging class of techniques. We analyze and taxonomize a wide array of recent work utilizing these strategies, including training-time, generation-time, and post-hoc correction. We also summarize the major applications of this strategy and conclude by discussing future directions and challenges.