In this study, we investigated the effects of self-reflection in large language models (LLMs) on problem-solving performance. We instructed nine popular LLMs to answer a series of multiple-choice questions to provide a performance baseline. For each incorrectly answered question, we instructed eight types of self-reflecting LLM agents to reflect on their mistakes and provide themselves with guidance to improve problem-solving. Then, using this guidance, each self-reflecting agent attempted to re-answer the same questions. Our results indicate that LLM agents are able to significantly improve their problem-solving performance through self-reflection ($p < 0.001$). In addition, we compared the various types of self-reflection to determine their individual contributions to performance. All code and data are available on GitHub at https://github.com/matthewrenze/self-reflection