The reflection capacity of Large Language Models (LLMs) has garnered extensive attention. Post-hoc prompting strategies, e.g., Reflexion and Self-Refine, refine an LLM's response based on self-evaluated or external feedback. However, recent research indicates that without external feedback, an LLM's intrinsic reflection is unstable. Our investigation reveals that the key bottleneck is the quality of the self-evaluated feedback. We find that LLMs often exhibit overconfidence or high randomness when self-evaluating, offering stubborn or inconsistent feedback, which causes poor reflection. To remedy this, we advocate Self-Contrast: it adaptively explores diverse solving perspectives tailored to the request, contrasts the differences, and summarizes these discrepancies into a checklist that can be used to re-examine and eliminate them. Our method endows the LLM with diverse perspectives to alleviate stubborn biases. Moreover, the discrepancies among these perspectives indicate potential errors or inherent uncertainties that the LLM often overlooks. Reflecting upon these can catalyze more accurate and stable reflection. Experiments conducted on a series of reasoning and translation tasks with different LLMs underscore the effectiveness and generality of our strategy.
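The steps named in the abstract (explore perspectives, contrast, summarize into a checklist, re-examine) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm` is a hypothetical callable that maps a prompt string to a response string, the prompt wording is invented for illustration, and plain string inequality stands in for whatever comparison the actual method uses to decide that two candidate answers differ.

```python
from itertools import combinations
from typing import Callable, List

# Hypothetical interface: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def self_contrast(request: str, llm: LLM, n_perspectives: int = 3) -> str:
    """Sketch of a contrast-then-reflect loop; all prompts are assumptions."""
    # 1. Ask for solving perspectives tailored to this request.
    raw = llm(
        f"List {n_perspectives} distinct ways to approach this request, "
        f"one per line:\n{request}"
    )
    perspectives: List[str] = [p.strip() for p in raw.splitlines() if p.strip()]
    perspectives = perspectives[:n_perspectives] or ["direct answer"]

    # 2. Produce one candidate answer per perspective.
    candidates = [
        llm(f"Approach: {p}\nRequest: {request}\nAnswer:") for p in perspectives
    ]

    # 3. Contrast every pair of candidates that disagree.
    #    Assumption: exact string inequality is a crude proxy for "differs".
    discrepancies = [
        llm(f"Compare these two answers and describe how they differ.\nA: {a}\nB: {b}")
        for a, b in combinations(candidates, 2)
        if a != b
    ]
    if not discrepancies:
        # All perspectives agree, so there is nothing to re-examine.
        return candidates[0]

    # 4. Summarize the discrepancies into a checklist.
    checklist = llm(
        "Summarize these discrepancies into a checklist of points to re-check:\n"
        + "\n".join(discrepancies)
    )

    # 5. Re-examine the candidates against the checklist and revise.
    joined = "\n".join(f"- {c}" for c in candidates)
    return llm(
        f"Request: {request}\nCandidate answers:\n{joined}\n"
        f"Checklist:\n{checklist}\n"
        "Re-examine the candidates against the checklist and give one final answer:"
    )
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; in practice `llm` would wrap whichever API is in use. The early return when all candidates agree is a design choice of this sketch, made so that reflection is triggered only by an observed discrepancy.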