While chain-of-thought (CoT) prompting has revolutionized how LLMs perform reasoning tasks, its current methods and variations (e.g., Self-consistency, ReAct, Reflexion, Tree-of-Thoughts (ToT), Cumulative Reasoning (CR)) suffer from limitations such as slowness, limited context grounding, hallucination, and inconsistent outputs. To overcome these challenges, we introduce Evidence to Generate (E2G), a novel single-agent, two-step prompting framework. Instead of relying on unverified reasoning claims, this approach leverages "evidence for decision making": it first focuses exclusively on the thought sequences (the series of intermediate steps) explicitly mentioned in the context, which then serve as extracted evidence that guides the LLM's output generation with greater precision and efficiency. This simple yet powerful approach unlocks the potential of chain-of-thought-like prompting, paving the way for faster, more reliable, and more contextually aware reasoning in LLMs. E2G achieves remarkable results robustly across a wide range of knowledge-intensive reasoning and generation tasks, surpassing baseline approaches with state-of-the-art LLMs. For example, (i) on the LogiQA benchmark with GPT-4 as the backbone model, E2G achieves a new state-of-the-art accuracy of 53.8%, exceeding CoT by 18%, ToT by 11%, and CR by 9%; (ii) a variant of E2G with PaLM2 outperforms the variable-shot performance of Gemini Ultra by 0.9 F1 points, reaching an F1 score of 83.3 on a subset of DROP.
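The two-step flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names (`build_evidence_prompt`, `build_answer_prompt`, `e2g_sketch`), and the `llm` callable are all hypothetical, chosen only to show the idea of first extracting evidence stated in the context and then generating the output from that evidence alone.

```python
from typing import Callable


def build_evidence_prompt(context: str, question: str) -> str:
    # Step 1 prompt (hypothetical wording): ask only for steps and facts
    # that are explicitly stated in the context.
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "List only the intermediate steps and facts explicitly stated in "
        "the context that are needed to answer the question."
    )


def build_answer_prompt(evidence: str, question: str) -> str:
    # Step 2 prompt (hypothetical wording): the model sees the extracted
    # evidence in place of the full context.
    return (
        f"Evidence:\n{evidence}\n\n"
        f"Question: {question}\n"
        "Answer using only the evidence above."
    )


def e2g_sketch(context: str, question: str, llm: Callable[[str], str]) -> str:
    # `llm` is an assumed text-in/text-out callable; any model client
    # could be wrapped to fit this shape.
    evidence = llm(build_evidence_prompt(context, question))
    return llm(build_answer_prompt(evidence, question))
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM API; whether the second step should also see the original context is a design choice this sketch does not settle.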