Language models have become effective at a wide range of tasks, from math problem solving to open-domain question answering. However, they still make mistakes, and these mistakes are often repeated across related queries. Natural language explanations can help correct such errors, but collecting them at scale may be infeasible, particularly in domains that require expert annotators. To address this issue, we introduce FLEx ($\textbf{F}$ew-shot $\textbf{L}$anguage $\textbf{Ex}$planations), a method for improving model behavior using a small number of explanatory examples. FLEx selects representative model errors via embedding-based clustering, verifies that the associated explanations correct those errors, and summarizes them into a prompt prefix that is prepended at inference time. This summary guides the model to avoid similar errors on new inputs, without modifying model weights. We evaluate FLEx on CounterBench, GSM8K, and ReasonIF, finding that it consistently outperforms chain-of-thought (CoT) prompting across all three datasets and eliminates up to 83\% of CoT's remaining errors.
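The selection-and-summarization steps described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the helper names (`select_representatives`, `build_prefix`), the toy k-means, and the prefix wording are all assumptions, and FLEx's verification step (checking that each explanation actually corrects its error) is omitted because it requires model calls.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, k, iters=10, seed=0):
    """Toy k-means stand-in for the embedding-based clustering step."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(centroids[c], v))
                  for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = mean(members)
    return assign, centroids


def select_representatives(errors, embeddings, k):
    """Pick one representative error per cluster: the error whose
    embedding lies closest to the cluster centroid."""
    assign, centroids = kmeans(embeddings, k)
    reps = []
    for c in range(k):
        idxs = [i for i, a in enumerate(assign) if a == c]
        if not idxs:
            continue
        best = min(idxs, key=lambda i: dist(embeddings[i], centroids[c]))
        reps.append(errors[best])
    return reps


def build_prefix(explanations):
    """Summarize the (assumed already-verified) explanations into a
    prompt prefix to prepend at inference time."""
    lines = ["Before answering, avoid these known mistakes:"]
    lines += [f"- {e}" for e in explanations]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical error explanations with hand-made 2-D "embeddings".
    errors = ["mixed up units", "dropped a negative sign",
              "ignored the counterfactual premise", "misread the constraint"]
    embeddings = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]]
    reps = select_representatives(errors, embeddings, k=2)
    print(build_prefix(reps))
```

In a real pipeline the embeddings would come from a sentence encoder and each representative explanation would first be verified against the error it targets before being admitted to the prefix.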