Recent advancements have highlighted that large language models (LLMs), when given a small set of task-specific examples, demonstrate remarkable proficiency, a capability that extends to complex reasoning tasks. In particular, the combination of few-shot learning with the chain-of-thought (CoT) approach has been pivotal in steering models towards more logically consistent conclusions. This paper explores the optimization of example selection for designing effective CoT pre-prompts and shows that the choice of optimization algorithm, typically in favor of comparison-based methods such as evolutionary computation, significantly enhances both efficacy and feasibility. Specifically, thanks to an optimization that limits exploitation and overfitting, Evolutionary Pre-Prompt Optimization (EPPO) improves over the naive few-shot approach by more than 10 absolute points in exact-match score on benchmark datasets such as GSM8k and MathQA. These gains are consistent across various contexts and are further amplified when integrated with self-consistency (SC).
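To make the search concrete, below is a minimal sketch of how a comparison-based (1+1) evolutionary loop could select the examples of a CoT pre-prompt. The candidate pool, the subset size `k`, and the `evaluate` callback (returning a validation exact-match score for the pre-prompt built from the chosen examples) are hypothetical stand-ins under stated assumptions, not the paper's actual implementation.

```python
# Sketch: comparison-based (1+1) evolutionary selection of CoT pre-prompt
# examples. Hypothetical illustration; `evaluate` is assumed to score a
# pre-prompt built from the selected examples by validation exact match.
import random


def mutate(selection, pool_size, k):
    """Swap one chosen example for an unused one (assumes pool_size > k)."""
    child = list(selection)
    i = random.randrange(k)
    unused = [j for j in range(pool_size) if j not in child]
    child[i] = random.choice(unused)
    return child


def optimize_pre_prompt(pool, k, evaluate, budget=200):
    """(1+1) evolutionary search over k-example subsets of `pool`.

    Only comparisons between parent and child scores drive the search;
    no gradients of the LLM are needed.
    """
    parent = random.sample(range(len(pool)), k)
    parent_score = evaluate(parent)
    for _ in range(budget):
        child = mutate(parent, len(pool), k)
        child_score = evaluate(child)
        if child_score >= parent_score:  # keep the better pre-prompt
            parent, parent_score = child, child_score
    return parent, parent_score
```

Because the loop is comparison-based, only the ordering of scores matters, which makes it robust to the noisy, non-differentiable evaluations that LLM benchmarks produce; the evaluation budget also gives a direct lever for limiting exploitation and overfitting to the validation set.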