We introduce Reprompting, an iterative sampling algorithm that searches for effective Chain-of-Thought (CoT) recipes for a given task without human intervention. Using Gibbs sampling, we infer CoT recipes that work consistently well across a set of training samples. The method iteratively samples new recipes by using previously sampled solutions as parent prompts for solving other training problems. On five Big-Bench Hard tasks that require multi-step reasoning, Reprompting consistently outperforms the zero-shot, few-shot, and human-written CoT baselines. Reprompting can also facilitate the transfer of knowledge from a stronger model to a weaker one, substantially improving the weaker model's performance. Overall, Reprompting yields an improvement of up to 17 points over the previous state-of-the-art method, which uses human-written CoT prompts.
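The sampling loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `reprompt_sketch`, the `solve` callback (a stand-in for a language-model call), the parameters `num_parents` and `reject_prob`, and the rule of keeping an incorrect recipe with small probability are all assumptions made for illustration. The toy solver simply adds two integers so the example runs without any model.

```python
import random
from typing import Callable, List, Optional, Tuple

Problem = Tuple[str, str]  # (question, gold answer)
Demo = Tuple[str, str]     # (question, previously sampled recipe)


def reprompt_sketch(
    problems: List[Problem],
    solve: Callable[[str, List[Demo]], Tuple[str, str]],
    iterations: int = 50,
    num_parents: int = 2,      # hypothetical: demos drawn per step
    reject_prob: float = 0.9,  # hypothetical: chance to discard a wrong recipe
    seed: int = 0,
) -> List[Optional[str]]:
    """Gibbs-style loop: resample one problem's recipe given the others."""
    rng = random.Random(seed)
    recipes: List[Optional[str]] = [None] * len(problems)
    for _ in range(iterations):
        # Pick one training problem whose recipe will be resampled.
        j = rng.randrange(len(problems))
        question, gold = problems[j]
        # Parent prompts: recipes already sampled for *other* problems.
        candidates = [
            i for i in range(len(problems))
            if i != j and recipes[i] is not None
        ]
        parents = rng.sample(candidates, min(num_parents, len(candidates)))
        demos = [(problems[i][0], recipes[i]) for i in parents]
        # `solve` stands in for prompting a model with the parent demos.
        recipe, answer = solve(question, demos)
        # Keep correct recipes; keep wrong ones only occasionally (assumed rule).
        if answer == gold or rng.random() > reject_prob:
            recipes[j] = recipe
    return recipes


def toy_solve(question: str, demos: List[Demo]) -> Tuple[str, str]:
    """Toy stand-in for a model: ignores the demos and adds two integers."""
    a, b = (int(x) for x in question.split("+"))
    total = a + b
    return f"Add {a} and {b} to get {total}.", str(total)


toy_problems = [("1+2", "3"), ("4+5", "9"), ("10+7", "17")]
sampled = reprompt_sketch(toy_problems, toy_solve, iterations=30)
```

In a real setting `solve` would format the parent demos into a few-shot prompt and parse the model's final answer; after enough iterations, the surviving recipes are those that helped solve other training problems.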