Although instruction fine-tuned LLMs are effective text generators, their sensitivity to prompt construction makes performance unstable and sub-optimal in practice. A single "best" prompt cannot capture all the different approaches to a generation problem. Based on this observation, we propose multi-prompt decoding, in which many candidate generations are decoded from a prompt bank at inference time. To ensemble the candidates, we use Minimum Bayes Risk (MBR) decoding, which selects a final output using a trained value metric. We show that multi-prompt decoding improves MBR across a comprehensive set of conditional generation tasks, and that this improvement results from estimating a more diverse and higher-quality candidate space than a single prompt provides. Further experiments confirm that multi-prompt decoding improves generation across tasks, models, and metrics.
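The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `generate` callable, the prompt templates, and the `token_f1` utility are all hypothetical stand-ins. In particular, `token_f1` is a toy unigram-overlap score used in place of the trained value metric, and the sketch draws one candidate per prompt for simplicity.

```python
from collections import Counter
from typing import Callable, Sequence


def token_f1(hyp: str, ref: str) -> float:
    """Toy utility: unigram F1 overlap (a stand-in for a trained metric)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(
    candidates: Sequence[str],
    utility: Callable[[str, str], float] = token_f1,
) -> str:
    """Return the candidate with the highest mean utility against the others.

    The other candidates act as pseudo-references, which is the usual
    sampling-based approximation of expected utility in MBR decoding.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


def multi_prompt_mbr(
    source: str,
    prompt_bank: Sequence[str],
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> text
    utility: Callable[[str, str], float] = token_f1,
) -> str:
    """Decode one candidate per prompt template, then pick one with MBR."""
    candidates = [generate(t.format(source=source)) for t in prompt_bank]
    return mbr_select(candidates, utility)
```

In use, `generate` would wrap a call to an instruction-tuned model and `utility` would wrap a learned metric; both are left abstract here because the abstract does not specify them. An outlier candidate that disagrees with the rest of the pool receives a low expected utility and is not selected, which is how the method can benefit from a more diverse candidate set without being hurt by individual poor prompts.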