Large language models (LLMs) can be induced to solve non-trivial problems with "few-shot" prompts that include illustrative problem-solution examples. If the few-shot examples also include "chain of thought" (CoT) explanations, taking the form problem-explanation-solution, LLMs will generate an "explained" solution and perform even better. More recently, self-consistency (S-C) [1], a substantially better technique, has emerged. It builds on the intuition that many plausible explanations can lead to the right solution: when the LLM is sampled repeatedly to generate a pool of explanation-solution pairs for a given problem, the most frequently occurring solutions in the pool (ignoring the explanations) tend to be even more likely to be correct. Unfortunately, the use of this highly performant S-C (or even CoT) approach in software engineering settings is hampered by the fact that most software datasets lack explanations. In this paper, we describe an application of the S-C approach to program repair, using the commit log message of the fix as the explanation, and only in the illustrative few-shot examples. We achieve state-of-the-art results on the MODIT dataset, beating previous approaches to prompting-based program repair; we also find evidence suggesting that the correct commit messages are helping the LLM learn to produce better patches.
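The S-C selection step described above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the prompt layout, the whitespace normalization, and the names `build_prompt`, `normalize`, and `self_consistent_patch` are all assumptions made for illustration, and the LLM sampling itself is left out (the samples are passed in as plain data).

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical types: a few-shot example is (buggy code, commit message, fixed code);
# a sample drawn from the LLM is (explanation, patch).
Shot = Tuple[str, str, str]
Sample = Tuple[str, str]


def build_prompt(shots: List[Shot], buggy_code: str) -> str:
    """Assemble a few-shot prompt where the commit message acts as the explanation.

    The section labels are an assumed layout, not the paper's actual prompt.
    """
    parts = []
    for buggy, message, fixed in shots:
        parts.append(
            f"### Buggy\n{buggy}\n### Commit message\n{message}\n### Fixed\n{fixed}\n"
        )
    # The query ends at the commit message slot so the model writes
    # an explanation first and the patch after it.
    parts.append(f"### Buggy\n{buggy_code}\n### Commit message\n")
    return "\n".join(parts)


def normalize(patch: str) -> str:
    """Collapse whitespace so trivially different patches vote together (an assumption)."""
    return " ".join(patch.split())


def self_consistent_patch(samples: List[Sample]) -> Tuple[str, int]:
    """Return the most frequent patch and its vote count, ignoring the explanations."""
    if not samples:
        raise ValueError("need at least one sample")
    votes = Counter(normalize(patch) for _explanation, patch in samples)
    return votes.most_common(1)[0]


if __name__ == "__main__":
    # Hand-written stand-ins for repeated LLM samples on one buggy input.
    demo_samples = [
        ("Fix off-by-one in loop bound", "for (int i = 0; i < n; i++)"),
        ("Use a strict upper bound", "for (int i = 0;  i < n; i++)"),
        ("Start counting from one", "for (int i = 1; i <= n; i++)"),
    ]
    print(self_consistent_patch(demo_samples))
```

In this sketch the explanations differ across samples while two of the three patches agree after normalization, so that patch wins the vote. A real pipeline would likely need a stronger notion of patch equivalence than whitespace collapsing.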