Large language models (LLMs) can be induced to solve non-trivial problems with "few-shot" prompts that include illustrative problem-solution examples. If the few-shot examples also include "chain of thought" (CoT) explanations, taking the form problem-explanation-solution, LLMs generate an "explained" solution and perform even better. Recently, a substantially better technique, self-consistency [1] (S-C), has emerged. It is based on the intuition that there are many plausible explanations for the right solution: when the LLM is sampled repeatedly to generate a pool of explanation-solution pairs for a given problem, the most frequently occurring solutions in the pool (ignoring the explanations) tend to be even more likely to be correct. Unfortunately, the use of the highly performant S-C (or even CoT) approach in software engineering settings is hampered by the fact that most software datasets lack explanations. In this paper, we describe an application of the S-C approach to program repair, using the commit log message of the fix as the explanation, only in the illustrative few-shot examples. We achieve state-of-the-art results on the MODIT dataset, beating previous approaches to prompting-based program repair; we also find evidence suggesting that correct commit messages help the LLM learn to produce better patches.
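To make the voting step of self-consistency concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes a hypothetical `sample` callable that wraps an LLM and returns one (explanation, patch) pair per call, and it assumes the prompt already contains few-shot examples of the form buggy code, commit message, fix. The whitespace normalization, the function names, and the sample count are illustrative assumptions.

```python
import itertools
from collections import Counter
from typing import Callable, Optional, Tuple


def normalize(patch: str) -> str:
    """Collapse whitespace so trivially different patches vote together (an assumption)."""
    return " ".join(patch.split())


def self_consistent_patch(
    sample: Callable[[str], Tuple[str, str]],  # hypothetical LLM wrapper
    prompt: str,
    n_samples: int = 30,  # arbitrary illustrative value
) -> Optional[str]:
    """Sample (explanation, patch) pairs and return the most frequent patch."""
    votes: Counter = Counter()
    for _ in range(n_samples):
        _explanation, patch = sample(prompt)
        # The explanation is discarded; only the patch is counted.
        votes[normalize(patch)] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Usage with a stand-in sampler that cycles through canned outputs.
canned = itertools.cycle([
    ("Fix off-by-one in loop bound", "for (int i = 0; i < n; i++)"),
    ("Correct loop termination", "for (int i = 0;  i < n; i++)"),
    ("Use size instead of n", "for (int i = 0; i < size; i++)"),
])


def fake_sample(prompt: str) -> Tuple[str, str]:
    return next(canned)


best = self_consistent_patch(fake_sample, "buggy code here", n_samples=6)
print(best)
```

In this toy run, two differently worded explanations lead to the same patch after normalization, so that patch collects four of the six votes and is selected. This mirrors the intuition that many plausible explanations can support the same correct solution.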