The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and requires no such feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the ground-truth distribution, and we demonstrate a proof of concept on a neural program synthesis task. We use ILF to improve a Codegen-Mono 6.1B model's pass@1 rate by 38% relative (10 percentage points absolute) on the Mostly Basic Python Problems (MBPP) benchmark, outperforming both fine-tuning on MBPP and fine-tuning on repaired programs written by humans. Overall, our results suggest that learning from human-written natural language feedback is both more effective and more sample-efficient than training exclusively on demonstrations for improving an LLM's performance on code generation tasks.
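To make the reported metric concrete, the sketch below shows one common way to estimate pass@k from sampled programs, together with the arithmetic linking the relative and absolute gains. The abstract does not say how pass@1 was computed, so the unbiased estimator used here (the one popularized by the Codex evaluation methodology) is an assumption, and the function names `pass_at_k` and `implied_baseline` are hypothetical helpers introduced only for illustration.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task as 1 - C(n - c, k) / C(n, k).

    n: number of programs sampled for the task
    c: number of those samples that pass all unit tests
    k: number of attempts the metric allows
    Assumption: this is the standard unbiased estimator, not necessarily
    the exact procedure used for the numbers quoted in the abstract.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def implied_baseline(relative_gain: float, absolute_gain: float) -> float:
    """Baseline score implied by a relative gain and an absolute gain.

    If new = base + absolute_gain and new = base * (1 + relative_gain),
    then base = absolute_gain / relative_gain.
    """
    return absolute_gain / relative_gain


if __name__ == "__main__":
    # With k = 1 the estimator reduces to the fraction of passing samples.
    print(pass_at_k(n=10, c=3, k=1))
    # A 38% relative gain that equals 10 points absolute implies a baseline
    # near 26 points, hence a final score near 36 points.
    base = implied_baseline(relative_gain=0.38, absolute_gain=10.0)
    print(round(base, 1), round(base + 10.0, 1))
```

The benchmark score is the mean of the per-task estimates. The implied baseline is derived only from the two figures quoted in the abstract and is subject to rounding in those figures.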