The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the ground truth distribution, and we demonstrate a proof-of-concept on a neural program synthesis task. We use ILF to improve a CodeGen-Mono 6.1B model's pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark, outperforming both fine-tuning on MBPP and fine-tuning on repaired programs written by humans. Overall, our results suggest that learning from human-written natural language feedback is both more effective and more sample-efficient than training exclusively on demonstrations for improving an LLM's performance on code generation tasks.
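To make the training-time use of feedback concrete, here is a minimal sketch of one ILF-style data-collection round. It rests on assumptions not stated in this abstract: that a base model samples a program per task, that only failing programs receive human feedback, that a separate refinement step rewrites the program conditioned on that feedback, and that only refinements passing the task's unit tests are kept for fine-tuning. The callables `generate`, `get_feedback`, and `refine` are hypothetical stand-ins, not a real API, and the fine-tuning step itself is omitted.

```python
from typing import Callable, List, Tuple

# A task is (prompt, unit_tests); unit_tests is Python source with assert statements.
Task = Tuple[str, str]


def passes_tests(program: str, tests: str) -> bool:
    """Run a candidate program followed by its unit tests in a fresh namespace."""
    # exec() runs arbitrary code with no sandboxing; use only on trusted toy inputs.
    namespace: dict = {}
    try:
        exec(program, namespace)
        exec(tests, namespace)
        return True
    except Exception:
        return False


def ilf_round(
    tasks: List[Task],
    generate: Callable[[str], str],            # hypothetical: sample a program from the base model
    get_feedback: Callable[[str, str], str],   # hypothetical: human-written feedback on a failing program
    refine: Callable[[str, str, str], str],    # hypothetical: refinement conditioned on the feedback
) -> List[Tuple[str, str]]:
    """Collect (prompt, refined_program) pairs for fine-tuning the base model.

    Feedback is only consumed here, at training time; the fine-tuned model
    would later be prompted with the task alone.
    """
    finetune_data = []
    for prompt, tests in tasks:
        program = generate(prompt)
        if passes_tests(program, tests):
            continue  # assumption: only incorrect programs receive feedback
        feedback = get_feedback(prompt, program)
        refinement = refine(prompt, program, feedback)
        if passes_tests(refinement, tests):  # keep only refinements that pass the tests
            finetune_data.append((prompt, refinement))
    return finetune_data


# Toy usage with hand-written stand-ins in place of models and annotators.
tasks = [("Write add(a, b) returning the sum.", "assert add(2, 3) == 5")]
data = ilf_round(
    tasks,
    generate=lambda p: "def add(a, b):\n    return a - b\n",
    get_feedback=lambda p, prog: "Use + instead of -.",
    refine=lambda p, prog, fb: prog.replace("a - b", "a + b"),
)
print(len(data))  # 1
```

The returned pairs would then serve as ordinary supervised fine-tuning data, which is why the resulting model needs no feedback at test time.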
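The reported gains mix a relative figure (38%) with an absolute one (10 points), and relating the two takes one extra step. The sketch below assumes pass@1 is computed with the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k) for n samples with c correct, averaged over problems; this abstract does not state the exact estimator. Since relative gain equals absolute gain divided by the baseline, the rounded figures imply a baseline pass@1 of roughly 10 / 0.38 ≈ 26.3%. The `toy_results` counts are hypothetical and serve only to illustrate the computation.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over a benchmark such as MBPP."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem (n, c) counts, for illustration only.
toy_results = [(10, 3), (10, 0), (10, 10)]
score = benchmark_pass_at_k(toy_results, k=1)  # (0.3 + 0.0 + 1.0) / 3

# Relative gain = absolute gain / baseline, so the rounded reported figures imply:
absolute_gain = 10.0  # percentage points
relative_gain = 0.38
implied_baseline = absolute_gain / relative_gain   # ≈ 26.3
implied_ilf = implied_baseline + absolute_gain     # ≈ 36.3
print(round(score, 4), round(implied_baseline, 1), round(implied_ilf, 1))
```

For k = 1 the estimator reduces to c / n, the fraction of correct samples per problem. Because both reported percentages are rounded, the implied baseline is only approximate.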