LangProp is a framework for iteratively optimizing code generated by large language models (LLMs) in a supervised/reinforcement learning setting. While LLMs can generate sensible solutions zero-shot, these solutions are often sub-optimal. For code generation tasks in particular, the initial code is likely to fail on certain edge cases. LangProp automatically evaluates the code's performance on a dataset of input-output pairs, catches any exceptions, and feeds the results back to the LLM in the training loop, so that the LLM can iteratively improve the code it generates. By adopting a metric- and data-driven training paradigm for this code optimization procedure, one can readily adapt findings from traditional machine learning techniques such as imitation learning, DAgger, and reinforcement learning. We demonstrate the first proof of concept of automated code optimization for autonomous driving in CARLA, showing that LangProp can generate interpretable and transparent driving policies that can be verified and improved in a metric- and data-driven way. Our code will be open-sourced and is available at https://github.com/shuishida/LangProp.
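The evaluate-and-feed-back loop described above can be illustrated with a minimal sketch. This is not LangProp's actual API. The names `evaluate`, `optimize`, `Feedback`, and `policy` are hypothetical. A scripted proposer stands in for the LLM, the toy "brake or accelerate" task replaces CARLA, and the score is assumed to be plain exact-match accuracy. The sketch shows how a candidate program is run on every input-output pair, how exceptions are caught and recorded as feedback, and how the best-scoring candidate is kept across iterations.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Dataset = List[Tuple[dict, str]]


@dataclass
class Feedback:
    score: float  # fraction of input-output pairs answered correctly (assumed metric)
    failures: List[str] = field(default_factory=list)  # mismatches and caught exceptions


def evaluate(code: str, dataset: Dataset, fn_name: str = "policy") -> Feedback:
    """Run candidate source code on every pair, catching exceptions instead of crashing."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # untrusted LLM output: a real system should sandbox this
        fn = namespace[fn_name]
    except Exception as exc:
        return Feedback(0.0, [f"failed to load: {exc!r}"])
    correct, failures = 0, []
    for x, y in dataset:
        try:
            out = fn(x)
        except Exception as exc:
            failures.append(f"input={x!r} raised {exc!r}")
            continue
        if out == y:
            correct += 1
        else:
            failures.append(f"input={x!r} expected={y!r} got={out!r}")
    return Feedback(correct / len(dataset), failures)


def optimize(propose: Callable[[str, Optional[Feedback]], str],
             dataset: Dataset, steps: int = 3) -> Tuple[str, Feedback]:
    """Hypothetical training loop: evaluate, keep the best, ask for a revision."""
    code = propose("", None)  # zero-shot initial candidate
    best_code, best_fb = code, evaluate(code, dataset)
    fb = best_fb
    for step in range(steps):
        if step > 0:
            fb = evaluate(code, dataset)
            if fb.score > best_fb.score:
                best_code, best_fb = code, fb
        if fb.score == 1.0 or step == steps - 1:
            break
        code = propose(code, fb)  # an LLM would see code + fb.failures in its prompt
    return best_code, best_fb


def make_scripted_proposer(candidates: List[str]) -> Callable[[str, Optional[Feedback]], str]:
    """Stand-in for an LLM: returns pre-written candidates in order, ignoring feedback."""
    it = iter(candidates)
    return lambda prev_code, feedback: next(it)


# Toy task (hypothetical): brake if an obstacle is closer than 10 m; None means no obstacle.
DATASET: Dataset = [
    ({"obstacle_distance": 5.0}, "brake"),
    ({"obstacle_distance": 50.0}, "accelerate"),
    ({"obstacle_distance": None}, "accelerate"),
]
CANDIDATES = [
    # First attempt misses the edge case: None < 10 raises TypeError.
    "def policy(obs):\n"
    "    return 'brake' if obs['obstacle_distance'] < 10 else 'accelerate'\n",
    # Revised attempt handles the missing-obstacle case.
    "def policy(obs):\n"
    "    d = obs['obstacle_distance']\n"
    "    return 'brake' if d is not None and d < 10 else 'accelerate'\n",
]

if __name__ == "__main__":
    code, fb = optimize(make_scripted_proposer(CANDIDATES), DATASET)
    print(fb.score, fb.failures)
```

Keeping the best-scoring candidate rather than the most recent one means a worse revision never replaces a better program. This loosely mirrors keeping the best checkpoint in conventional training.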