Large Language Models (LLMs) have demonstrated remarkable potential in code generation, and integrating Chain of Thought (CoT) reasoning can further boost their performance. However, current CoT methods often require either manually written CoTs or LLMs with over 100 billion parameters to generate them, which impedes their applicability in resource-constrained scenarios. In this study, we investigate lightweight Language Models (lLMs), defined as models with fewer than 10 billion parameters. Empirically, we find that most lLMs cannot generate high-quality CoTs when prompted with the few-shot method, but can take advantage of high-quality CoTs generated elsewhere to improve their code generation performance. Based on these findings, we design a novel approach, COTTON, which leverages lLMs to automatically generate CoTs for code generation. We synthesize new datasets and conduct extensive experiments on various benchmarks. The results show that the CoTs generated by COTTON outperform the baselines on both automated and human evaluation metrics. In particular, the CoTs generated by COTTON boost various lLMs to achieve higher performance gains than those generated by LLMs such as ChatGLM (130B), and are competitive with those generated by gpt-3.5-turbo (175B). Our study also showcases the potential of lLMs in software engineering applications.