Large language models (LLMs) such as ChatGPT have shown remarkable capabilities in code generation. Despite this success, their effectiveness within particular domains (e.g., web development) requires further evaluation. We conduct an empirical study of domain-specific code generation with LLMs. We show that LLMs perform sub-optimally when generating domain-specific code because of their limited proficiency in using domain-specific libraries. We further observe that incorporating API knowledge into prompts enables LLMs to generate more professional code. Based on these findings, we investigate how to incorporate API knowledge into the code generation process efficiently. We experiment with three strategies for incorporating domain knowledge: an external knowledge inquirer, chain-of-thought prompting, and chain-of-thought fine-tuning. We refer to these strategies collectively as a new code generation approach called DomCoder. Experimental results show that all DomCoder strategies improve the effectiveness of domain-specific code generation under certain settings. The results also show that ample room for improvement remains, based on which we suggest directions for future work.
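To make the idea of incorporating API knowledge into prompts concrete, the following is a minimal sketch rather than the DomCoder implementation. The `ApiDoc` type, the `build_prompt` helper, and the prompt wording are hypothetical and introduced only for illustration; the two API entries describe functions from Go's standard `net/http` package. No model is called: the sketch only shows how API descriptions might be prepended to a task description.

```python
from dataclasses import dataclass


@dataclass
class ApiDoc:
    """One API knowledge entry (hypothetical structure, not from the paper)."""
    signature: str
    summary: str


def build_prompt(task: str, api_docs: list[ApiDoc]) -> str:
    """Prepend API knowledge to a code generation task (illustrative format)."""
    lines = ["You may use the following library APIs:"]
    for doc in api_docs:
        lines.append(f"- {doc.signature}: {doc.summary}")
    lines.append("")  # blank line between the API knowledge and the task
    lines.append(f"Task: {task}")
    lines.append("Write the code:")
    return "\n".join(lines)


# Example API knowledge for a web development task (assumed for illustration).
docs = [
    ApiDoc("http.HandleFunc(pattern, handler)",
           "registers a handler function for a URL pattern"),
    ApiDoc("http.ListenAndServe(addr, handler)",
           "starts an HTTP server listening on addr"),
]

prompt = build_prompt("Serve the text 'hello' on port 8080.", docs)
print(prompt)
```

The resulting string would be passed to whichever LLM is under evaluation; how the API entries are selected for a given task is left open here.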