Language model-based code completion models have quickly grown in use, helping thousands of developers write code in many different programming languages. However, research on code completion models typically focuses on imperative languages such as Python and JavaScript, which leaves functional programming languages underrepresented. Consequently, these models often perform poorly on functional languages such as Haskell. To investigate whether this can be alleviated, we evaluate the performance of two language models for code, CodeGPT and UniXcoder, on the functional programming language Haskell. We fine-tune and evaluate the models on Haskell functions sourced from a publicly accessible Haskell dataset on HuggingFace. Additionally, we manually evaluate the models using our novel translated HumanEval dataset. Our automatic evaluation shows that knowledge of imperative programming languages acquired during LLM pre-training may not transfer well to functional languages, but that code completion on functional languages is feasible. This highlights the need for more high-quality Haskell datasets. A manual evaluation on HumanEval-Haskell indicates that CodeGPT frequently generates empty predictions and extra comments, while UniXcoder more often produces incomplete or incorrect predictions. Finally, we release HumanEval-Haskell, along with the fine-tuned models and all code required to reproduce our experiments, on GitHub (https://github.com/AISE-TUDelft/HaskellCCEval).