In this work, we introduce CodeRepoQA, a large-scale benchmark specifically designed to evaluate repository-level question-answering capabilities in software engineering. CodeRepoQA encompasses five programming languages and covers a wide range of scenarios, enabling comprehensive evaluation of language models. To construct the dataset, we crawl data from 30 well-known repositories on GitHub, the largest platform for hosting and collaborating on code, and carefully filter the raw data. In total, CodeRepoQA is a multi-turn question-answering benchmark with 585,687 entries covering a diverse array of software engineering scenarios, with an average of 6.62 dialogue turns per entry. We evaluate ten popular large language models on our dataset and provide an in-depth analysis. We find that LLMs still have limited question-answering capabilities in software engineering, and that medium-length contexts are more conducive to their performance. The entire benchmark is publicly available at https://github.com/kinesiatricssxilm14/CodeRepoQA.