Despite the substantial success of Information Retrieval (IR) in various NLP tasks, most IR systems predominantly handle queries and corpora in natural language, neglecting the domain of code retrieval. Code retrieval is critically important yet remains under-explored, and existing methods and benchmarks inadequately represent the diversity of code across domains and tasks. To address this gap, we present \textbf{\name} (\textbf{Co}de \textbf{I}nformation \textbf{R}etrieval Benchmark), a robust and comprehensive benchmark specifically designed to assess code retrieval capabilities. \name comprises \textbf{ten} meticulously curated code datasets, spanning \textbf{eight} distinct retrieval tasks across \textbf{seven} diverse domains. We first describe the construction of \name and its diverse dataset composition. We then evaluate nine widely used retrieval models on \name, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems. To facilitate easy adoption and integration within existing research workflows, \name has been developed as a user-friendly Python framework, readily installable via pip. It shares the same data schema as other popular benchmarks such as MTEB and BEIR, enabling seamless cross-benchmark evaluations. Through \name, we aim to invigorate research in the code retrieval domain, providing a versatile benchmarking tool that encourages further development and exploration of code retrieval systems\footnote{\url{https://github.com/CoIR-team/coir}}.