Large language models (LLMs) have achieved strong performance on code completion tasks in general-purpose programming languages. However, existing repository-level code completion benchmarks focus almost exclusively on software code and largely overlook hardware description languages. In this work, we present \textbf{MHRC-Bench}, consisting of \textbf{MHRC-Bench-Train} and \textbf{MHRC-Bench-Eval}, the first benchmark designed for multilingual hardware code completion at the repository level. Our benchmark targets completion tasks and covers three major hardware design coding styles. Each completion target is annotated with code-structure-level and hardware-oriented semantic labels derived from concrete syntax tree analysis. We conduct a comprehensive evaluation of models on MHRC-Bench-Eval; the results and accompanying analysis demonstrate the effectiveness of MHRC-Bench.