The versatile capabilities of pre-trained Large Language Models (LLMs) have attracted considerable attention from industry. Some vertical domains, however, care more about the in-domain capabilities of LLMs. For the networking domain, we present NetEval, an evaluation set for measuring the comprehensive capabilities of LLMs in Network Operations (NetOps). NetEval is designed to evaluate commonsense knowledge and inference ability in NetOps in a multilingual context. It consists of 5,732 questions covering five sub-domains of NetOps. With NetEval, we systematically evaluate the NetOps capability of 26 publicly available LLMs. The results show that only GPT-4 achieves performance competitive with humans, although some open models such as LLaMA 2 demonstrate significant potential.