Domain knowledge refers to in-depth understanding of, expertise in, and familiarity with a specific subject, industry, field, or area of special interest. Existing benchmarks all lack an overall design for evaluating domain knowledge. Believing that true domain language understanding can only be fairly evaluated by a comprehensive and in-depth benchmark, we introduce DomMa, a Domain Mastery Benchmark. DomMa aims to test Large Language Models (LLMs) on their understanding of domain knowledge. It features extensive domain coverage, a large data volume, and a continually updated data set based on China's 112 first-level subject classifications. DomMa consists of 100,000 questions in both Chinese and English, sourced from graduate entrance examinations and undergraduate exams at Chinese colleges. We also propose designs that make the benchmark and the evaluation process better suited to LLMs.