Retrieval-Augmented Generation (RAG) has become the standard approach for grounding large language models in information that was not available during training. While existing datasets and benchmarks focus on web or other public sources, there is still no widely adopted dataset that realistically reflects the nature of company-internal knowledge. Meanwhile, startups, enterprises, and researchers are increasingly developing AI agents designed to operate over exactly this kind of proprietary data. To close this gap, we release a synthetic enterprise corpus, its generation framework, and a leaderboard. We present EnterpriseRAG-Bench, a dataset consisting of approximately 500,000 documents spanning nine enterprise source types (Slack, Gmail, Linear, Google Drive, HubSpot, Fireflies, GitHub, Jira, and Confluence) and 500 questions across ten categories that test distinct retrieval and reasoning capabilities. The corpus is generated with cross-document coherence (grounded in shared projects, people, and initiatives) and augmented with realistic noise such as misfiled documents, near-duplicates, and conflicting information. The question set ranges from simple single-document lookups to multi-document reasoning, constrained retrieval, conflict resolution, and recognizing when information is absent. The generation framework lets teams produce variants tailored to their own industry, scale, and source mix. The dataset, code, evaluation harness, and leaderboard are available at https://github.com/onyx-dot-app/EnterpriseRAG-Bench.