Real-world information needs require access to structurally diverse knowledge sources, ranging from unstructured text and relational tables to knowledge graphs and property graphs. Existing retrievers, however, operate over one source at a time under a fixed query language, leaving the broader landscape of available knowledge fragmented behind incompatible interfaces. A natural attempt at unification would collapse these sources into a shared representation space, but doing so erases the structural affordances (such as schemas, ontologies, and compositional operators) that give each source its expressive power. Effective retrieval over diverse knowledge therefore requires not homogenization but an overarching layer that meets each source on its own terms. To this end, we present OmniRetrieval, a framework that takes any natural-language query, identifies the appropriate knowledge sources, and dispatches source-native queries to their native execution engines. On an extensive benchmark spanning 13 datasets and 309 distinct knowledge bases over text, relational, and graph-structured sources, OmniRetrieval outperforms single-source baselines, demonstrating that it can serve as a general-purpose interface to heterogeneous sources while preserving the structural distinctions that make each source valuable.
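The route-then-dispatch idea described above can be illustrated with a small sketch. This is not the OmniRetrieval implementation: the abstract does not say how sources are identified or how native queries are generated, so the keyword-based `matches` rule, the hard-coded `translate` functions, the `Source` dataclass, and the toy data are all hypothetical stand-ins chosen only to make the control flow concrete.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Source:
    """One knowledge source with its own query language (hypothetical type)."""
    name: str
    matches: Callable[[str], bool]        # stand-in for source identification
    translate: Callable[[str], str]       # natural language -> native query
    execute: Callable[[str], List[str]]   # runs the query on the native engine


def make_relational_source() -> Source:
    # Toy relational source backed by an in-memory SQLite database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
    conn.executemany(
        "INSERT INTO city VALUES (?, ?)",
        [("Oslo", 700000), ("Bergen", 290000)],
    )

    def execute(sql: str) -> List[str]:
        return [str(row[0]) for row in conn.execute(sql)]

    return Source(
        name="relational",
        matches=lambda q: "how many" in q.lower(),    # assumed routing rule
        translate=lambda q: "SELECT COUNT(*) FROM city",  # assumed, fixed SQL
        execute=execute,
    )


def make_text_source() -> Source:
    # Toy unstructured-text source searched by keyword.
    docs = ["Oslo is the capital of Norway.", "Bergen lies on the west coast."]

    def execute(keyword: str) -> List[str]:
        return [d for d in docs if keyword.lower() in d.lower()]

    return Source(
        name="text",
        matches=lambda q: True,                           # fallback source
        translate=lambda q: q.rstrip("?").split()[-1],    # last word as keyword
        execute=execute,
    )


def retrieve(query: str, sources: List[Source]) -> Tuple[str, List[str]]:
    """Pick the first matching source and run a source-native query on it."""
    for source in sources:
        if source.matches(query):
            native_query = source.translate(query)
            return source.name, source.execute(native_query)
    return "none", []


SOURCES = [make_relational_source(), make_text_source()]
```

Calling `retrieve("How many cities are stored?", SOURCES)` sends a SQL query to SQLite, while `retrieve("What do we know about Bergen?", SOURCES)` falls through to the keyword search over text. Each source keeps its own query language and engine; only the routing layer is shared.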