Data critical to real-world decision-making is increasingly found within organizations. Such data is heterogeneous, constantly evolving, and only imperfectly captured. However, current data management systems remain largely passive: they retrieve what is explicitly stored while offering limited support for uncovering implicit structure or for reasoning under noise, incompleteness, and continuous updates. We argue that next-generation data management requires neural capabilities that can uncover complex latent relationships, distinguish reliable signals from noise, and remain consistent as the underlying data state evolves. To support this direction, we introduce NGDBench, a benchmark spanning five domains that unifies structured and unstructured sources. NGDBench adopts a graph view because graphs provide a flexible abstraction for modeling complex systems, capturing latent relationships, and subsuming structured formats such as relational tables. Each instance pairs a clean latent graph with a realistically perturbed observed graph. NGDBench supports full Cypher queries and dynamic data management operations. Evaluations of state-of-the-art LLM-based Text-to-Cypher methods and GraphRAG pipelines reveal that current neural query methods remain sensitive to noise and struggle with dynamic state tracking, highlighting the need for resilient, inference-capable data management. Our code is available at https://github.com/HKUST-KnowComp/NGDBench.
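The pairing of a clean latent graph with a perturbed observed graph can be illustrated with a small sketch. The code below is not taken from NGDBench: the toy triples, the `perturb` and `employees_of` helpers, and the noise model (random edge deletion plus spurious edge insertion) are all assumptions made for illustration, and the benchmark's actual perturbations may differ.

```python
import random

# Hypothetical toy data: (source, relation, target) triples. Not from NGDBench.
CLEAN_EDGES = {
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
    ("carol", "WORKS_AT", "globex"),
    ("alice", "MANAGES", "bob"),
}


def perturb(edges, drop_rate, n_spurious, seed=0):
    """Return a noisy copy of `edges` (assumed noise model, for illustration).

    Each true edge is dropped with probability `drop_rate`, then
    `n_spurious` edges that are absent from the clean graph are added.
    """
    rng = random.Random(seed)
    # Iterate in sorted order so the result depends only on the seed.
    kept = {e for e in sorted(edges) if rng.random() >= drop_rate}
    nodes = sorted({n for s, _, t in edges for n in (s, t)})
    relations = sorted({r for _, r, _ in edges})
    spurious = set()
    while len(spurious) < n_spurious:
        candidate = (rng.choice(nodes), rng.choice(relations), rng.choice(nodes))
        if candidate not in edges:
            spurious.add(candidate)
    return kept | spurious


def employees_of(edges, company):
    """Toy query: who has a WORKS_AT edge pointing to `company`?"""
    return {s for s, r, t in edges if r == "WORKS_AT" and t == company}


if __name__ == "__main__":
    observed = perturb(CLEAN_EDGES, drop_rate=0.25, n_spurious=2, seed=42)
    truth = employees_of(CLEAN_EDGES, "acme")
    answer = employees_of(observed, "acme")
    print("ground truth (clean graph):", sorted(truth))
    print("answer on observed graph:  ", sorted(answer))
    print("missed:", sorted(truth - answer), "spurious:", sorted(answer - truth))
```

In this sketch the query only ever sees the observed graph, while its answer is scored against the clean graph. Under these assumptions, a purely passive system that returns exactly what is stored inherits every dropped or spurious edge, which is the gap the abstract attributes to current systems.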