When faced with data problems, many data workers cannot articulate their information need precisely enough for software to help. Although LLMs can interpret natural-language requests, they are brittle when intent is under-specified: they hallucinate fields, assume join paths, or produce ungrounded answers. We present Pneuma-Seeker, a system built around a central idea: relational reification. Pneuma-Seeker represents a user's evolving information need as a relational schema, that is, a concrete, analysis-ready data model shared between user and system. Rather than answering prompts directly, Pneuma-Seeker iteratively refines this schema, then discovers and prepares relevant sources to construct a relation and an executable program that compute the answer. Pneuma-Seeker employs an LLM-powered agentic architecture with conductor-style planning and macro- and micro-level context management to operate effectively over heterogeneous relational corpora. We evaluate Pneuma-Seeker across multiple domains against state-of-the-art academic and industrial baselines, and it achieves higher answer accuracy. Deployment in a real organization highlights trust and inspectability as essential requirements for LLM-mediated data systems.
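To make relational reification concrete, the following is a minimal sketch, not Pneuma-Seeker's implementation. It assumes an information need such as "total spend per region" can be held as an explicit target schema that is refined one column at a time, then populated from source tables and queried. Every name here (`TargetSchema`, `refine`, `build_answer`, the `orders` and `customers` tables, and the hard-coded join path) is hypothetical and chosen only for illustration; in the system described above, schema refinement and source discovery are driven by the LLM agent and the user, not hard-coded.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TargetSchema:
    """Hypothetical stand-in for the schema shared between user and system."""
    name: str
    columns: Dict[str, str] = field(default_factory=dict)  # column name -> SQL type

    def refine(self, column: str, sql_type: str) -> None:
        # One refinement step: add (or retype) a column of the target relation.
        self.columns[column] = sql_type

    def ddl(self) -> str:
        # Render the current schema as a CREATE TABLE statement.
        cols = ", ".join(f"{c} {t}" for c, t in self.columns.items())
        return f"CREATE TABLE {self.name} ({cols})"


def build_answer() -> List[Tuple[str, float]]:
    conn = sqlite3.connect(":memory:")

    # Two toy source tables standing in for a relational corpus (assumed data).
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10, 50.0), (2, 11, 30.0), (3, 10, 20.0)],
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(10, "east"), (11, "west")],
    )

    # The information need is refined step by step into a concrete schema.
    schema = TargetSchema("spend_by_region")
    schema.refine("region", "TEXT")
    schema.refine("amount", "REAL")

    # Construct the target relation from the sources (join path is assumed here).
    conn.execute(schema.ddl())
    conn.execute(
        "INSERT INTO spend_by_region (region, amount) "
        "SELECT c.region, o.amount "
        "FROM orders o JOIN customers c ON o.customer_id = c.customer_id"
    )

    # The executable program that computes the answer over the target relation.
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM spend_by_region "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(build_answer())
```

The point of the sketch is that both the schema and the final query are explicit, inspectable artifacts: a user can read the `CREATE TABLE` statement and the aggregation query and check that they match the intended question, rather than trusting an answer produced directly from a prompt.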