The Pneuma Project: Reifying Information Needs as Relational Schemas to Automate Discovery, Guide Preparation, and Align Data with Intent

Data discovery and preparation remain persistent bottlenecks in the data management lifecycle, especially when user intent is vague, evolving, or difficult to operationalize. The Pneuma Project introduces Pneuma-Seeker, a system that helps users articulate and fulfill information needs through iterative interaction with a language-model-powered platform. The system reifies the user's evolving information need as a relational data model and incrementally converges toward a usable document aligned with that intent. To achieve this, the system combines three architectural ideas: context specialization to reduce LLM burden across subtasks, a conductor-style planner to assemble dynamic execution plans, and a convergence mechanism based on shared state. The system integrates recent advances in retrieval-augmented generation (RAG), agentic frameworks, and structured data preparation to support semi-automatic, language-guided workflows. We evaluate the system through LLM-based user simulations and show that it helps surface latent intent, guide discovery, and produce fit-for-purpose documents. It also acts as an emergent documentation layer, capturing institutional knowledge and supporting organizational memory.
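To make the three architectural ideas more concrete, the following is a minimal illustrative sketch, not Pneuma-Seeker's actual implementation. It assumes a toy setting in which the information need is held as a target relational schema inside a shared state object, a single specialized component resolves columns against a catalog, and a conductor-style loop picks the next action from that state. All names here (`SharedState`, `discover`, `conduct`, the catalog layout) are hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SharedState:
    """Hypothetical shared state: the information need reified as a schema."""
    # Target schema: table name -> column names the user currently wants.
    target_schema: Dict[str, List[str]] = field(default_factory=dict)
    # Qualified column name -> source location found so far.
    resolved: Dict[str, str] = field(default_factory=dict)

    def missing(self) -> List[str]:
        """Return qualified columns that no source has been matched to yet."""
        return [
            f"{table}.{col}"
            for table, cols in self.target_schema.items()
            for col in cols
            if f"{table}.{col}" not in self.resolved
        ]


def discover(state: SharedState, catalog: Dict[str, str]) -> bool:
    """Specialized component (assumed): sees only missing columns and a catalog.

    Returns True if at least one missing column was resolved.
    """
    progressed = False
    for name in state.missing():
        column = name.split(".", 1)[1]
        if column in catalog:
            state.resolved[name] = catalog[column]
            progressed = True
    return progressed


def conduct(state: SharedState, catalog: Dict[str, str], max_steps: int = 10) -> List[str]:
    """Conductor-style loop (assumed): choose each step from the shared state."""
    trace: List[str] = []
    for _ in range(max_steps):
        if not state.missing():
            trace.append("done")  # state has converged on the target schema
            break
        if discover(state, catalog):
            trace.append("discover")
        else:
            trace.append("ask_user")  # no progress: hand control back to the user
            break
    return trace


if __name__ == "__main__":
    state = SharedState(target_schema={"sales": ["region", "revenue"]})
    catalog = {"region": "crm.accounts.region"}  # made-up source locations
    print(conduct(state, catalog), state.missing())
```

In this sketch, convergence simply means that `missing()` becomes empty. When discovery stalls, the loop stops and defers to the user, which loosely mirrors the iterative interaction the abstract describes. A real system would replace the dictionary lookup with LLM-backed retrieval and preparation components.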
