Modern information retrieval is transitioning from simple document filtering to complex, neuro-symbolic reasoning workflows. However, current retrieval architectures face a fundamental efficiency dilemma when handling the rigorous logical and arithmetic constraints that this new paradigm requires. Standard iterator-based engines (Document-at-a-Time) do not natively support complex, nested logic graphs; forcing them to execute such queries typically leads to intractable running times. Conversely, naive recursive approaches (Term-at-a-Time) can support these structures but suffer from prohibitive memory consumption when enforcing broad logical exclusions. In this paper, we argue that a retrieval engine must be capable of ``Capturing $\mathbf{P}$'', that is, of efficiently evaluating any polynomial-time property directly over its index. We define a formal Retrieval Language ($\mathcal{L}_R$) based on Directed Acyclic Graphs (DAGs) and prove that it precisely captures the complexity class $\mathbf{P}$. We introduce \texttt{ComputePN}, a novel evaluation algorithm that makes $\mathcal{L}_R$ tractable. By combining native DAG traversal with a memory-efficient ``Positive-Negative'' response mechanism, \texttt{ComputePN} evaluates any query in $\mathcal{L}_R$ efficiently. This work establishes the theoretical foundation for turning the search index into a general-purpose computational engine.