OrLog: Resolving Complex Queries with LLMs and Probabilistic Reasoning

Retrieval systems that resolve complex information needs with multiple constraints should consider enforcing the logical operators encoded in the query (i.e., conjunction, disjunction, negation) on the candidate answer set. Current retrieval systems either ignore these constraints in neural embeddings or approximate them in a generative reasoning process that can be inconsistent and unreliable. Although well-suited to structured reasoning, existing neuro-symbolic approaches remain confined to formal logic or mathematics problems, because they often assume unambiguous queries and access to complete evidence, conditions rarely met in information retrieval. To bridge this gap, we introduce OrLog, a neuro-symbolic retrieval framework that decouples predicate-level plausibility estimation from logical reasoning: a large language model (LLM) provides plausibility scores for atomic predicates in one decoding-free forward pass, and a probabilistic reasoning engine derives the posterior probability of query satisfaction from these scores. We evaluate OrLog across multiple backbone LLMs, varying levels of access to external knowledge, and a range of logical constraints, and compare it against base retrievers and LLM-as-reasoner methods. When provided with entity descriptions, OrLog can significantly boost top-rank precision compared to LLM reasoning, with larger gains on disjunctive queries. OrLog is also more efficient, cutting mean tokens per query-entity pair by $\sim$90\%. These results demonstrate that generation-free predicate plausibility estimation combined with probabilistic reasoning enables constraint-aware retrieval that outperforms monolithic reasoning while using far fewer tokens.
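
To make the decoupling concrete, the following is a minimal sketch of the reasoning half only, not the authors' implementation. It assumes that per-predicate plausibility scores have already been produced by an LLM, and it assumes that atomic predicates are statistically independent, which the abstract does not state. The tuple-based formula encoding, the function names, the entity names, and all numeric scores are hypothetical and chosen purely for illustration. The probability of query satisfaction is obtained by exact enumeration over truth assignments, which stays correct when an atom appears more than once in a formula.

```python
from itertools import product

# Hypothetical formula encoding (an assumption, not from the paper):
#   ("atom", name), ("not", f), ("and", f, g, ...), ("or", f, g, ...)

def atoms(formula):
    """Collect the names of all atomic predicates in a formula."""
    if formula[0] == "atom":
        return {formula[1]}
    return set().union(*(atoms(sub) for sub in formula[1:]))

def holds(formula, world):
    """Evaluate a formula under a truth assignment (dict: name -> bool)."""
    op = formula[0]
    if op == "atom":
        return world[formula[1]]
    if op == "not":
        return not holds(formula[1], world)
    if op == "and":
        return all(holds(sub, world) for sub in formula[1:])
    if op == "or":
        return any(holds(sub, world) for sub in formula[1:])
    raise ValueError(f"unknown operator: {op}")

def satisfaction_probability(formula, plausibility):
    """Sum the weights of all truth assignments that satisfy the formula.

    Assumes independent atoms: the weight of an assignment is the product
    of p (atom true) or 1 - p (atom false) over all atoms.
    """
    names = sorted(atoms(formula))
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        if holds(formula, world):
            weight = 1.0
            for name, value in world.items():
                p = plausibility[name]
                weight *= p if value else 1.0 - p
            total += weight
    return total

def rank_entities(formula, scores_by_entity):
    """Order candidate entities by decreasing probability of satisfying the query."""
    return sorted(
        scores_by_entity,
        key=lambda e: satisfaction_probability(formula, scores_by_entity[e]),
        reverse=True,
    )

# Query of the form (A or B) and not C, with made-up plausibility scores.
query = ("and", ("or", ("atom", "A"), ("atom", "B")), ("not", ("atom", "C")))
scores_by_entity = {
    "entity_1": {"A": 0.9, "B": 0.5, "C": 0.2},
    "entity_2": {"A": 0.3, "B": 0.2, "C": 0.1},
}
ranking = rank_entities(query, scores_by_entity)
```

Enumeration is exponential in the number of distinct atoms, which is acceptable for queries with a handful of constraints; a dedicated probabilistic reasoning engine would be the natural replacement for larger formulas or for dependencies between predicates.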
