Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis

With the development of large language models (LLMs), numerous studies have integrated LLMs into relational data processing through operator-like components, e.g., filters with semantic predicates, knowledge-augmented table imputation, reasoning-driven entity matching, and more challenging semantic query processing. These components invoke LLMs while preserving a relational input/output interface; we refer to them as LLM-Enhanced Relational Operators (LROs). Viewed as operators, however, existing LROs suffer from fragmented definitions, divergent implementation strategies, and inadequate evaluation benchmarks. To address this, we first establish a unified LRO taxonomy that aligns existing LROs and categorizes them into Select, Match, Impute, Cluster, and Order, along with their operands and implementation variants. Second, we design LROBench, a comprehensive benchmark featuring 290 single-LRO queries and 60 multi-LRO queries, spanning 27 databases across more than 10 domains. LROBench covers all operating logics and operand granularities in its single-LRO workload, and provides challenging multi-LRO queries stratified by query complexity. Using the taxonomy and benchmark, we evaluate individual LROs under various implementations, derive practical insights into LRO design choices, and summarize empirical best practices. We further compare the end-to-end performance of existing multi-LRO systems against an LRO suite instantiated with these best practices, to investigate how to design an effective LRO set for multi-LRO systems targeting complex semantic queries. Finally, to facilitate future work, we outline promising research directions and open-source all benchmark data and evaluation code at https://github.com/LROBench/LROBench/.
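To make the notion of an operator that invokes an LLM while keeping a relational input/output interface more concrete, the following is a minimal illustrative sketch of a Select-style LRO. It is not taken from LROBench or from any system evaluated in the paper: the names `llm_select`, `Judge`, and `keyword_judge` are hypothetical, and the LLM call is replaced by a toy keyword check so the example runs offline.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
# Hypothetical stand-in for an LLM call: (cell value, predicate) -> bool.
Judge = Callable[[str, str], bool]


def llm_select(rows: List[Row], column: str, predicate: str, judge: Judge) -> List[Row]:
    """Select-style operator: rows in, rows out.

    Keeps the rows whose `column` value the judge deems to satisfy a
    natural-language predicate. The schema of each row is left unchanged.
    """
    return [row for row in rows if judge(row[column], predicate)]


def keyword_judge(value: str, predicate: str) -> bool:
    # Toy replacement for an LLM (assumption for illustration only):
    # it ignores the predicate and checks for a fixed keyword in the value.
    return "refund" in value.lower()


reviews = [
    {"id": "1", "text": "I want a refund now"},
    {"id": "2", "text": "Great product, works well"},
]

selected = llm_select(reviews, "text", "the review asks for a refund", keyword_judge)
print(selected)
```

In this sketch the judge is passed in as a parameter, so the toy function could be swapped for a real model call without changing the operator's relational interface.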
