Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis

With the development of large language models (LLMs), numerous studies have integrated LLMs through operator-like components to enhance relational data processing tasks, e.g., filters with semantic predicates, knowledge-augmented table imputation, reasoning-driven entity matching, and more challenging semantic query processing. These components invoke LLMs while preserving a relational input/output interface; we refer to them as LLM-Enhanced Relational Operators (LROs). From an operator perspective, however, existing LROs suffer from fragmented definitions, divergent implementation strategies, and inadequate evaluation benchmarks. To address this, we first establish a unified LRO taxonomy that aligns existing LROs and categorizes them into five operators, Select, Match, Impute, Cluster, and Order, along with their operands and implementation variants. Second, we design LROBench, a comprehensive benchmark featuring 290 single-LRO queries and 60 multi-LRO queries, spanning 27 databases across more than 10 domains. The single-LRO workload of LROBench covers all operating logics and operand granularities, and its multi-LRO workload provides challenging queries stratified by query complexity. Using this benchmark, we evaluate individual LROs under various implementations, derive practical insights into LRO design choices, and summarize our empirical best practices. We further compare the end-to-end performance of existing multi-LRO systems against an LRO suite instantiated with these best practices, in order to investigate how to design an effective LRO set for multi-LRO systems targeting complex semantic queries. Finally, to facilitate future work, we outline promising research directions and open-source all benchmark data and evaluation code at https://github.com/LROBench/LROBench/.
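To make the notion of an LRO concrete, the following is a minimal Python sketch of a Select-style operator: rows go in, rows come out, and an LLM decides a natural-language predicate for each row. It is an illustration only and is not taken from LROBench or any evaluated system. The names `llm_select`, `LLM`, `Row`, and `stub_llm` are hypothetical, and the stub stands in for a real model call so that the example runs offline.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# Hypothetical LLM interface (an assumption): prompt string in, reply text out.
LLM = Callable[[str], str]


def llm_select(rows: List[Row], predicate: str, llm: LLM) -> List[Row]:
    """Keep the rows for which the LLM judges the natural-language predicate true.

    Input and output are both lists of rows, so the operator keeps a
    relational interface even though the predicate is evaluated by a model.
    """
    kept = []
    for row in rows:
        prompt = f"Row: {row}\nDoes this row satisfy: {predicate}? Answer yes or no."
        if llm(prompt).strip().lower().startswith("yes"):
            kept.append(row)
    return kept


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model: says 'yes' when the row line mentions 'refund'."""
    row_line = prompt.splitlines()[0]
    return "yes" if "refund" in row_line.lower() else "no"


reviews = [
    {"id": 1, "text": "I want a refund, the item arrived broken."},
    {"id": 2, "text": "Great product, fast shipping."},
]
selected = llm_select(reviews, "the customer is asking for their money back", stub_llm)
print(selected)
```

This sketch issues one model call per row, which is only one possible implementation strategy; batching several rows into a single prompt would be another variant with the same input/output contract.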
