Structured Query Language (SQL) remains the standard query language for databases and is highly optimized for processing structured data laid out in relations. At the same time, modern application development increasingly relies on learned models to perform complex tasks, and large language models (LLMs) have been shown to understand and extract information from unstructured text. However, SQL and the relational database systems built around it are either incompatible with or inefficient for workloads that rely on learned models. The result is complex engineering and repeated data migrations between the data sources and the model inference platform. In this paper, we present iPDB, a relational system that supports in-database machine learning (ML) and LLM inference through extended SQL syntax. In iPDB, LLM and ML calls can serve as semantic projections, as predicates for semantic selections and semantic joins, or as semantic aggregations in group-by clauses. iPDB introduces a new relational predict operator together with semantic query optimizations, which let users write and efficiently execute semantic SQL queries. iPDB outperforms other state-of-the-art systems with a mean speedup of 2.5x and speedups of up to 30x.
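To make the roles of a model call more concrete, the following is a minimal Python sketch of a model acting as a projection, as a selection predicate, and as the key of a group-by aggregation. It is not iPDB's syntax or implementation: the `predict` function is a hypothetical keyword-based stand-in for an LLM or ML call, the helper names and the `reviews` data are invented for illustration, and grouping on a model-derived key is only one possible reading of a semantic aggregation.

```python
from collections import defaultdict

# Hypothetical stand-in for a model call. A real system would invoke an LLM
# or ML model here; this toy version only matches keywords.
def predict(prompt: str, text: str) -> str:
    lowered = text.lower()
    if prompt == "sentiment":
        return "negative" if ("broken" in lowered or "bad" in lowered) else "positive"
    if prompt == "topic":
        return "shipping" if "deliver" in lowered else "product"
    raise ValueError(f"unknown prompt: {prompt}")

# Invented example relation with one unstructured text column.
reviews = [
    {"id": 1, "text": "Arrived broken, bad packaging"},
    {"id": 2, "text": "Great sound quality"},
    {"id": 3, "text": "Delivery was fast"},
    {"id": 4, "text": "Delivered late and the box was broken"},
]

def semantic_project(rows, prompt, column):
    """Projection: add a model-derived column to every row."""
    return [{**row, column: predict(prompt, row["text"])} for row in rows]

def semantic_select(rows, prompt, expected):
    """Selection: keep rows whose model output equals `expected`."""
    return [row for row in rows if predict(prompt, row["text"]) == expected]

def semantic_group_count(rows, prompt):
    """Aggregation: group rows by a model-derived key and count each group."""
    counts = defaultdict(int)
    for row in rows:
        counts[predict(prompt, row["text"])] += 1
    return dict(counts)

projected = semantic_project(reviews, "sentiment", "sentiment")
negative_ids = [row["id"] for row in semantic_select(reviews, "sentiment", "negative")]
topic_counts = semantic_group_count(reviews, "topic")
```

In this sketch every operator invokes the model once per row, which is the kind of cost that in-database optimization would presumably aim to reduce; the sketch itself makes no attempt to do so.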