Powering In-Database Dynamic Model Slicing for Structured Data Analytics

Relational database management systems (RDBMS) are widely used to store structured data. To derive insights beyond statistical aggregation, we typically have to extract specific subdatasets from the database using conventional database operations and then apply deep neural network (DNN) training and inference to these subdatasets in a separate analytics system. This process can be prohibitively expensive, especially when many subdatasets are extracted for different analytical purposes. This calls for efficient in-database support of advanced analytical methods. In this paper, we introduce LEADS, a novel SQL-aware dynamic model slicing technique that customizes models for specified SQL queries. LEADS improves the predictive modeling of structured data via a mixture of experts (MoE) and maintains efficiency via a SQL-aware gating network. At the core of LEADS is a general model with multiple expert sub-models trained over the database. The MoE scales up modeling capacity and thereby enhances effectiveness, while the SQL-aware gating network preserves efficiency by activating only the necessary experts during inference. To support in-database analytics, we build an inference extension that integrates LEADS into PostgreSQL. Our extensive experiments on real-world datasets demonstrate that LEADS consistently outperforms the baseline models, and that the in-database inference extension considerably reduces inference latency compared to traditional solutions.
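The abstract does not say how LEADS encodes a SQL query or how its gating network is built, so the following is only a minimal sketch of the general idea of SQL-aware model slicing, under loudly stated assumptions. It assumes the query's WHERE clause consists of equality predicates, encoded as a multi-hot vector over a hypothetical vocabulary of (column, value) pairs; a single linear gating layer with softmax and top-k selection; and linear experts over tuple features. The names `VOCAB`, `encode_predicates`, `SqlAwareMoE`, `slice`, and `top_k` are all hypothetical and are not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL vocabulary of (column, value) pairs that may appear in
# equality predicates of a query's WHERE clause (an assumption, not LEADS).
VOCAB: List[Tuple[str, str]] = [
    ("country", "SG"), ("country", "US"),
    ("device", "mobile"), ("device", "desktop"),
]


def encode_predicates(predicates: Dict[str, str]) -> List[float]:
    """Multi-hot encoding of the query's equality predicates over VOCAB."""
    return [1.0 if predicates.get(col) == val else 0.0 for col, val in VOCAB]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SqlAwareMoE:
    """Toy mixture of experts whose gate depends only on the SQL predicates."""

    def __init__(self, gate_weights: List[List[float]],
                 expert_weights: List[List[float]], top_k: int = 2):
        # gate_weights[e][j]: weight of vocabulary entry j for expert e
        # expert_weights[e]: linear weights of expert e over tuple features
        self.gate_weights = gate_weights
        self.expert_weights = expert_weights
        self.top_k = top_k

    def slice(self, predicates: Dict[str, str]) -> List[Tuple[int, float]]:
        """Select the top-k experts for a query; returns [(expert, weight)]."""
        q = encode_predicates(predicates)
        logits = [sum(w * x for w, x in zip(row, q)) for row in self.gate_weights]
        probs = softmax(logits)
        ranked = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)
        chosen = ranked[: self.top_k]
        norm = sum(probs[e] for e in chosen)  # renormalize over chosen experts
        return [(e, probs[e] / norm) for e in chosen]

    def predict(self, predicates: Dict[str, str],
                rows: List[List[float]]) -> List[float]:
        """Score every tuple returned by the query with the same sliced model."""
        sliced = self.slice(predicates)  # gating runs once per query
        return [
            sum(g * sum(w * x for w, x in zip(self.expert_weights[e], row))
                for e, g in sliced)
            for row in rows
        ]
```

In this sketch the gate sees only the query's predicates, so it runs once per query rather than once per tuple, and only the `top_k` selected experts are evaluated for the rows that query returns; that is one plausible reading of "activating necessary experts via the SQL-aware gating network", not a description of how LEADS actually does it.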
