Relational database management systems (RDBMS) are widely used to store and retrieve structured data. To derive insights beyond statistical aggregation, we typically have to extract specific subdatasets from the database using conventional database operations and then run deep neural network (DNN) training and inference on each subdataset in a separate machine learning system. This process can be prohibitively expensive, especially when a combinatorial number of subdatasets are extracted for different analytical purposes. This calls for efficient in-database support of advanced analytical methods. In this paper, we introduce LEADS, a novel SQL-aware dynamic model slicing technique that customizes models for subdatasets specified by SQL queries. LEADS improves the predictive modeling of structured data via the mixture of experts (MoE) technique and maintains inference efficiency through a SQL-aware gating network. At the core of LEADS is a general model, trained over the entire database, that comprises multiple expert sub-models combined via MoE. This SQL-aware MoE technique scales up modeling capacity, enhances effectiveness, and preserves efficiency by activating only the necessary experts via the gating network during inference. Additionally, we introduce two regularization terms into the training process of LEADS to strike a balance between effectiveness and efficiency. We also design and build an in-database inference system, called INDICES, to support end-to-end advanced structured data analytics by non-intrusively incorporating LEADS into PostgreSQL. Our extensive experiments on real-world datasets demonstrate that LEADS consistently outperforms baseline models, and that INDICES delivers effective in-database analytics with a considerable reduction in inference latency compared to traditional solutions.
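To make the idea of SQL-aware expert activation concrete, the following is a minimal sketch and not the LEADS implementation. It assumes the SQL predicate has already been encoded as a fixed-length vector (here a hypothetical one-hot style encoding), and it uses plain top-k softmax gating as a stand-in for whatever sparse gating function the actual system uses. All names, sizes, and randomly initialized weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper).
N_EXPERTS, QUERY_DIM, FEAT_DIM, HIDDEN = 4, 6, 5, 8

# Hypothetical parameters; in a real system they would be learned
# over the entire database rather than sampled at random.
W_gate = rng.normal(size=(QUERY_DIM, N_EXPERTS))
experts = [
    (rng.normal(size=(FEAT_DIM, HIDDEN)), rng.normal(size=(HIDDEN, 1)))
    for _ in range(N_EXPERTS)
]


def gate(query_vec, top_k=2):
    """Map an encoded SQL predicate to sparse expert weights.

    Top-k softmax is used here as a simple stand-in for a sparse gate:
    only the k highest-scoring experts receive non-zero weight.
    """
    logits = query_vec @ W_gate
    keep = np.argsort(logits)[-top_k:]
    weights = np.zeros(N_EXPERTS)
    exp = np.exp(logits[keep] - logits[keep].max())
    weights[keep] = exp / exp.sum()
    return weights


def predict(x, weights):
    """Evaluate only the experts with non-zero weight and mix their outputs."""
    out = np.zeros((x.shape[0], 1))
    for w, (W1, W2) in zip(weights, experts):
        if w > 0:  # inactive experts are skipped entirely
            out += w * (np.maximum(x @ W1, 0.0) @ W2)
    return out


# The gate depends only on the query, so it runs once per SQL query;
# the resulting model slice is then reused for every selected tuple.
query_vec = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])  # hypothetical encoding
weights = gate(query_vec)
rows = rng.normal(size=(3, FEAT_DIM))  # stand-in for the selected tuples
preds = predict(rows, weights)
```

Because the expert weights are a function of the query rather than of each tuple, the cost of gating is paid once per query, and the experts with zero weight never need to be evaluated for that subdataset.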