Dashboards are vital in modern business intelligence tools, giving non-technical users an interface to comprehensive business data. With the rise of cloud technology, a growing number of data sources can provide enriched context for analytical tasks, creating demand for interactive dashboards over a large number of joins. However, joins are among the most expensive operations in DBMSes, which makes supporting interactive dashboards over joins challenging. In this paper, we present Treant, a dashboard accelerator for queries over large joins. Treant uses factorized query execution to handle aggregation queries over large joins, but this alone is insufficient for interactive speeds. To address this, we exploit the incremental nature of user interactions using the Calibrated Junction Hypertree (CJT), a novel data structure that applies lightweight materialization of the intermediates produced during factorized execution. CJT ensures that the work needed to compute a query is proportional to how much it differs from the previous query, rather than to the query's overall complexity. Treant manages CJTs to share work between queries and performs materialization offline or during user "think-times." Implemented as middleware that rewrites SQL, Treant is portable to any SQL-based DBMS. Our experiments on single-node and cloud DBMSes show that Treant improves dashboard interactions by two orders of magnitude and provides a 10x improvement for ML augmentation compared to a state-of-the-art (SOTA) factorized ML system.
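To make the idea of factorized execution with materialized intermediates concrete, the following is a minimal Python sketch and not Treant's actual implementation. It assumes a hypothetical chain join R(a, b) ⋈ S(b, c) ⋈ T(c, d) over toy data, and all names (`R`, `S`, `T`, `MSG_T`, `MSG_S`, `factorized_count`) are illustrative. It computes `COUNT(*)` by pushing the aggregation below the joins, caches the intermediate "messages", and reuses them when only the filter on `R` changes. Unlike a full CJT, which the abstract describes only at a high level, this sketch caches messages in one direction only.

```python
from collections import defaultdict
from itertools import product

# Toy relations (hypothetical data): R(a, b), S(b, c), T(c, d).
R = [(1, 10), (2, 10), (3, 20)]
S = [(10, 100), (10, 200), (20, 100)]
T = [(100, "x"), (100, "y"), (200, "z")]


def naive_count(r_filter=lambda row: True):
    """COUNT(*) computed by enumerating the full join (the expensive baseline)."""
    return sum(
        1
        for r, s, t in product(R, S, T)
        if r_filter(r) and r[1] == s[0] and s[1] == t[0]
    )


def message_from_t():
    """For each value of c, count the T rows carrying that value."""
    msg = defaultdict(int)
    for c, _d in T:
        msg[c] += 1
    return msg


def message_from_s(msg_t):
    """For each value of b, count the rows of (S join T) carrying that value."""
    msg = defaultdict(int)
    for b, c in S:
        msg[b] += msg_t.get(c, 0)
    return msg


# Materialize the intermediates once, e.g. offline or during user think-time.
MSG_T = message_from_t()
MSG_S = message_from_s(MSG_T)


def factorized_count(r_filter=lambda row: True):
    """COUNT(*) that scans only R and reuses the cached message from S."""
    return sum(MSG_S.get(b, 0) for (a, b) in R if r_filter((a, b)))


if __name__ == "__main__":
    print(factorized_count())                      # no filter
    print(factorized_count(lambda r: r[0] >= 2))   # new filter on R only
```

In this sketch, changing the filter on `R` requires rescanning only `R`; the messages derived from `S` and `T` are reused unchanged, which mirrors the claim that the work done depends on how much the new query differs from the previous one.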