In their hunt for highlights, i.e., interesting patterns in the data, data analysts have to issue groups of related queries and manually combine their results. To the extent that an analyst's goals stem from an intention about what to discover (e.g., contrasting a query result with peer results, or verifying a pattern over a broader region of the data space), integrating intentional query operators into analytical engines can make these analytical tasks more efficient. In this paper, we introduce the ANALYZE operator, a novel intentional cube-querying operator with well-defined semantics that provides a 360° view of the data. We define the semantics of an ANALYZE query as a tuple of five internal facilitator cube queries that (a) report on the specifics of a particular subset of the data space, which is part of the query specification and which we refer to as the original query; (b) contrast that result with the results of peer subspaces, or sibling queries; and (c) explore the data space at lower levels of granularity via drill-down queries. We give formal query semantics for the operator and prove theoretically that the exact same result can be obtained by merging the facilitator cube queries into a smaller number of queries. This effectively yields a multi-query optimization (MQO) strategy for executing an ANALYZE query. We propose three alternative algorithms: (a) a simple execution without optimizations (Min-MQO), (b) a total merging of all the facilitator queries into a single one (Max-MQO), and (c) an intermediate strategy, Mid-MQO, that merges only a subset of the facilitator queries. Our experiments demonstrate that Mid-MQO achieves consistently strong performance across several contexts, Min-MQO always ranks right behind it, and Max-MQO excels for queries whose siblings are sizable and overlap significantly.
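To make the merging idea concrete, here is a minimal sketch that assumes a toy in-memory fact table and a SUM aggregate. It is not the paper's rewriting rules or its Min-/Mid-/Max-MQO algorithms. It only shows why an original query and its sibling queries, each issued separately, can be replaced by one query grouped additionally by the sibling-defining dimension and then split, with identical results. All names (`FACTS`, `cube_query`, `separate_execution`, `merged_execution`) and the data are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy fact table: (country, city, product, sales).
FACTS = [
    ("DE", "Berlin", "bikes", 10),
    ("DE", "Munich", "bikes", 5),
    ("DE", "Berlin", "cars", 7),
    ("FR", "Paris", "bikes", 3),
    ("FR", "Lyon", "cars", 4),
    ("IT", "Rome", "bikes", 6),
]
COUNTRY, CITY, PRODUCT, SALES = 0, 1, 2, 3


def cube_query(facts, selection, group_by):
    """SUM(sales) over rows satisfying `selection`, grouped by the given column indices."""
    result = defaultdict(int)
    for row in facts:
        if selection(row):
            key = tuple(row[i] for i in group_by)
            result[key] += row[SALES]
    return dict(result)


def separate_execution(facts, target, siblings):
    """Unmerged: one query for the original subspace and one per sibling subspace."""
    original = cube_query(facts, lambda r: r[COUNTRY] == target, [PRODUCT])
    sibling_results = {
        s: cube_query(facts, lambda r, s=s: r[COUNTRY] == s, [PRODUCT])
        for s in siblings
    }
    return original, sibling_results


def merged_execution(facts, target, siblings):
    """Merged: a single query over all requested subspaces, grouped by
    (country, product), whose result is then split back per subspace."""
    wanted = {target, *siblings}
    merged = cube_query(facts, lambda r: r[COUNTRY] in wanted, [COUNTRY, PRODUCT])
    split = defaultdict(dict)
    for (country, product), total in merged.items():
        split[country][(product,)] = total
    original = split.get(target, {})
    sibling_results = {s: split.get(s, {}) for s in siblings}
    return original, sibling_results
```

For example, `separate_execution(FACTS, "DE", ["FR", "IT"])` and `merged_execution(FACTS, "DE", ["FR", "IT"])` return the same pair of results, but the merged variant scans the fact table once instead of three times. The sketch does not model the trade-off the abstract reports, namely that full merging pays off mainly when siblings are sizable and overlap significantly.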