A large class of data questions can be modeled as identifying important slices of data according to user-defined metrics. This paper presents TRACE, a Time-Relational Approximate Cubing Engine that enables interactive analysis of such slices at a low upfront cost in both space and computation. It does so by materializing the most important parts of the cube over time, which enables interactive querying for a large class of analytical queries, e.g., which part of my business has the highest revenue growth ([SubCategory=Sports Equipment, Gender=Female]) or which slices are lagging in revenue per user ([State=CA, Age=20-30]). TRACE supports many user-defined metrics, including common aggregations such as SUM, COUNT, and DISTINCT COUNT as well as more complex ones such as AVERAGE. We implemented and deployed TRACE for a variety of business use cases.
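To make the notion of a "slice ranked by a user-defined metric" concrete, the following is a minimal sketch, not TRACE's actual algorithm. It enumerates every slice exhaustively over a tiny in-memory table and keeps the top-k by revenue growth, whereas TRACE is described as materializing only the most important parts of the cube approximately. The data in `ROWS` and the helper names `slices_of`, `revenue_by_slice`, and `top_growth_slices` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (time period, dimension values, revenue).
ROWS = [
    (1, {"SubCategory": "Sports Equipment", "Gender": "Female"}, 100.0),
    (2, {"SubCategory": "Sports Equipment", "Gender": "Female"}, 180.0),
    (1, {"SubCategory": "Sports Equipment", "Gender": "Male"}, 120.0),
    (2, {"SubCategory": "Sports Equipment", "Gender": "Male"}, 130.0),
    (1, {"SubCategory": "Books", "Gender": "Female"}, 90.0),
    (2, {"SubCategory": "Books", "Gender": "Female"}, 85.0),
]


def slices_of(dims):
    """Yield every non-empty slice a row belongs to.

    A slice is a tuple of (dimension, value) pairs, sorted by dimension name.
    """
    items = sorted(dims.items())
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def revenue_by_slice(rows):
    """Compute SUM(revenue) for each (slice, period) pair."""
    totals = defaultdict(float)
    for period, dims, revenue in rows:
        for s in slices_of(dims):
            totals[(s, period)] += revenue
    return totals


def top_growth_slices(rows, before, after, k):
    """Return the k slices with the largest revenue increase between two periods."""
    totals = revenue_by_slice(rows)
    all_slices = {s for (s, _period) in totals}
    growth = {
        s: totals.get((s, after), 0.0) - totals.get((s, before), 0.0)
        for s in all_slices
    }
    # Sort by growth descending; break ties by slice for a deterministic order.
    return sorted(growth.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    for s, delta in top_growth_slices(ROWS, before=1, after=2, k=3):
        label = ", ".join(f"{dim}={value}" for dim, value in s)
        print(f"[{label}] growth={delta}")
```

Exhaustive enumeration grows exponentially with the number of dimensions, which is the upfront cost that an approximate engine of the kind described here is meant to avoid.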