Many data-insight analytics tasks involve analyzing a large set of tables with different schemas, possibly induced by various groupings, to find salient patterns. In particular, such analyses are many-to-many transformations of tables, whereas classic relational algebra covers one-to-one or many-to-one transformations. This paper presents Multi-Relational Algebra, which extends relational algebra to such transformations and their compositions. Multi-Relational Algebra introduces MultiRelation to model a set of tables with different schemas. Importantly, while the information unit in Relational Algebra is a tuple, the information unit in Multi-Relational Algebra is a slice, which formally is a pair $(r, X)$ where $r$ is a (region) tuple and $X$ is a (feature) table. Multi-Relational Algebra introduces three new fundamental algebraic operators, MultiSelect, MultiProject, and MultiJoin, which lift their counterparts Select, Project, and Join to transform a MultiRelation into a MultiRelation. Through various examples, we show that Multi-Relational Algebra can concisely express many complex analytic problems, some of which are traditionally considered out of scope for relational analytics. We have implemented and deployed a service for multi-relational analytics. Because of its unified logical design, we are able to conduct systematic optimization for a variety of seemingly different tasks. Our service has garnered interest from over a hundred internal teams, who have developed data-insight applications using it, and it serves millions of operators daily.
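To make the notion of a slice $(r, X)$ concrete, the following is a minimal Python sketch, not the paper's formal definition. It assumes a MultiRelation can be represented as a list of slices, each pairing a region tuple (here a dict of grouping columns) with a feature table (a list of rows). The helpers `group_into_slices` and `multi_select`, and the `sales` data, are hypothetical names introduced only for illustration; `multi_select` is loosely named after MultiSelect, whose exact semantics this abstract does not specify.

```python
from collections import defaultdict

# Assumption: a table is a list of dict rows, and a slice is a pair (r, X)
# where r is a region tuple (dict of grouping columns) and X is a feature table.

def group_into_slices(rows, region_cols, feature_cols):
    """Group a table by region_cols; each group becomes one slice (r, X)."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in region_cols)
        groups[key].append({c: row[c] for c in feature_cols})
    return [(dict(zip(region_cols, key)), table) for key, table in groups.items()]

def multi_select(multirelation, predicate):
    """Hypothetical lifted Select: keep the slices whose (r, X) satisfy predicate."""
    return [(r, X) for (r, X) in multirelation if predicate(r, X)]

# Illustrative data only.
sales = [
    {"country": "US", "device": "phone", "month": 1, "revenue": 10},
    {"country": "US", "device": "phone", "month": 2, "revenue": 30},
    {"country": "US", "device": "laptop", "month": 1, "revenue": 20},
    {"country": "DE", "device": "phone", "month": 1, "revenue": 5},
]

# Two groupings give region tuples of different schemas in one MultiRelation.
multirelation = (
    group_into_slices(sales, ["country"], ["month", "revenue"])
    + group_into_slices(sales, ["country", "device"], ["month", "revenue"])
)

# Select slices whose feature table has total revenue of at least 40.
big = multi_select(
    multirelation,
    lambda r, X: sum(row["revenue"] for row in X) >= 40,
)
regions = [r for r, _ in big]
```

The input here is a set of slices over two different region schemas, and the output is again a set of slices, which is the many-to-many shape the abstract describes. A plain relational Select, by contrast, would filter the rows of a single table.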