Databases, and datasets more generally, evolve continuously through updates, transformations, versioning, schema changes, streaming operations, and other mechanisms. While prior work has noted connections among some of these areas, they have traditionally been studied in isolation, each with its own abstractions, algorithms, and system implementations. In this paper, we argue for unifying these diverse functionalities under a single abstraction and a common set of computational primitives. We present such an abstraction, powerful enough to encompass existing use cases and to support new ones. Going beyond previous approaches, our framework seamlessly integrates provenance tracking for system-visible operations, conditional propagation of updates, and configurable alerts on change events. It also offers a principled treatment of dependent objects, such as views, and of derived artifacts, such as machine learning models, by providing declarative mechanisms to control their evolution. Finally, we sketch a prototype implementation in a relational-like database system based on an adaptation of the "Prolly Tree", a Merkle-tree-inspired data structure whose parameters can be tuned to meet varying performance requirements, and we present initial experimental results.
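The abstract names the Prolly Tree without describing it. The following is a minimal sketch of the generic idea only, not the paper's adaptation. Sorted key/value entries are split into chunks at content-defined boundaries: an entry ends a chunk when the low bits of its hash are zero. Each chunk is then hashed, and the chunk hashes become the entries of the next level up, repeating until a single root remains. `BOUNDARY_BITS` and `MIN_CHUNK` are hypothetical knobs standing in for the "tunable parameters" the abstract mentions, and the SHA-256 serialization is an assumption.

```python
import hashlib
from typing import Dict, List, Tuple

Entry = Tuple[bytes, bytes]

# Hypothetical tuning knobs (assumptions, not taken from the paper):
BOUNDARY_BITS = 2  # a boundary fires with probability ~1/2**BOUNDARY_BITS per entry
MIN_CHUNK = 2      # minimum entries per chunk; guarantees every level shrinks


def _is_boundary(entry: bytes, level: int) -> bool:
    # Content-defined boundary: low bits of the entry hash, salted by tree level
    h = hashlib.sha256(level.to_bytes(4, "big") + entry).digest()
    return int.from_bytes(h[:8], "big") % (1 << BOUNDARY_BITS) == 0


def _chunk(entries: List[Entry], level: int) -> List[List[Entry]]:
    # Split a sorted run of entries into chunks at content-defined boundaries
    chunks: List[List[Entry]] = []
    current: List[Entry] = []
    for key, payload in entries:
        current.append((key, payload))
        if len(current) >= MIN_CHUNK and _is_boundary(key + payload, level):
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def _node_hash(chunk: List[Entry]) -> bytes:
    # Merkle-style node hash over length-prefixed keys and payloads
    h = hashlib.sha256()
    for key, payload in chunk:
        h.update(len(key).to_bytes(4, "big") + key)
        h.update(len(payload).to_bytes(4, "big") + payload)
    return h.digest()


def _leaf_entries(data: Dict[str, str]) -> List[Entry]:
    # Sorting by key makes the tree independent of insertion order
    return [(k.encode(), v.encode()) for k, v in sorted(data.items())]


def leaf_hashes(data: Dict[str, str]) -> List[str]:
    # Hashes of the leaf-level chunks (useful for observing structural sharing)
    return [_node_hash(c).hex() for c in _chunk(_leaf_entries(data), 0)]


def prolly_root(data: Dict[str, str]) -> str:
    entries = _leaf_entries(data)
    if not entries:
        return hashlib.sha256(b"").hexdigest()
    level = 0
    while True:
        chunks = _chunk(entries, level)
        if len(chunks) == 1:
            return _node_hash(chunks[0]).hex()
        # Parent level: one entry per child, keyed by the child's last key
        entries = [(c[-1][0], _node_hash(c)) for c in chunks]
        level += 1


if __name__ == "__main__":
    base = {f"k{i:03d}": f"v{i}" for i in range(200)}
    edited = dict(base)
    edited["k100"] = "changed"
    shared = set(leaf_hashes(base)) & set(leaf_hashes(edited))
    print(prolly_root(base) == prolly_root(dict(reversed(list(base.items())))))
    print(f"{len(shared)} of {len(leaf_hashes(base))} leaf chunks shared after one edit")
```

Two properties motivate this design. First, the same set of entries always produces the same root hash, regardless of the order in which they were inserted. Second, because boundaries depend only on local content, editing one entry typically rehashes only the chunks near it plus their ancestors, so versions share most of their nodes. Raising `BOUNDARY_BITS` yields larger chunks and a shallower tree. `MIN_CHUNK` is a simplification added here so that every level is guaranteed to shrink.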