Streaming systems are present throughout modern applications, processing continuous data in real time. Existing streaming languages have a variety of semantic models and guarantees that are often incompatible. Yet all these languages are considered "streaming" -- what do they have in common? In this paper, we identify two general yet precise semantic properties: streaming progress and eager execution. Together, they ensure that streaming outputs are deterministic and kept fresh with respect to streaming inputs. We formally define these properties in the context of Flo, a parameterized streaming language that abstracts over dataflow operators and the underlying structure of streams. Flo leverages a lightweight type system to distinguish bounded streams, which allow operators to block on termination, from unbounded ones. Furthermore, Flo provides constructs for dataflow composition and nested graphs with cycles. To demonstrate the generality of our properties, we show how key ideas from representative streaming and incremental computation systems -- Flink, LVars, and DBSP -- have semantics that can be modeled in Flo and guarantees that map to our properties.
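The distinction between bounded and unbounded streams can be illustrated with a small sketch. This is not Flo's syntax or semantics: the names `Bounded`, `Unbounded`, `running_sum`, and `final_sum` are hypothetical, and the check is done at runtime here, whereas the abstract describes a type system. The sketch only shows the idea that an eager operator emits an output for each input, while an operator that blocks on termination is safe only on a stream known to end.

```python
import itertools
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Unbounded:
    """A stream that may never terminate (hypothetical wrapper)."""
    items: Iterable[int]


@dataclass
class Bounded:
    """A stream that is known to terminate (hypothetical wrapper)."""
    items: List[int]


def running_sum(stream) -> Iterator[int]:
    """Eager operator: emits an updated output for every input it receives."""
    total = 0
    for x in stream.items:
        total += x
        yield total


def final_sum(stream: Bounded) -> int:
    """Blocking operator: waits for termination, so it only accepts Bounded."""
    if not isinstance(stream, Bounded):
        raise TypeError("final_sum would block forever on an unbounded stream")
    return sum(stream.items)


# The eager operator keeps its output fresh even on an endless input.
naturals = Unbounded(itertools.count(1))
prefix = list(itertools.islice(running_sum(naturals), 4))

# The blocking operator is applied only to a stream that terminates.
total = final_sum(Bounded([1, 2, 3]))
```

Under these assumptions, `running_sum` produces results on the infinite stream as inputs arrive, while `final_sum` rejects an `Unbounded` argument instead of waiting indefinitely. A static type system would reject that call before the program runs.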