Edge applications generate sensor data at massive scale, and these data streams must be processed promptly to derive actionable intelligence. However, traditional data processing systems are ill-suited to edge applications: they often scale poorly as the number of concurrent stream queries grows, cannot sustain low-latency processing under limited edge computing resources, and do not adapt to the heterogeneity and dynamicity common in edge computing environments. We therefore present AgileDart, an agile and scalable edge stream processing engine that enables fast processing of many concurrently running low-latency edge application queries in dynamic, heterogeneous edge environments. The novelty of our work lies in two components: (1) a dynamic dataflow abstraction that leverages distributed hash table (DHT)-based peer-to-peer overlay networks to autonomously place, chain, and scale stream operators, thereby reducing query latencies, adapting to workload variations, and recovering from failures; and (2) a bandit-based path planning model that re-plans data shuffling paths to adapt to unreliable and heterogeneous edge networks. We show that AgileDart outperforms Storm and EdgeWise on query latency and significantly improves scalability and adaptability when processing many real-world edge stream application queries.
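To make the idea of bandit-based path planning concrete, the following is a minimal sketch only. It is not AgileDart's algorithm: the abstract does not specify which bandit strategy is used, so this example assumes a plain epsilon-greedy policy. The class name `EpsilonGreedyPathPlanner`, the path names, and the simulated latencies are all hypothetical and chosen purely for illustration.

```python
import random


class EpsilonGreedyPathPlanner:
    """Illustrative epsilon-greedy bandit over candidate shuffling paths.

    Assumption: each path is one "arm", and the observed cost is latency.
    This is a generic bandit sketch, not the model from the paper.
    """

    def __init__(self, paths, epsilon=0.1, seed=None):
        self.paths = list(paths)
        self.epsilon = epsilon
        self.counts = {p: 0 for p in self.paths}
        self.mean_latency = {p: 0.0 for p in self.paths}
        self.rng = random.Random(seed)

    def choose(self):
        # Try every path once before trusting any estimate.
        untried = [p for p in self.paths if self.counts[p] == 0]
        if untried:
            return untried[0]
        # Explore a random path with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.paths)
        # Otherwise exploit the path with the lowest estimated latency.
        return min(self.paths, key=lambda p: self.mean_latency[p])

    def update(self, path, latency_ms):
        # Incremental running mean of the observed latency for this path.
        self.counts[path] += 1
        n = self.counts[path]
        self.mean_latency[path] += (latency_ms - self.mean_latency[path]) / n


def simulate(rounds=2000, seed=0):
    """Run the planner against hypothetical, noisy path latencies."""
    rng = random.Random(seed)
    # Hypothetical mean latencies in milliseconds (made up for this sketch).
    true_latency = {"path_a": 40.0, "path_b": 25.0, "path_c": 60.0}
    planner = EpsilonGreedyPathPlanner(true_latency, epsilon=0.1, seed=seed + 1)
    for _ in range(rounds):
        path = planner.choose()
        observed = max(0.0, rng.gauss(true_latency[path], 5.0))
        planner.update(path, observed)
    return planner


if __name__ == "__main__":
    planner = simulate()
    for path in planner.paths:
        print(path, planner.counts[path], round(planner.mean_latency[path], 1))
```

Under these assumptions the planner sends most traffic over the path with the lowest observed latency while still probing the others occasionally, which is what lets it re-plan if a link degrades. A real deployment would likely need a non-stationary variant (for example, a sliding window or discounted mean), since edge link quality changes over time.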