Edge computing enables data processing and storage closer to where the data are created. Given the highly distributed compute environment and the widely dispersed data, there is increasing demand for data sharing and collaborative processing at the edge. Data shuffling can dominate the overall execution time of collaborative processing jobs, and edge environments have limited power supply and bandwidth, so it is crucial to reduce the communication overhead across edge devices. Compared with data compression, compact data structures (CDS) appear better suited to this setting because they allow data to be queried, navigated, and manipulated directly in compact form. However, existing work on applying CDS to edge computing generally focuses on the intuitive benefit of reduced data size. It offers little discussion of the challenges and even less empirical investigation of real-world edge use cases. This research highlights the challenges, opportunities, and potential scenarios of implementing CDS in edge computing. Driven by the use case of shuffling-intensive data analytics, we propose a three-layer architecture for CDS-aided data processing and specifically study the feasibility and efficiency of the CDS layer. We expect this research to foster joint research efforts on CDS-aided edge data analytics and to have wider practical impact.
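The abstract distinguishes CDS from data compression by noting that CDS can be queried directly in compact form. The following is a minimal illustration of that distinction, not the authors' design. It uses a rank-supporting bitvector, a classic compact data structure, which answers `rank1` queries (the number of 1 bits before position `i`) from packed 64-bit words plus a small precomputed count array. By contrast, a zlib-compressed copy of the same bits must be fully decompressed before any query. The names `RankBitVector` and `rank1_from_zlib` are hypothetical. A production succinct bitvector would typically use a multi-level rank directory to keep the extra space small.

```python
import zlib

WORD = 64  # bits packed per machine word (assumed word size)


class RankBitVector:
    """Hypothetical sketch: bits packed into 64-bit words plus per-word cumulative 1-counts."""

    def __init__(self, bits):
        self.n = len(bits)
        self.words = []
        # Pack each run of 64 bits into one integer, bit `offset` at position `offset`.
        for start in range(0, self.n, WORD):
            w = 0
            for offset, b in enumerate(bits[start:start + WORD]):
                if b:
                    w |= 1 << offset
            self.words.append(w)
        # cum[k] = number of 1 bits in words[0..k-1]; cum has len(words) + 1 entries.
        self.cum = [0]
        for w in self.words:
            self.cum.append(self.cum[-1] + bin(w).count("1"))

    def access(self, i):
        """Return bit i directly from the packed words, with no decompression step."""
        return (self.words[i // WORD] >> (i % WORD)) & 1

    def rank1(self, i):
        """Number of 1 bits in positions [0, i), using one lookup plus one masked popcount."""
        k, r = divmod(i, WORD)
        if r == 0:
            return self.cum[k]
        mask = (1 << r) - 1
        return self.cum[k] + bin(self.words[k] & mask).count("1")


def rank1_from_zlib(blob, i):
    """Same query on zlib-compressed bits: the whole payload must be inflated first."""
    bits = zlib.decompress(blob)
    return sum(bits[:i])


bits = [1, 0, 1, 1, 0, 0, 1, 0] * 20  # 160 bits of sample data
bv = RankBitVector(bits)
blob = zlib.compress(bytes(bits))
print(bv.rank1(70), rank1_from_zlib(blob, 70))  # 35 35
```

Both paths return the same answer, but only the compact structure avoids reconstructing the full data. This is the property the abstract cites as the reason CDS may suit bandwidth-constrained edge shuffling better than compression.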