Cloud and big data workloads increasingly distribute data across multiple cloud providers and regions to support rapid decision-making and analytics. Traditional transfer tools are typically specialized for a single paradigm, either stream replication or bulk transfer. This specialization forces users to deploy and manage a separate system, each with its own configuration, for every transfer pattern. This paper presents SkyHOST (Hybrid Object and Stream Transfer), a unified data movement architecture built on the Skyplane framework that bridges bulk object transfer and streaming workloads through a single control plane and CLI. SkyHOST uses URI-based routing to select the appropriate transfer mechanism automatically, supporting both record-level ingestion of structured data and chunk-based transfer of large binary objects. Through an environmental monitoring use case and an empirical evaluation, we show that SkyHOST simplifies operations by consolidating heterogeneous data movement patterns under one control plane while achieving competitive throughput for cross-region transfers.
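The URI-based routing described above can be illustrated with a minimal sketch. It assumes that the transfer mechanism is chosen from the URI schemes of the source and destination, and that a chunked object transfer splits each object into fixed-size byte ranges. The scheme sets (`kafka`, `mqtt`, `s3`, `gs`, `azure`, `file`) and the names `TransferMode`, `select_mode`, and `plan_chunks` are hypothetical and for illustration only. They are not SkyHOST's or Skyplane's actual API.

```python
from enum import Enum
from urllib.parse import urlparse


class TransferMode(Enum):
    STREAM = "stream"  # record-level ingestion of structured data
    OBJECT = "object"  # chunk-based transfer of large binary objects


# Hypothetical scheme tables (illustration only, not SkyHOST's real scheme list).
STREAM_SCHEMES = {"kafka", "mqtt"}
OBJECT_SCHEMES = {"s3", "gs", "azure", "file"}


def _scheme(uri: str) -> str:
    # A bare path such as "/data/x.bin" has an empty scheme; treat it as a local file.
    return urlparse(uri).scheme.lower() or "file"


def select_mode(src: str, dst: str) -> TransferMode:
    """Pick a transfer mechanism from the source and destination URI schemes."""
    schemes = {_scheme(src), _scheme(dst)}
    unknown = schemes - STREAM_SCHEMES - OBJECT_SCHEMES
    if unknown:
        raise ValueError(f"unsupported URI scheme(s): {sorted(unknown)}")
    # Any streaming endpoint forces the record-level path.
    if schemes & STREAM_SCHEMES:
        return TransferMode.STREAM
    # Both endpoints are object stores, so use the chunked bulk path.
    return TransferMode.OBJECT


def plan_chunks(size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split an object of `size` bytes into (offset, length) byte ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(off, min(chunk_size, size - off)) for off in range(0, size, chunk_size)]
```

For example, `select_mode("s3://bucket/readings.parquet", "gs://bucket/readings.parquet")` returns `TransferMode.OBJECT`, while `select_mode("kafka://broker:9092/sensors", "s3://bucket/sensors/")` returns `TransferMode.STREAM`. Keying the decision on the URI alone is what allows one CLI entry point to serve both paradigms. The user names the endpoints, and the router picks the mechanism.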