Large-batch Contrastive Learning (CL), the foundation of modern representation learning, is fundamentally incompatible with the volatile resource constraints of edge devices. This conflict creates a dilemma: small on-device batches degrade model fidelity, while offloading to the cloud incurs unacceptable latency and bandwidth costs. Existing solutions often resort to static model compression, which cannot adapt to the runtime volatility of edge environments. To bridge this gap, we present StreamSplit, a framework that makes streaming CL practical across heterogeneous ARM client platforms. StreamSplit resolves the conflict between the continuous nature of ambient audio and the discrete batch requirements of models such as CLAP and COLA. We introduce (1) a distribution-based streaming framework that decouples representation quality from local batch size, using a tractable Hybrid Loss to maintain fidelity despite sparse updates; and (2) an Uncertainty-Guided Adaptive Splitter that uses a lightweight Reinforcement Learning (RL) policy to dynamically partition computation between device and server. This policy combines real-time resource monitoring with embedding ambiguity to optimize the accuracy-latency trade-off on the fly. We evaluate StreamSplit on diverse hardware, from the resource-constrained Raspberry Pi 4 to the high-performance Apple M2. Compared with server-centric baselines, StreamSplit reduces per-sample latency by up to 4.7x, bandwidth by 77.1%, and energy by 52.3%, while maintaining accuracy within 2.2% of server-centric models. These results indicate that adaptive, distributed learning is a viable path for the modern edge ecosystem.