COHORT: Hybrid RL for Collaborative Large DNN Inference on Multi-Robot Systems Under Real-Time Constraints

Large deep neural networks (DNNs), especially transformer-based and multimodal architectures, are computationally demanding and difficult to deploy on resource-constrained edge platforms such as field robots. These challenges intensify in mission-critical scenarios (e.g., disaster response), where robots must collaborate under tight bandwidth, latency, and battery constraints, often without infrastructure or server support. To address these limitations, we present COHORT, a collaborative DNN inference and task-execution framework for multi-robot systems built on the Robot Operating System (ROS). COHORT employs a hybrid offline-online reinforcement learning (RL) strategy to dynamically schedule and distribute the execution of DNN modules across robots. Our key contributions are threefold: (a) offline RL policy learning with Advantage-Weighted Regression (AWR), trained on auction-based task-allocation data collected from heterogeneous DNN workloads on distributed robots; (b) online policy adaptation via Multi-Agent PPO (MAPPO), initialized from the offline policy and fine-tuned in real time; and (c) a comprehensive evaluation of COHORT on vision-language model (VLM) inference tasks such as CLIP and SAM, analyzing scalability as the number of robots and the workload increase, and robustness under . We benchmark COHORT against genetic algorithms and multiple RL baselines. Experimental results show that COHORT reduces battery consumption by 15.4% and increases GPU utilization by 51.67%, while satisfying frame-rate and deadline constraints 2.55 times as often as the baselines.
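Contribution (a) trains the offline policy with AWR on auction-generated allocation data. The sketch below illustrates the core AWR idea on a toy, tabular version of the allocation problem; it is a minimal illustration under our own assumptions, not COHORT's implementation. We assume each logged record is a (state, robot, return) tuple, estimate V(s) as the mean logged return per state, weight each record by exp((R - V(s)) / beta) clipped at a maximum, and fit a softmax policy over robots by weighted maximum likelihood. The function names `awr_weight` and `fit_awr_policy`, the state labels, and the hyperparameters `beta`, `max_weight`, `lr`, and `epochs` are hypothetical. In a hybrid scheme like the one described above, a policy fit this way would only be the starting point that an online method such as MAPPO then fine-tunes; that stage is not shown.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def awr_weight(ret, value, beta=1.0, max_weight=20.0):
    """AWR weight exp(A / beta) with advantage A = R - V(s), clipped for stability."""
    return min(math.exp((ret - value) / beta), max_weight)


def fit_awr_policy(dataset, n_robots, beta=1.0, lr=0.5, epochs=200):
    """Fit a tabular softmax policy (state -> logits over robots) by
    advantage-weighted maximum likelihood on logged (state, action, return) tuples.
    Hypothetical toy setup: states are workload labels, actions are robot indices."""
    # V(s): mean logged return per state (the least-squares fit for a tabular critic).
    sums, counts = defaultdict(float), defaultdict(int)
    for s, _, r in dataset:
        sums[s] += r
        counts[s] += 1
    value = {s: sums[s] / counts[s] for s in sums}

    # Fixed per-record weights: better-than-average assignments count more.
    weights = [awr_weight(r, value[s], beta) for s, _, r in dataset]

    logits = {s: [0.0] * n_robots for s in value}
    for _ in range(epochs):
        grads = {s: [0.0] * n_robots for s in logits}
        for (s, a, _), w in zip(dataset, weights):
            probs = softmax(logits[s])
            for k in range(n_robots):
                # Gradient of -w * log pi(a | s) with respect to logit k.
                grads[s][k] += w * (probs[k] - (1.0 if k == a else 0.0))
        for s in logits:
            for k in range(n_robots):
                logits[s][k] -= lr * grads[s][k] / counts[s]
    return logits


if __name__ == "__main__":
    # Hypothetical logged auction outcomes: (workload, winning robot, observed return).
    data = [("clip", 0, 1.0), ("clip", 1, 0.2), ("clip", 0, 0.9),
            ("sam", 1, 1.0), ("sam", 0, 0.1)]
    policy = fit_awr_policy(data, n_robots=2)
    for state, lg in policy.items():
        print(state, [round(p, 3) for p in softmax(lg)])
```

Because the weights exponentiate advantages, records in which an assignment beat its state's average return dominate the fit, which is how AWR can improve on plain behavior cloning of the auction data while still staying close to it.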