In large-scale UAV swarms, dynamically executing machine learning tasks is challenging because of network volatility and the heterogeneous resource constraints of individual UAVs. Traditional approaches often rely on centralized orchestration to partition tasks among nodes. However, these methods suffer from communication bottlenecks, added latency, and reduced reliability when the swarm grows or the topology shifts rapidly. To overcome these limitations, we propose a fully distributed approach to split computing in UAV swarms based on a diffusive metric. Our solution introduces a new iterative measure, termed aggregated gigaflops, which captures each node's own computing capacity along with that of its neighbors without requiring global network knowledge. By forwarding partial inferences to underutilized nodes on the basis of this metric, we achieve higher task throughput, lower latency, and better energy efficiency. Further, to handle sudden workload surges and rapidly changing node conditions, we incorporate an early-exit mechanism that adapts the inference pathway on the fly. Extensive simulations demonstrate that our approach significantly outperforms baseline strategies across multiple performance metrics, including latency, fairness, and energy consumption. These results highlight the feasibility of large-scale distributed intelligence in UAV swarms and provide a blueprint for deploying robust, scalable ML services in diverse aerial networks.
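To make the idea of a diffusive, neighbor-aware capacity metric more concrete, the following is a minimal sketch of how such a measure could be computed and used for forwarding. The abstract does not state the update rule, so everything here is an assumption for illustration: the rule (own capacity plus a damped mean of the neighbors' current values), the damping factor `alpha`, the number of `rounds`, and the helper names `aggregated_gigaflops` and `pick_next_hop` are all hypothetical.

```python
from typing import Dict, List, Optional


def aggregated_gigaflops(
    own: Dict[str, float],
    neighbors: Dict[str, List[str]],
    alpha: float = 0.5,   # assumed damping factor, not taken from the paper
    rounds: int = 10,     # assumed number of diffusion rounds
) -> Dict[str, float]:
    """Iteratively mix each node's own capacity with its neighbors' values.

    Assumed update rule (illustrative only):
        agg_i <- own_i + alpha * mean(agg_j for j in neighbors of i)
    Each node reads only its direct neighbors' values, so no global
    view of the network is needed.
    """
    agg = dict(own)  # round 0: every node knows only its own capacity
    for _ in range(rounds):
        nxt = {}
        for node, capacity in own.items():
            nbrs = neighbors[node]
            nbr_mean = sum(agg[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
            nxt[node] = capacity + alpha * nbr_mean
        agg = nxt  # synchronous update: all nodes advance together
    return agg


def pick_next_hop(
    node: str,
    neighbors: Dict[str, List[str]],
    agg: Dict[str, float],
) -> Optional[str]:
    """Forward to the neighbor with the largest aggregated value (assumed policy)."""
    nbrs = neighbors[node]
    return max(nbrs, key=lambda n: agg[n]) if nbrs else None


# Hypothetical line topology A - B - C - D, where only D is powerful.
own = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 10.0}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

agg = aggregated_gigaflops(own, neighbors)
next_hop = pick_next_hop("B", neighbors, agg)
```

In this toy topology, nodes A and C have the same local capacity, but C sits next to the powerful node D, so its aggregated value ends up higher and B forwards toward C. That is the kind of information a purely local capacity reading would miss.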