In large-scale UAV swarms, executing machine learning tasks dynamically is challenging because of network volatility and the heterogeneous resource constraints of individual UAVs. Traditional approaches often rely on centralized orchestration to partition tasks among nodes. However, these methods suffer from communication bottlenecks, high latency, and poor reliability as the swarm grows or its topology shifts rapidly. To overcome these limitations, we propose a fully distributed, diffusive metric-based approach to split computing in UAV swarms. Our solution introduces a new iterative measure, termed the aggregated gigaflops, which captures each node's own computing capacity together with that of its neighbors without requiring global network knowledge. By intelligently forwarding partial inferences to underutilized nodes, we achieve higher task throughput, lower latency, and better energy efficiency. Furthermore, to handle sudden workload surges and rapidly changing node conditions, we incorporate an early-exit mechanism that adapts the inference pathway on the fly. Extensive simulations demonstrate that our approach significantly outperforms baseline strategies across multiple performance metrics, including latency, fairness, and energy consumption. These results highlight the feasibility of large-scale distributed intelligence in UAV swarms and provide a blueprint for deploying robust, scalable ML services in diverse aerial networks.
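The abstract does not state the update rule for the aggregated gigaflops, so the following is only a minimal sketch under explicit assumptions. It assumes one plausible diffusive form, A_i ← g_i + α · mean_{j∈N(i)} A_j, where g_i is node i's spare GFLOPS, N(i) is its one-hop neighborhood, and α ∈ [0, 1) is a hypothetical damping factor. The function names `aggregated_gflops` and `choose_next_hop`, the parameter `alpha`, and the greedy "forward to the neighbor with the largest metric" rule are illustrative and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical types: node IDs are strings, capacities are spare GFLOPS.
Capacity = Dict[str, float]
Topology = Dict[str, List[str]]


def aggregated_gflops(capacity: Capacity, neighbors: Topology,
                      alpha: float = 0.5, iterations: int = 10) -> Dict[str, float]:
    """Assumed diffusive update: A_i <- g_i + alpha * mean(A_j for j in N(i)).

    Each round uses only the previous round's values of one-hop neighbors,
    mimicking a synchronous local exchange with no global network view.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1) for the update to converge")
    agg = dict(capacity)  # round 0: each node knows only its own capacity
    for _ in range(iterations):
        new_agg = {}
        for node, own in capacity.items():
            nbrs = neighbors.get(node, [])
            nbr_mean = sum(agg[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
            new_agg[node] = own + alpha * nbr_mean
        agg = new_agg  # all nodes switch to the new round simultaneously
    return agg


def choose_next_hop(node: str, agg: Dict[str, float],
                    neighbors: Topology) -> Optional[str]:
    """Greedy forwarding rule (assumption): hand the partial inference to the
    one-hop neighbor with the largest aggregated metric, or None if isolated."""
    nbrs = neighbors.get(node, [])
    if not nbrs:
        return None
    return max(nbrs, key=lambda n: agg[n])


if __name__ == "__main__":
    # Three-node line topology A - B - C with spare capacities in GFLOPS.
    capacity = {"A": 2.0, "B": 4.0, "C": 10.0}
    neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    agg = aggregated_gflops(capacity, neighbors, alpha=0.5, iterations=50)
    print({n: round(v, 2) for n, v in agg.items()})
    print(choose_next_hop("B", agg, neighbors))
```

Because α < 1 and each node averages its neighbors' values, this assumed update is a contraction, so repeated local exchanges converge to a unique fixed point regardless of the starting values. In the three-node example, B's metric converges to 28/3 ≈ 9.33 and B forwards to C, whose larger local capacity makes it the better-provisioned direction even though B sees only its immediate neighbors.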