A Delta-Aware Orchestration Framework for Scalable Multi-Agent Edge Computing

The Synergistic Collapse occurs when scaling beyond 100 agents causes superlinear performance degradation that individual optimizations cannot prevent. We observe this collapse with 150 cameras in Smart City deployment using MADDPG, where Deadline Satisfaction drops from 78% to 34%, producing approximately $180,000 in annual cost overruns. Prior work has addressed each contributing factor in isolation: exponential action-space growth, computational redundancy among spatially adjacent agents, and task-agnostic hardware scheduling. None has examined how these three factors interact and amplify each other. We present DAOEF (Delta-Aware Orchestration for Edge Federations), a framework that addresses all three simultaneously through: (1) Differential Neural Caching, which stores intermediate layer activations and computes only the input deltas, achieving 2.1x higher hit ratios (72% vs. 35%) than output-level caching while staying within 2% accuracy loss through empirically calibrated similarity thresholds; (2) Criticality-Based Action Space Pruning, which organizes agents into priority tiers and reduces coordination complexity from O(n2) to O(n log n) with less than 6% optimality loss; and (3) Learned Hardware Affinity Matching, which assigns tasks to their optimal accelerator (GPU, CPU, NPU, or FPGA) to prevent compounding mismatch penalties. Controlled factor-isolation experiments confirm that each mechanism is necessary but insufficient on its own: removing any single mechanism increases latency by more than 40%, validating that the gains are interdependent rather than additive. Across four datasets (100-250 agents) and a 20-device physical testbed, DAOEF achieves a 1.45x multiplicative gain over applying the three mechanisms independently. A 200-agent cloud deployment yields 62% latency reduction (280 ms vs. 735 ms), sub-linear latency growth up to 250 agents.

翻译：协同崩塌现象指当智能体数量超过100时，超线性性能退化无法通过单独优化避免。我们在采用MADDPG的智慧城市部署中观测到该现象：150台摄像头场景下，截止时间满足率从78%骤降至34%，年成本超支约18万美元。现有研究孤立解决各致因：指数级动作空间增长、空间相邻智能体间计算冗余、以及任务无关的硬件调度，但均未探讨三者如何交互并相互放大。本文提出DAOEF（增量感知边缘联邦编排框架），通过以下三项机制同步应对上述问题：(1) 差分神经缓存——仅存储中间层激活值并计算输入增量，经实证校准的相似度阈值使缓存命中率较输出级缓存提升2.1倍（72% vs 35%），且精度损失小于2%；(2) 关键度驱动的动作空间剪枝——将智能体分层优先级排序，协调复杂度从O(n²)降至O(n log n)，最优性损失低于6%；(3) 习得的硬件亲和匹配——将任务分配给最优加速器（GPU、CPU、NPU或FPGA），消除复合失配惩罚。受控的因子隔离实验证实各机制缺一不可：移除任意单一机制将导致延迟增加超40%，验证增益呈相互依赖而非叠加特性。在四个数据集（100-250智能体）及20设备物理测试平台上，DAOEF相较三种机制独立应用获得1.45倍乘数增益。200智能体云端部署实现62%延迟降低（280ms vs 735ms），且250智能体规模下仍保持亚线性延迟增长。