Next-generation IoT applications increasingly span autonomous administrative entities, necessitating silo-cooperative scheduling that leverages diverse computational resources while preserving data privacy. However, efficient cooperation faces significant challenges arising from infrastructure heterogeneity, Non-IID workload shifts, and the inherent risks of adversarial environments. Existing approaches rely predominantly on centralized coordination or independent learning; they fail to address the incompatibility of state-action spaces across heterogeneous silos and lack robustness against malicious attacks. This paper proposes DeFRiS, a Decentralized Federated Reinforcement Learning framework for robust and scalable Silo-cooperative IoT application scheduling. DeFRiS integrates three synergistic innovations: (i) an action-space-agnostic policy that scores candidate resources to enable seamless knowledge transfer across heterogeneous silos; (ii) a silo-optimized local learning mechanism that combines Generalized Advantage Estimation (GAE) with clipped policy updates to handle sparse, delayed rewards; and (iii) a Dual-Track Non-IID robust decentralized aggregation protocol that uses gradient fingerprints for similarity-aware knowledge transfer and anomaly detection, and gradient tracking for optimization momentum. Extensive experiments on a distributed testbed with 20 heterogeneous silos and realistic IoT workloads demonstrate that DeFRiS significantly outperforms state-of-the-art baselines, reducing average response time by 6.4% and energy consumption by 7.2%, while lowering tail latency risk (CVaR$_{0.95}$) by 10.4% and achieving near-zero deadline violations. Furthermore, compared to the best-performing baseline, DeFRiS achieves over 3 times better performance retention as the system scales and over 8 times better stability in adversarial environments.
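To make the local learning mechanism in (ii) concrete, the following is a minimal sketch of Generalized Advantage Estimation paired with a clipped surrogate objective. It is not the DeFRiS implementation: the PPO-style form of the clipping, the function names `gae_advantages` and `clipped_surrogate`, and the default values of `gamma`, `lam`, and `eps` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    `values` must hold len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state reached after the final step.
    gamma and lam are illustrative defaults, not values from the paper.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    steps = len(rewards)
    advantages = np.zeros(steps)
    running = 0.0
    # Walk backwards so each step accumulates discounted future TD errors.
    for t in reversed(range(steps)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio, advantages, eps=0.2):
    """PPO-style clipped objective (assumed form), to be maximized.

    `ratio` is pi_new(a|s) / pi_old(a|s) for each sampled step.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Taking the elementwise minimum removes the incentive for large policy jumps.
    return np.minimum(unclipped, clipped).mean()
```

With a single reward arriving only at the last step, as in a sparse, delayed-reward scheduling episode, the backward recursion propagates credit to earlier decisions, and the clipping bounds how far one update can move the policy.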