Neural network pruning is an essential technique for reducing the size and complexity of deep neural networks, enabling the deployment of large-scale models on devices with limited resources. However, existing pruning approaches rely heavily on training data to guide the pruning strategy, which makes them ineffective for federated learning over distributed and confidential datasets. In addition, the memory- and computation-intensive pruning process is infeasible for resource-constrained devices in federated learning. To address these challenges, we propose FedTiny, a distributed pruning framework for federated learning that generates specialized tiny models for memory- and computing-constrained devices. We introduce two key modules in FedTiny that adaptively search for coarse- and fine-pruned specialized models to fit deployment scenarios with sparse and cheap local computation. First, an adaptive batch normalization selection module mitigates the bias in pruning caused by the heterogeneity of local data. Second, a lightweight progressive pruning module prunes the models more finely under strict memory and computational budgets, allowing the pruning policy for each layer to be determined gradually rather than by evaluating the overall model structure. Experimental results demonstrate the effectiveness of FedTiny, which outperforms state-of-the-art approaches, particularly when compressing deep models into extremely sparse tiny models. Compared to state-of-the-art methods, FedTiny improves accuracy by 2.61% while reducing the computational cost by 95.91% and the memory footprint by 94.01%.
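To make the notion of pruning a model to a sparse target concrete, the following is a minimal sketch of generic layer-by-layer magnitude pruning. It is not FedTiny's algorithm: the abstract does not specify the selection criterion, and the function names (`magnitude_mask`, `prune_layerwise`), the `density` parameter, and the toy layer sizes are all hypothetical choices made for illustration.

```python
import random


def magnitude_mask(weights, density):
    """Return a boolean mask keeping the largest-magnitude weights.

    `density` is the fraction of weights to keep (an assumed parameter,
    not a value taken from the paper).
    """
    k = round(density * len(weights))
    # Rank positions by absolute value, largest first, and keep the top k.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(len(weights))]


def prune_layerwise(layers, density):
    """Prune each layer on its own, one layer at a time.

    Only the layer currently being processed needs a mask in memory,
    which is the property this sketch is meant to illustrate.
    """
    pruned = {}
    for name, weights in layers.items():
        mask = magnitude_mask(weights, density)
        pruned[name] = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned


# Toy "model": two flattened layers with random weights (illustrative only).
random.seed(0)
layers = {
    "conv1": [random.gauss(0.0, 1.0) for _ in range(64)],
    "fc": [random.gauss(0.0, 1.0) for _ in range(40)],
}
pruned = prune_layerwise(layers, density=0.25)
kept = {name: sum(1 for w in ws if w != 0.0) for name, ws in pruned.items()}
print(kept)
```

Each layer retains `round(density * size)` weights and the rest are set to zero. A data-free magnitude criterion is used here only because it needs no training data; the criterion FedTiny actually applies on devices may differ.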