Prune-Deprune: Adaptive Compression-Aware Split Learning and Inference for Enhanced Network Efficiency

The growing number of AI-driven applications on mobile devices has motivated solutions that integrate deep learning models with available edge-cloud resources. Split learning, in which a deep learning model is partitioned so that part of its computation moves off the mobile device and runs in a distributed manner, has been explored extensively because it can reduce on-device energy consumption, lower latency, reduce network usage, and offer certain privacy benefits. Incorporating compression-aware methods, in which learning adapts to the compression level of the communicated data, makes split learning more advantageous still, and could even make it a viable alternative to established approaches such as federated learning. In this work, we develop an adaptive compression-aware split learning method ('deprune') that trains and improves deep learning models so that they are far more network-efficient, making them well suited to deployment on weaker devices with the help of edge-cloud resources. We also extend this method ('prune') to train deep learning models very quickly through a transfer learning approach, trading a small amount of accuracy for much more network-efficient inference. We show that the 'deprune' method reduces network usage by 4x compared with a split-learning approach that does not use our method, without loss of accuracy, while also improving accuracy over compression-aware split learning by 4 percent. Lastly, we show that the 'prune' method reduces training time for certain models by up to 6x, without affecting accuracy, compared with a compression-aware split-learning approach.
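To make the split-and-compress idea concrete, the following is a minimal sketch of a single split-inference forward pass. It is not the 'prune' or 'deprune' method, whose details the abstract does not give. The toy two-layer model, the function names (`device_forward`, `compress_topk`, `decompress`, `server_forward`), and the use of top-k magnitude sparsification as the compression step are all assumptions chosen for illustration.

```python
import random

def device_forward(x, weights):
    """Device-side half: one dense layer with ReLU (hypothetical toy model)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def compress_topk(activations, keep_ratio):
    """Assumed compression step: keep the largest-magnitude activations
    and send them as (index, value) pairs."""
    k = max(1, int(len(activations) * keep_ratio))
    order = sorted(range(len(activations)),
                   key=lambda i: abs(activations[i]), reverse=True)
    return [(i, activations[i]) for i in sorted(order[:k])]

def decompress(pairs, size):
    """Server side: rebuild a dense vector, zero-filling dropped entries."""
    dense = [0.0] * size
    for i, v in pairs:
        dense[i] = v
    return dense

def server_forward(h, weights):
    """Server-side half: a final linear layer (hypothetical)."""
    return [sum(w * hi for w, hi in zip(row, h)) for row in weights]

random.seed(0)
IN, HIDDEN, OUT = 8, 16, 3
w_device = [[random.uniform(-1, 1) for _ in range(IN)] for _ in range(HIDDEN)]
w_server = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(OUT)]
x = [random.uniform(-1, 1) for _ in range(IN)]

h = device_forward(x, w_device)               # computed on the mobile device
payload = compress_topk(h, keep_ratio=0.25)   # only this crosses the network
h_restored = decompress(payload, HIDDEN)      # reconstructed at the edge/cloud
logits = server_forward(h_restored, w_server)

print(len(h), "activations ->", len(payload), "transmitted")
```

With `keep_ratio=0.25`, 4 of the 16 intermediate values are transmitted. Because each value travels with its index, the byte-level saving is smaller than the value count suggests, and the figure is unrelated to the 4x result reported above. In a compression-aware setting, the compression step would also be applied during training, so that the server-side layers learn to tolerate the dropped activations. This sketch covers inference only.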
