DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems when first-order (FO) information is difficult or impossible to obtain. However, the scalability of ZO optimization remains an open problem: its use has primarily been limited to relatively small-scale ML problems, such as sample-wise adversarial attack generation. To the best of our knowledge, no prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance. To overcome this roadblock, we develop DeepZero, a principled ZO deep learning (DL) framework that can scale ZO optimization to DNN training from scratch through three primary innovations. First, we demonstrate the advantages of coordinate-wise gradient estimation (CGE) over randomized vector-wise gradient estimation in training accuracy and computational efficiency. Second, we propose a sparsity-induced ZO training protocol that extends the model pruning methodology using only finite differences to explore and exploit the sparse DL prior in CGE. Third, we develop the methods of feature reuse and forward parallelization to advance the practical implementation of ZO training. Our extensive experiments show that DeepZero achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching FO training performance for the first time. Furthermore, we show the practical utility of DeepZero in applications of certified adversarial defense and DL-based partial differential equation error correction, achieving 10-20% improvement over SOTA. We believe our results will inspire future research on scalable ZO optimization and contribute to advancing black-box DL. Code is available at https://github.com/OPTML-Group/DeepZero.
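
To make the contrast between the two estimators concrete, the following is a minimal NumPy sketch, not the DeepZero implementation. It assumes a forward-difference form for CGE and a Gaussian-direction form for the randomized estimator. The function names `cge` and `rge`, the smoothing parameter `mu`, the query count `q`, and the `active` index argument are all hypothetical names introduced for illustration. The `active` argument only hints at why sparsity reduces the number of function queries; it does not reproduce the pruning protocol described above.

```python
import numpy as np


def cge(f, x, mu=1e-3, active=None):
    """Coordinate-wise gradient estimate via forward differences.

    Uses one extra function query per estimated coordinate. `active` is a
    hypothetical list of flat indices to estimate; all other coordinates
    are left at zero, so a sparser `active` set means fewer queries.
    """
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    grad = np.zeros_like(x)
    indices = range(x.size) if active is None else active
    for i in indices:
        e = np.zeros_like(x)
        e.flat[i] = 1.0  # i-th standard basis vector
        grad.flat[i] = (f(x + mu * e) - f0) / mu
    return grad


def rge(f, x, mu=1e-3, q=10, rng=None):
    """Randomized vector-wise gradient estimate.

    Averages q directional finite differences along Gaussian directions,
    using q extra function queries regardless of the dimension of x.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    grad = np.zeros_like(x)
    for _ in range(q):
        u = rng.standard_normal(x.shape)  # random perturbation direction
        grad += (f(x + mu * u) - f0) / mu * u
    return grad / q


if __name__ == "__main__":
    # Toy objective whose true gradient is x itself.
    def loss(x):
        return 0.5 * float(np.sum(x ** 2))

    x0 = np.array([1.0, -2.0, 3.0])
    print("CGE:", cge(loss, x0))
    print("CGE (coordinates 0 and 2 only):", cge(loss, x0, active=[0, 2]))
    print("RGE:", rge(loss, x0, q=10, rng=np.random.default_rng(0)))
```

In this sketch, CGE spends one query per estimated coordinate and recovers each partial derivative up to the finite-difference error, whereas the randomized estimator spends `q` queries in total and returns a noisier estimate whose variance grows with the dimension of `x`.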
