OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators

From arXiv, 39 pages. Due to the page dimension limit, the full appendix is available at https://tinyurl.com/otov3appendix. Zooming in is recommended for finer details. arXiv admin note: text overlap with arXiv:2305.18030

Compressing a predefined deep neural network (DNN) into a compact sub-network with competitive performance is crucial to efficient machine learning. This topic spans various techniques, from structured pruning to neural architecture search, encompassing both the pruning and the operator-erasing perspectives. Despite advancements, existing methods suffer from complex, multi-stage processes that demand substantial engineering and domain knowledge, limiting their broader application. We introduce the third-generation Only-Train-Once (OTOv3), which is the first to automatically train and compress a general DNN through pruning and erasing operations, creating a compact and competitive sub-network without the need for fine-tuning. OTOv3 simplifies and automates the training and compression process, minimizing the engineering effort required from users. It offers three key technological advancements: (i) automatic search space construction for general DNNs based on dependency graph analysis; (ii) Dual Half-Space Projected Gradient (DHSPG) and its enhanced version with hierarchical search (H2SPG) to reliably solve (hierarchical) structured sparsity problems and ensure sub-network validity; and (iii) automated sub-network construction using solutions from DHSPG/H2SPG and dependency graphs. Our empirical results demonstrate the efficacy of OTOv3 across various benchmarks in structured pruning and neural architecture search. OTOv3 produces sub-networks that match or exceed the state of the art. The source code will be available at https://github.com/tianyic/only_train_once.
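
To make the structured-sparsity idea in item (ii) more concrete, the following is a minimal sketch of a group-wise half-space projection step of the general kind that half-space projected gradient methods build on. It is not the DHSPG or H2SPG algorithm from the paper and does not use the only_train_once API: the function name `half_space_step`, the parameters `lr` and `eps`, and the flat-vector layout of the groups are all assumptions made for illustration. Each group stands in for one removable structure (for example, a channel); a group is set to zero when a plain gradient step would move it out of the half-space defined by its current value.

```python
import numpy as np


def half_space_step(x, grad, groups, lr=0.1, eps=0.0):
    """One gradient step followed by a group-wise half-space projection.

    Illustrative sketch only (hypothetical names, not the paper's DHSPG).

    x, grad : 1-D float arrays holding parameters and their gradient.
    groups  : list of integer index arrays, one per removable structure.
    lr      : step size for the trial gradient step.
    eps     : projection threshold; larger values zero groups more readily.
    """
    trial = x - lr * grad          # plain gradient step
    out = trial.copy()
    for idx in groups:
        xg, tg = x[idx], trial[idx]
        # Groups that are already zero stay zero (assumed convention).
        already_zero = not xg.any()
        # Zero the group if the trial point leaves the half-space
        # {z : z . x_g >= eps * ||x_g||^2}.
        leaves_half_space = tg @ xg < eps * (xg @ xg)
        if already_zero or leaves_half_space:
            out[idx] = 0.0
    return out


def zero_groups(x, groups):
    """Return the positions (in `groups`) of groups that are entirely zero."""
    return [i for i, idx in enumerate(groups) if not x[idx].any()]
```

In this sketch, a group whose gradient step flips it past the origin is removed as a whole, while the other groups receive an ordinary gradient update. The indices returned by `zero_groups` would correspond to the structures a later sub-network construction step could drop.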
