Deep neural networks (DNNs) have become ubiquitous for addressing a wide range of problems, particularly in computer vision. However, DNN inference is computationally intensive, which can be prohibitive, e.g., on edge devices. A popular remedy is DNN pruning, in particular structured pruning, which removes coherent computational blocks (e.g., channels in convolutional networks). Because an exhaustive search over the space of pruned sub-models is intractable in practice, channels are typically removed iteratively according to an importance estimation heuristic. Recently, promising latency-aware pruning methods have been proposed, in which channels are removed until the network reaches a target wall-clock latency budget, using latency estimates computed in advance for the specific hardware. In this paper, we present Archtree, a novel method for latency-driven structured pruning of DNNs. Archtree explores multiple candidate pruned sub-models in parallel in a tree-like fashion, allowing for a better exploration of the search space. Furthermore, it estimates latency on the fly on the target hardware, yielding latencies closer to the specified budget. Empirical results on several DNN architectures and target hardware show that, compared to existing state-of-the-art methods, Archtree better preserves the original model accuracy while more closely fitting the latency budget.
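The abstract contrasts greedy, importance-driven pruning toward a latency budget with exploring several pruned candidates at once. The toy Python sketch below illustrates that contrast under heavy assumptions; it is not Archtree's algorithm. Representing a network by per-layer channel importance scores, the `measure_latency` cost model (a stand-in for timing each candidate on the target hardware during the search), and the beam-style `tree_prune` helper are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a sub-model is the tuple of importance scores
# of the channels kept in each layer (real pruning would edit network weights).
Model = Tuple[Tuple[float, ...], ...]


def measure_latency(model: Model) -> float:
    """Hypothetical stand-in for timing a sub-model on the target hardware.

    Toy cost model: each layer costs in_channels * out_channels, with a fixed
    3-channel input feeding the first layer.
    """
    widths = [3] + [len(layer) for layer in model]
    return float(sum(a * b for a, b in zip(widths, widths[1:])))


def retained_importance(model: Model) -> float:
    """Total importance of the channels still present (higher is better)."""
    return sum(sum(layer) for layer in model)


def prune_one(model: Model, layer_idx: int) -> Model:
    """Drop the least important channel of one layer (layers stay sorted)."""
    layer = tuple(sorted(model[layer_idx], reverse=True)[:-1])
    return model[:layer_idx] + (layer,) + model[layer_idx + 1:]


def children(model: Model) -> List[Model]:
    """All sub-models obtained by removing one channel from one layer."""
    return [prune_one(model, i) for i, layer in enumerate(model) if len(layer) > 1]


def tree_prune(model: Sequence[Sequence[float]], budget: float, width: int) -> Optional[Model]:
    """Keep up to `width` candidates per level; width=1 is plain greedy pruning."""
    root: Model = tuple(tuple(sorted(layer, reverse=True)) for layer in model)
    if measure_latency(root) <= budget:
        return root
    frontier: List[Model] = [root]
    best: Optional[Model] = None
    while frontier:
        # Expand all current candidates, merging identical sub-models.
        expanded = {child for m in frontier for child in children(m)}
        # Keep the `width` children that retain the most importance
        # (ties broken deterministically by comparing the tuples themselves).
        kept = sorted(expanded, key=lambda m: (retained_importance(m), m), reverse=True)[:width]
        frontier = []
        for m in kept:
            # Latency is measured for every kept candidate during the search.
            if measure_latency(m) <= budget:
                if best is None or retained_importance(m) > retained_importance(best):
                    best = m
            else:
                frontier.append(m)  # still over budget: keep pruning this branch
    return best  # None if no explored sub-model fits the budget


if __name__ == "__main__":
    net = [[1.0, 0.5], [5.0, 5.0, 0.4]]
    greedy = tree_prune(net, budget=8.0, width=1)
    tree = tree_prune(net, budget=8.0, width=2)
    print(greedy, measure_latency(greedy))  # ((1.0,), (5.0, 5.0)) 5.0
    print(tree, measure_latency(tree))      # ((1.0,), (5.0, 5.0, 0.4)) 6.0
```

On this toy network, greedy pruning (`width=1`) first removes the globally least important channel, which barely reduces latency, and then has to remove a second channel to reach the budget. Keeping two candidates per level finds a sub-model that retains more importance (11.4 vs 11.0) and lands closer to the 8.0 budget (latency 6.0 vs 5.0).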