While machine learning is traditionally a resource-intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on resource-efficient inference based on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature, which can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce computational demands, improving memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques, as well as their potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark data sets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs, and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and prediction quality.
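To make two of the compression techniques named above concrete, the following is a minimal sketch of post-processing quantization and magnitude-based pruning applied to a flat list of weights. It is an illustration under stated assumptions, not the method evaluated in the article: the function names (`quantize_symmetric`, `dequantize`, `prune_by_magnitude`), the choice of symmetric uniform 8-bit quantization, the magnitude criterion for pruning, and the example weights are all hypothetical.

```python
def quantize_symmetric(weights, num_bits=8):
    """Uniform symmetric quantization (assumed scheme, for illustration).

    Maps each float weight to a signed integer in [-qmax, qmax] and
    returns the integers together with the scale needed to map back.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to approximate float weights."""
    return [q * scale for q in quantized]


def prune_by_magnitude(weights, sparsity):
    """Set the `sparsity` fraction of smallest-magnitude weights to zero."""
    num_to_drop = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:num_to_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights, chosen only to illustrate the two operations.
weights = [0.42, -1.27, 0.05, 0.88, -0.03, 0.61]

quantized, scale = quantize_symmetric(weights)
restored = dequantize(quantized, scale)
pruned = prune_by_magnitude(weights, sparsity=0.5)

print("integer codes:", quantized)
print("max quantization error:", max(abs(a - b) for a, b in zip(weights, restored)))
print("pruned weights:", pruned)
```

The sketch shows the trade-off in miniature: quantization replaces each weight with a small integer at the cost of a rounding error bounded by half the scale, while pruning removes weights entirely and leaves the remaining ones untouched. Whether either step preserves prediction quality depends on the model and data, which is the difficulty the experiments highlight.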