Resource-Efficient Neural Networks for Embedded Systems

While machine learning is traditionally a resource-intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. Developing such approaches is among the major challenges in current machine learning research and is key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art in machine learning techniques that address these real-world requirements. In particular, we focus on resource-efficient inference based on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature, which can be split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing. They are widely used to reduce memory footprint, inference latency, and energy consumption. We also briefly discuss different concepts of embedded hardware for DNNs, their compatibility with machine learning techniques, and their potential for reducing energy and latency. We substantiate our discussion with experiments on well-known benchmark data sets, applying compression techniques (quantization, pruning) on a set of resource-constrained embedded systems, such as CPUs, GPUs, and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and prediction quality.
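The following minimal sketch illustrates the two compression techniques named above, pruning and quantization, on a flat list of weights. It first applies unstructured magnitude pruning and then uniform symmetric per-tensor 8-bit quantization. The function names, the magnitude criterion, and the symmetric quantization scheme are illustrative assumptions. They are not the specific methods evaluated in the article.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Unstructured magnitude pruning is assumed here as one common criterion;
    many other pruning criteria exist.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Sort indices by absolute value; the k smallest-magnitude weights are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def quantize_symmetric(weights: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Uniform symmetric quantization to signed `bits`-bit integer codes.

    Returns (codes, scale) such that each weight w is approximated by scale * q.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clip to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real-valued weights."""
    return [scale * q for q in codes]


if __name__ == "__main__":
    w = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6, -0.4, 0.02]
    pruned = magnitude_prune(w, sparsity=0.5)   # half of the weights set to 0.0
    codes, scale = quantize_symmetric(pruned)   # 8-bit integer codes + one float scale
    restored = dequantize(codes, scale)
    max_err = max(abs(a - b) for a, b in zip(pruned, restored))
    print("pruned:", pruned)
    print("codes:", codes, "scale:", scale, "max abs error:", max_err)
```

Because the quantizer is symmetric, pruned weights map exactly to code 0, so the sparsity survives quantization. Storing 8-bit codes instead of 32-bit floats shrinks dense weight storage by a factor of 4. Whether this yields real speed or energy gains depends on hardware support for low-precision and sparse arithmetic, which is exactly the trade-off the article examines.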
