Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots. However, real-world deployment faces challenges due to these systems' limited computational resources and battery power. To tackle these challenges, continuous learning exploits a lightweight "student" model at deployment (inference), leverages a larger "teacher" model for labeling sampled data (labeling), and continuously retrains the student model to adapt to changing scenarios (retraining). This paper highlights the limitations of state-of-the-art continuous learning systems: (1) they focus on computations for retraining while overlooking the compute needs of inference and labeling, (2) they rely on power-hungry GPUs, unsuitable for battery-operated autonomous systems, and (3) they are located on a remote centralized server, intended for multi-tenant scenarios, which is again unsuitable for autonomous systems due to privacy, network availability, and latency concerns. We propose DaCapo, a hardware-algorithm co-designed solution for continuous learning that enables autonomous systems to perform inference, labeling, and training concurrently in a performant and energy-efficient manner. DaCapo comprises (1) a spatially partitionable and precision-flexible accelerator that enables parallel execution of kernels on sub-accelerators at their respective precisions, and (2) a spatiotemporal resource allocation algorithm that strategically navigates the resource-accuracy tradeoff space, making optimal resource allocation decisions to maximize accuracy. Our evaluation shows that DaCapo achieves 6.5% and 5.5% higher accuracy than the state-of-the-art GPU-based continuous learning systems Ekya and EOMU, respectively, while consuming 254× less power.