DynaMIX: Resource Optimization for DNN-Based Real-Time Applications on a Multi-Tasking System

As deep neural networks (DNNs) prove their importance and feasibility, more and more DNN-based apps, such as object detection and classification, have been developed and deployed on autonomous vehicles (AVs). To meet growing expectations and requirements, AVs should optimize the use of their limited onboard computing resources across multiple concurrent in-vehicle apps while satisfying the apps' timing requirements, especially those related to safety. That is, real-time AV apps must share the limited onboard resources with other concurrent apps without missing their deadlines, which are dictated by the frame rate of the camera that supplies their input images. However, most, if not all, existing DNN solutions focus on enhancing the concurrency of their specific hardware and, owing to the high computational cost of doing so, do not dynamically optimize or modify the DNN apps' resource requirements as the number of running apps changes. To mitigate this limitation, we propose DynaMIX (Dynamic MIXed-precision model construction), which optimizes the resource requirements of concurrent apps while aiming to maximize their execution accuracy. To realize real-time resource optimization, we formulate an optimization problem that uses app performance profiles to account for both the accuracy and the worst-case latency of each app. We also propose dynamic model reconfiguration that lazily loads only the selected layers at runtime, avoiding the overhead of loading the entire model. We evaluate DynaMIX in terms of constraint satisfaction and inference accuracy on a multi-tasking system and compare it against state-of-the-art solutions, demonstrating its effectiveness and feasibility under various environmental and operating conditions.
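To make the profile-based selection idea concrete, the following is a minimal sketch and not the DynaMIX formulation itself, which the abstract does not spell out. It assumes that each app has a small set of profiled model variants, that the apps run back to back within one camera frame period so their worst-case latencies add up, and that the objective is the sum of per-app accuracies. The names `Variant` and `select_variants`, the variant labels, and all numbers are hypothetical.

```python
from itertools import product
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Variant(NamedTuple):
    """One profiled configuration of an app (hypothetical profile entry)."""
    name: str        # illustrative label for a precision configuration
    accuracy: float  # profiled accuracy of this configuration
    wcet_ms: float   # profiled worst-case latency in milliseconds


def select_variants(
    profiles: Sequence[Sequence[Variant]], deadline_ms: float
) -> Optional[Tuple[Variant, ...]]:
    """Pick one variant per app, maximizing total accuracy under a deadline.

    Assumption: apps execute sequentially within one frame period, so the
    constraint is sum(worst-case latencies) <= deadline_ms. Exhaustive
    search is used for clarity; it is only practical for small profiles.
    Returns None when no combination meets the deadline.
    """
    best: Optional[Tuple[Variant, ...]] = None
    best_accuracy = float("-inf")
    for combo in product(*profiles):
        if sum(v.wcet_ms for v in combo) > deadline_ms:
            continue  # this combination would miss the frame deadline
        total_accuracy = sum(v.accuracy for v in combo)
        if total_accuracy > best_accuracy:
            best, best_accuracy = combo, total_accuracy
    return best


# Made-up profiles for two concurrent apps (illustration only).
detector: List[Variant] = [
    Variant("fp32", 0.90, 30.0),
    Variant("mixed", 0.88, 20.0),
    Variant("int8", 0.80, 12.0),
]
classifier: List[Variant] = [
    Variant("fp32", 0.95, 15.0),
    Variant("mixed", 0.94, 10.0),
    Variant("int8", 0.90, 6.0),
]

# A 30 fps camera gives a frame period of roughly 33.3 ms.
choice = select_variants([detector, classifier], deadline_ms=33.3)
print([v.name for v in choice] if choice else "no feasible combination")
```

With these made-up profiles, running both apps at full precision would exceed the frame period, so the search settles on a lower-precision variant for each app. A real system would need a faster solver than exhaustive search and a latency model that reflects how the apps actually share the hardware.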
