Deploying DNNs on Systems-on-Chip (SoCs) with multiple heterogeneous acceleration engines is challenging, and most deployment frameworks cannot fully exploit this heterogeneity. We present MATCHA, a unified DNN deployment framework that generates highly concurrent schedules for parallel, heterogeneous accelerators and uses constraint programming to optimize L3/L2 memory allocation and scheduling. Pattern matching, tiling, and mapping across individual hardware units enable parallel execution and high accelerator utilization. On the MLPerf Tiny benchmark, using an SoC with two heterogeneous accelerators, MATCHA improves accelerator utilization and reduces inference latency by up to 35% with respect to the state-of-the-art MATCH compiler.
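To illustrate why mapping independent operators onto different accelerators can shorten inference, the following is a minimal sketch and not MATCHA's actual formulation. It replaces a constraint-programming solver with an exhaustive search over operator-to-accelerator assignments. The task names, the accelerator names `acc0` and `acc1`, and all latency values are hypothetical, and memory allocation is ignored.

```python
from itertools import product

# Hypothetical per-operator latencies (in cycles) on two heterogeneous accelerators.
LATENCY = {
    "conv_a": {"acc0": 4, "acc1": 6},
    "conv_b": {"acc0": 5, "acc1": 3},
    "add":    {"acc0": 2, "acc1": 2},
}
# Data dependencies: "add" consumes the outputs of both convolutions.
DEPS = {"conv_a": [], "conv_b": [], "add": ["conv_a", "conv_b"]}
# A fixed topological order in which operators are placed on the timeline.
ORDER = ["conv_a", "conv_b", "add"]
UNITS = ["acc0", "acc1"]


def makespan(assignment):
    """Return the total latency of a given operator-to-accelerator assignment."""
    free_at = {unit: 0 for unit in UNITS}  # time at which each accelerator is idle
    finish = {}                            # finish time of each scheduled operator
    for task in ORDER:
        unit = assignment[task]
        # An operator starts once its accelerator is free and its inputs are ready.
        start = max([free_at[unit]] + [finish[dep] for dep in DEPS[task]])
        finish[task] = start + LATENCY[task][unit]
        free_at[unit] = finish[task]
    return max(finish.values())


def best_schedule():
    """Exhaustively search all assignments and return (makespan, assignment)."""
    best = None
    for choice in product(UNITS, repeat=len(ORDER)):
        assignment = dict(zip(ORDER, choice))
        span = makespan(assignment)
        if best is None or span < best[0]:
            best = (span, assignment)
    return best


if __name__ == "__main__":
    serial = makespan({task: "acc0" for task in ORDER})
    span, assignment = best_schedule()
    print("single-accelerator latency:", serial)
    print("best concurrent latency:", span, assignment)
```

With these made-up numbers, running both convolutions concurrently on different accelerators finishes in fewer cycles than running everything on `acc0`. Exhaustive search only works for a handful of operators; a real deployment flow would hand the same start-time and dependency constraints, together with memory constraints, to a dedicated solver.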