To meet the growing demand for computational power for deep neural networks (DNNs), multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule; however, state-of-the-art (SotA) schedulers struggle to consistently provide optimal schedules in a reasonable time across all DNN-hardware (DNN-HW) combinations. This paper proposes SALSA, a fast dual-engine scheduler that generates optimal execution schedules for both even and uneven mapping. We introduce a new strategy that combines exhaustive search with simulated annealing to address the fact that the size of the loop-ordering design space varies from layer to layer. SALSA is extensively benchmarked against two SotA schedulers, LOMA and Timeloop, on 5 different DNNs. On average, SALSA finds schedules with 11.9% and 7.6% lower energy while speeding up the search by 1.7x and 24x compared to LOMA and Timeloop, respectively.
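As an illustration of the dual-engine idea (exhaustive search when the loop-ordering space is small, simulated annealing when it is large), the following is a minimal sketch and not SALSA's actual implementation. The cost function `toy_cost`, the threshold `max_exhaustive`, the swap-based neighbor move, and the geometric cooling schedule are all assumptions made for this example; a real scheduler would evaluate an energy or latency model of the target hardware instead.

```python
import itertools
import math
import random


def toy_cost(order):
    # Hypothetical stand-in for a hardware energy model:
    # each loop's weight is multiplied by its (1-based) position.
    return sum((pos + 1) * weight for pos, weight in enumerate(order))


def exhaustive_search(loops, cost):
    # Evaluate every loop ordering and keep the cheapest one.
    return min(itertools.permutations(loops), key=cost)


def simulated_annealing(loops, cost, steps=2000, t_start=10.0, t_end=0.01, seed=0):
    # Swap-based annealing over loop orderings (assumed neighbor move).
    if len(loops) < 2:
        return tuple(loops)
    rng = random.Random(seed)
    current = list(loops)
    current_cost = cost(current)
    best, best_cost = tuple(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = cost(current)
        delta = new_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            # Accept the move; remember it if it is the best seen so far.
            current_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = tuple(current), new_cost
        else:
            # Reject the move by undoing the swap.
            current[i], current[j] = current[j], current[i]
    return best


def schedule(loops, cost, max_exhaustive=5040):
    # Pick the engine from the size of the ordering space (n! permutations).
    # The threshold value is an assumption for this sketch.
    if math.factorial(len(loops)) <= max_exhaustive:
        return exhaustive_search(loops, cost)
    return simulated_annealing(loops, cost)
```

Calling `schedule(loops, toy_cost)` with seven or fewer loops takes the exhaustive path under the assumed threshold, while longer loop lists fall through to annealing. The annealing path returns the best ordering it visited, which is not guaranteed to be the global optimum.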