Convolutional neural networks (CNNs) require a large number of multiply-accumulate (MAC) operations. To meet real-time constraints, they are often executed on specialized accelerators composed of an on-chip memory and a processing unit. However, the on-chip memory is often too small to hold all the data required to compute a CNN layer, so the computation must be performed in several offloading steps. We formalise such sequences of steps and apply our formalism to a state-of-the-art decomposition of convolutions. To find strategies that are optimal in terms of duration, we encode the problem as a set of constraints. A Python-based simulator allows in-depth analysis of the computed strategies.
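To make the memory constraint concrete, the following is a minimal sketch, not the formalism or the simulator described here. It counts the MAC operations of a dense convolution layer and gives a crude estimate of how many offloading steps are needed when the on-chip memory holds only a fixed number of data elements. The function names, the layer shape, and the capacity value are all hypothetical, and the estimate assumes each step can keep at most `capacity` elements on chip with no data reloaded.

```python
from math import ceil


def conv_macs(h_out, w_out, c_in, c_out, k_h, k_w):
    """MAC count of a dense 2D convolution (stride/padding are reflected in the output size)."""
    return h_out * w_out * c_out * c_in * k_h * k_w


def conv_footprint(h_in, w_in, c_in, h_out, w_out, c_out, k_h, k_w):
    """Number of data elements the layer touches: input + weights + output."""
    inputs = h_in * w_in * c_in
    weights = k_h * k_w * c_in * c_out
    outputs = h_out * w_out * c_out
    return inputs + weights + outputs


def estimated_steps(footprint, capacity):
    """Crude step estimate (assumption): every element is on chip in at least one step,
    and a step holds at most `capacity` elements."""
    return ceil(footprint / capacity)


# Hypothetical layer: 32x32x16 input, 3x3 kernel, stride 1, no padding, 32 output channels.
macs = conv_macs(30, 30, 16, 32, 3, 3)
footprint = conv_footprint(32, 32, 16, 30, 30, 32, 3, 3)
steps = estimated_steps(footprint, capacity=16384)  # hypothetical on-chip capacity, in elements
print(macs, footprint, steps)
```

For this hypothetical layer the data footprint is roughly three times the assumed capacity, so the layer cannot be computed in a single step. An actual strategy may need more steps than this estimate, since data that is evicted and needed again has to be reloaded.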