The streaming coarse-grained reconfigurable array (CGRA) is a promising architecture for data- and computing-intensive applications because of its flexibility, high throughput, and efficient memory system. However, when accelerating sparse CNNs, the irregular input data demands inside sparse CNNs cause excessive caching operations (COPs) and multi-cycle internal dependencies (MCIDs) between operations, degrading the throughput of the streaming CGRA. We propose SparseMap, a mapping method for sparse CNNs onto streaming CGRAs, which incorporates efficient I/O data management along with operation scheduling and binding to reduce COPs and MCIDs, thereby ensuring the optimal throughput of the streaming CGRA. Experimental results show that SparseMap reduces COPs by 92.5% and MCIDs by 46.0% while achieving the same or even smaller initiation interval (II) compared to previous works.