Convolution is a compute-intensive operation at the heart of Convolutional Neural Networks (CNNs). Its performance demands have led to the development of many high-performance algorithms, such as Im2col-GEMM, Winograd, and Direct-Convolution. However, comparing different convolution algorithms is an error-prone task, as each requires specific data layouts and system resources; failing to address these requirements can introduce unwanted time penalties. Thus, accounting for all processing steps within a convolution algorithm is essential to evaluate it comprehensively and compare it fairly against others. Furthermore, most existing convolution benchmarks rely on ad-hoc testing suites with limited coverage and handmade operations. This paper proposes ConvBench, a primitive-level benchmark for the evaluation and comparison of convolution algorithms. It assesses 9243 convolution operations derived from 1097 real-world deep learning models, producing performance and execution-breakdown graphs for detailed evaluation. ConvBench's capabilities are demonstrated on the Sliced Convolution (SConv) algorithm. In the experiments, SConv outperformed Im2col-GEMM in 93.6% of the convolutions. Moreover, ConvBench enabled a deeper analysis of the remaining 6.4% of underperforming convolutions, uncovering a critical average slowdown of 79.5% in SConv's packing step. This analysis pinpoints a potential source of optimization for SConv, opening new paths for convolution designers to improve their algorithms.
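To make the Im2col-GEMM baseline concrete, the following is a minimal NumPy sketch (not the paper's implementation) of the two steps the abstract refers to: an im2col packing step that unfolds input patches into a matrix, followed by a single GEMM. The function names `im2col` and `conv_im2col_gemm` are illustrative, not from ConvBench.

```python
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix so
    that convolution reduces to one matrix multiplication. This packing
    step is the data-layout transformation whose cost ConvBench-style
    breakdowns make visible."""
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + kh, j * stride:j * stride + kw]
            cols[:, idx] = patch.ravel()  # flatten patch in C-major order
            idx += 1
    return cols, out_h, out_w

def conv_im2col_gemm(x, weights, stride=1):
    """Convolve x (C, H, W) with weights (M, C, kh, kw) via im2col + GEMM,
    returning an (M, out_h, out_w) output."""
    m, _, kh, kw = weights.shape
    cols, out_h, out_w = im2col(x, kh, kw, stride)
    w_mat = weights.reshape(m, -1)   # (M, C*kh*kw)
    out = w_mat @ cols               # the GEMM step
    return out.reshape(m, out_h, out_w)
```

Separating the packing step into its own function mirrors why step-level breakdowns matter: for small or oddly shaped convolutions, the `im2col` copy can dominate the GEMM itself, which is the kind of effect the abstract's packing-step slowdown illustrates.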