Sobel is one of the most widely used edge detection operators in image processing. To date, most users adopt the two-directional 3×3 Sobel operator as their detector because of its low computational cost and reasonable performance. At the same time, many studies have used larger, multi-directional Sobel operators to obtain higher stability, but at the expense of speed. This paper proposes a fast graphics processing unit (GPU) kernel for the four-directional 5×5 Sobel operator. To improve performance, we implement the kernel with warp-level primitives, which can significantly reduce the number of memory accesses. In addition, we introduce a prefetching mechanism and an operator transformation into the kernel to reduce computational complexity and data transmission latency. Compared with the OpenCV-GPU library, our kernel achieves speedups of 6.7x on a Jetson AGX Xavier GPU and 13x on a GTX 1650Ti GPU.
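The abstract does not give the coefficients of the four-directional kernels or the details of the operator transformation. As background only, the sketch below shows one well-known way to cut the arithmetic of a 5×5 Sobel filter. The horizontal kernel is assumed here to be the one OpenCV uses for `ksize=5`. It is the outer product of a binomial smoothing vector and a derivative vector, so a 25-tap 2D filter can be replaced by two 5-tap 1D passes. The names `SMOOTH`, `DERIV`, `correlate2d_valid`, and `sobel_x_separable` are hypothetical. The two diagonal kernels are omitted because their coefficients are not stated. This is not necessarily the transformation used in the paper.

```python
import numpy as np

# Assumed 5x5 Sobel factors (those OpenCV uses for ksize=5):
# a binomial smoothing vector and a first-derivative vector.
SMOOTH = np.array([1, 4, 6, 4, 1], dtype=np.int64)
DERIV = np.array([-1, -2, 0, 2, 1], dtype=np.int64)

# Full 5x5 x-gradient kernel: smooth along y (rows), differentiate along x (columns).
KX = np.outer(SMOOTH, DERIV)
KY = KX.T  # y-gradient kernel (transpose)


def correlate2d_valid(img, k):
    """Direct 2D correlation over the 'valid' region.

    Costs kh * kw multiply-adds per output pixel (25 for a 5x5 kernel).
    """
    kh, kw = k.shape
    h, w = img.shape
    oh, ow = h - kh + 1, w - kw + 1
    out = np.zeros((oh, ow), dtype=np.int64)
    for dy in range(kh):
        for dx in range(kw):
            out += k[dy, dx] * img[dy:dy + oh, dx:dx + ow]
    return out


def sobel_x_separable(img):
    """Same result as correlate2d_valid(img, KX), computed in two 1D passes.

    A 5-tap vertical smoothing pass followed by a 5-tap horizontal
    derivative pass: 10 multiply-adds per output pixel instead of 25.
    """
    h, w = img.shape
    oh, ow = h - 4, w - 4
    # Vertical pass: weighted sum of 5 rows with SMOOTH.
    tmp = np.zeros((oh, w), dtype=np.int64)
    for dy in range(5):
        tmp += SMOOTH[dy] * img[dy:dy + oh, :]
    # Horizontal pass: weighted sum of 5 columns with DERIV.
    out = np.zeros((oh, ow), dtype=np.int64)
    for dx in range(5):
        out += DERIV[dx] * tmp[:, dx:dx + ow]
    return out
```

For example, on `rng = np.random.default_rng(0)` and `img = rng.integers(0, 256, size=(16, 20))`, `correlate2d_valid(img, KX)` and `sobel_x_separable(img)` should return identical arrays. A CPU reference of this kind is also a convenient way to check a GPU kernel's output on random images.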