Image enhancement techniques often cause edge degradation and other unwanted artifacts, such as halos. These artifacts are a major problem in photographic applications, where they degrade image quality. Many edge-aware techniques have been proposed in the image processing literature; however, they require complex optimization or post-processing. Local Laplacian Filtering is an edge-aware image processing technique built on simple Gaussian and Laplacian pyramids. It can be applied to detail smoothing, detail enhancement, tone mapping and inverse tone mapping while keeping the image free of artifacts. Its drawback is that it is computationally expensive, which has motivated parallelization schemes for multi-core CPUs and GPUs. Such platforms, however, are not power-efficient, and a well-designed hardware architecture on an FPGA can achieve better performance per watt. In this paper, we propose a hardware accelerator that fully exploits the parallelism available in the Local Laplacian Filtering algorithm while minimizing the use of on-chip FPGA resources. On a Virtex-7 FPGA, it achieves a 7.5x speed-up over an optimized baseline CPU implementation when processing a 1 MB image. To the best of our knowledge, no other hardware accelerator for Local Laplacian Filtering has been proposed in the research literature.
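To make the pyramid-based formulation concrete, the following is a minimal 1-D Python sketch of Local Laplacian Filtering in its straightforward per-coefficient form. It is an illustration, not the accelerator's datapath: the 5-tap binomial kernel, the clamped borders, the `remap` function and its parameters `sigma_r`, `alpha` and `beta` are all assumptions chosen for this example, and the 2-D image case applies the same operations along both axes. For every coefficient at pyramid level `lvl`, the whole input is remapped around the matching Gaussian-pyramid value `g`, a Laplacian pyramid of the remapped signal is built, and only that one coefficient is kept; the coarsest level is copied from the input's own pyramid.

```python
import math

# Assumed 5-tap binomial smoothing kernel (weights sum to 1).
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def blur(x):
    """Convolve with KERNEL, clamping indices at the signal borders."""
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)
            acc += w * x[j]
        out.append(acc)
    return out


def downsample(x):
    return blur(x)[::2]


def upsample(x, n):
    """Zero-insert to length n, then blur with gain 2 to preserve the mean."""
    up = [0.0] * n
    for i, v in enumerate(x):
        up[2 * i] = 2.0 * v
    return blur(up)


def gaussian_pyramid(x, levels):
    pyr = [list(x)]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr


def laplacian_pyramid(x, levels):
    g = gaussian_pyramid(x, levels)
    lap = [[a - b for a, b in zip(g[lvl], upsample(g[lvl + 1], len(g[lvl])))]
           for lvl in range(levels - 1)]
    lap.append(g[-1])  # coarsest level stores the Gaussian residual
    return lap


def collapse(lap):
    """Invert laplacian_pyramid: upsample and add, coarsest to finest."""
    x = lap[-1]
    for lvl in range(len(lap) - 2, -1, -1):
        x = [a + b for a, b in zip(lap[lvl], upsample(x, len(lap[lvl])))]
    return x


def remap(v, g, sigma_r, alpha, beta):
    """Hypothetical point-wise remapping around reference value g:
    alpha < 1 boosts details (|v - g| <= sigma_r), beta scales edges."""
    d = v - g
    s = math.copysign(1.0, d)
    a = abs(d)
    if a <= sigma_r:
        return g + s * sigma_r * (a / sigma_r) ** alpha
    return g + s * (beta * (a - sigma_r) + sigma_r)


def local_laplacian_filter(x, levels=3, sigma_r=0.2, alpha=0.5, beta=1.0):
    gauss = gaussian_pyramid(x, levels)
    out = laplacian_pyramid(x, levels)  # residual level is kept as-is
    for lvl in range(levels - 1):
        for k, g in enumerate(gauss[lvl]):
            # Each output coefficient is computed independently of the others.
            remapped = [remap(v, g, sigma_r, alpha, beta) for v in x]
            out[lvl][k] = laplacian_pyramid(remapped, levels)[lvl][k]
    return collapse(out)


texture = [0.5 + 0.01 * (-1) ** i for i in range(16)]
enhanced = local_laplacian_filter(texture, alpha=0.5)  # amplifies the small oscillation
```

The sketch rebuilds a full pyramid for every output coefficient, which shows why the method is computationally expensive; at the same time, every coefficient is computed independently, which is the parallelism that multi-core, GPU or FPGA implementations can exploit. With `alpha=1` and `beta=1` the remapping is the identity and the filter returns its input unchanged.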