Attention mechanisms have become crucially important in deep learning in recent years. These non-local operations, which resemble traditional patch-based methods in image processing, complement local convolutions. However, computing the full attention matrix is expensive in both memory and computation. These costs constrain network architectures and performance, particularly for high-resolution images. We propose an efficient attention layer based on PatchMatch, a stochastic algorithm for finding approximate nearest neighbors. We refer to the proposed layer as a "Patch-based Stochastic Attention Layer" (PSAL). Furthermore, we propose several approaches, based on patch aggregation, to ensure the differentiability of PSAL, thus allowing end-to-end training of any network containing our layer. PSAL has a small memory footprint and can therefore scale to high-resolution images. It maintains this footprint without sacrificing the spatial precision or the globality of the nearest neighbors, which means that it can be easily inserted at any level of a deep architecture, even at shallow levels. We demonstrate the usefulness of PSAL on several image editing tasks, such as image inpainting, guided image colorization, and single-image super-resolution. Our code is available at: https://github.com/ncherel/psal
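To make the memory argument concrete, the following back-of-the-envelope sketch compares the storage needed for a full attention matrix with that of a nearest-neighbor field of the kind PatchMatch produces. The function names, the float32 attention entries, and the two int32 coordinates per stored neighbor are illustrative assumptions, not details taken from the PSAL code.

```python
def full_attention_bytes(height, width, bytes_per_entry=4):
    """Bytes for a dense attention matrix over all pixel pairs.

    Assumes one entry (float32 by default) per (query, key) pair,
    so the cost grows quadratically with the number of pixels.
    """
    n = height * width
    return n * n * bytes_per_entry


def nn_field_bytes(height, width, neighbors=1, bytes_per_coord=4):
    """Bytes for a nearest-neighbor field with `neighbors` matches per pixel.

    Assumes each match is stored as two coordinates (int32 by default),
    so the cost grows linearly with the number of pixels.
    """
    n = height * width
    return n * neighbors * 2 * bytes_per_coord


if __name__ == "__main__":
    for side in (64, 256, 1024):
        dense = full_attention_bytes(side, side)
        sparse = nn_field_bytes(side, side)
        print(f"{side}x{side}: dense={dense} B, nn_field={sparse} B, "
              f"ratio={dense // sparse}")
```

Under these assumptions, a 256x256 feature map needs 16 GiB for the dense matrix but only 512 KiB for a single-neighbor field. The gap widens quadratically with resolution, which is consistent with the claim that a PatchMatch-based layer can be placed at shallow, high-resolution levels of a network.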