This paper focuses on accurate and fast interpolation for the image transformations employed in CNN architectures. Standard Spatial Transformer Networks (STNs) interpolate with bilinear or linear kernels, which make unrealistic assumptions about the underlying data distribution and lead to poor performance under scale variations. Moreover, because each interpolated value depends on only a sparse set of neighboring pixels, STNs do not preserve the norm of gradients during propagation. To address these problems, a novel Entropy STN (ESTN) is proposed that interpolates on the data manifold distribution. In particular, random samples are generated for each pixel in association with the tangent space of the data manifold, and a linear approximation of their intensity values is constructed with an entropy regularizer to compute the transformer parameters. A simple yet effective technique is also proposed to normalize the non-zero values of the convolution operation, which fine-tunes the layers for gradient-norm regularization during training. Experiments on challenging benchmarks show that the proposed ESTN can improve predictive accuracy over a range of computer vision tasks, including image reconstruction and classification, while reducing computational cost.
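To make the baseline criticized in the abstract concrete, the following is a minimal sketch of standard bilinear sampling as used in an STN. It is not the proposed ESTN. The function names `bilinear_weights` and `bilinear_sample` and the border-clamping behavior are assumptions chosen for illustration. The weights it returns are also the partial derivatives of the sampled value with respect to the pixel intensities, so at most four pixels receive a non-zero gradient per output sample, which is the sparse-neighbor dependency the abstract refers to.

```python
import math


def bilinear_weights(x, y, width, height):
    """Return {(row, col): weight} for the (at most) four pixels around (x, y).

    Hypothetical helper for illustration; coordinates are clamped to the border.
    """
    # Clamp the query point so that every neighbour lies inside the image.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0  # fractional offsets inside the pixel cell

    weights = {}
    corners = [
        (y0, x0, (1 - wx) * (1 - wy)),
        (y0, x1, wx * (1 - wy)),
        (y1, x0, (1 - wx) * wy),
        (y1, x1, wx * wy),
    ]
    for row, col, w in corners:
        # Corners can coincide at the border, so accumulate rather than overwrite.
        weights[(row, col)] = weights.get((row, col), 0.0) + w
    return weights


def bilinear_sample(image, x, y):
    """Sample a 2-D image (list of rows) at real-valued column x and row y."""
    height, width = len(image), len(image[0])
    weights = bilinear_weights(x, y, width, height)
    return sum(w * image[row][col] for (row, col), w in weights.items())


if __name__ == "__main__":
    image = [[0.0, 1.0],
             [2.0, 3.0]]
    print(bilinear_sample(image, 0.5, 0.5))  # 1.5, the mean of the four pixels
    # Only these pixels would receive gradient for this output sample.
    print(bilinear_weights(0.5, 0.5, 2, 2))
```

However large the image, the returned dictionary never has more than four entries, so every other pixel contributes zero gradient to that sample.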