Texture analysis is a classical yet challenging task in computer vision to which deep neural networks are actively being applied. Most approaches build feature aggregation modules around a pre-trained backbone and then fine-tune the new architecture on specific texture recognition tasks. Here we propose a new method named \textbf{R}andom encoding of \textbf{A}ggregated \textbf{D}eep \textbf{A}ctivation \textbf{M}aps (RADAM), which extracts rich texture representations without ever changing the backbone. The technique consists of encoding the output at different depths of a pre-trained deep convolutional network using a Randomized Autoencoder (RAE). The RAE is trained locally on each image using a closed-form solution, and its decoder weights are used to compose a 1-dimensional texture representation that is fed into a linear SVM. This means that no fine-tuning or backpropagation is needed. We evaluate RADAM on several texture benchmarks and achieve state-of-the-art results under different computational budgets. Our results suggest that pre-trained backbones may not require additional fine-tuning for texture recognition if their learned representations are better encoded.
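The per-image encoding step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoder weight distribution, the activation, the regularization, or how activation maps from different depths are aggregated. The sketch assumes a Gaussian random encoder, a sigmoid activation, a ridge-regularized least-squares decoder, and a single matrix of already-aggregated activations. The function name `rae_encode` and the parameters `hidden_dim`, `ridge`, and `seed` are hypothetical.

```python
import numpy as np

def rae_encode(features, hidden_dim=4, ridge=1e-3, seed=0):
    """Encode one image with a randomized autoencoder (illustrative sketch).

    features: array of shape (n_positions, n_channels), i.e. one activation
              vector per spatial position of the aggregated feature map.
    Returns the flattened decoder weights as a 1-D descriptor.
    """
    rng = np.random.default_rng(seed)
    n_channels = features.shape[1]

    # Standardize each channel so the random projection is well scaled.
    x = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

    # Assumption: fixed Gaussian encoder, never trained. Reusing the same
    # seed for every image keeps descriptors comparable across images.
    w_enc = rng.standard_normal((n_channels, hidden_dim))
    h = 1.0 / (1.0 + np.exp(-(x @ w_enc)))  # hidden codes, (n_positions, hidden_dim)

    # Closed-form ridge solution for the decoder, no backpropagation:
    #   W_dec = (H^T H + ridge * I)^(-1) H^T X
    gram = h.T @ h + ridge * np.eye(hidden_dim)
    w_dec = np.linalg.solve(gram, h.T @ x)  # (hidden_dim, n_channels)

    return w_dec.ravel()

# Usage with synthetic activations standing in for a backbone's output:
# a 7x7 feature map with 16 channels, flattened to 49 positions.
fake_activations = np.random.default_rng(42).standard_normal((49, 16))
descriptor = rae_encode(fake_activations)
```

Under these assumptions the descriptor length is `hidden_dim * n_channels`, independent of image size, so descriptors from different images can be stacked and passed to a linear SVM.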