Hashing is at the heart of large-scale image similarity search, and recent methods have been substantially improved through deep learning techniques. Such algorithms typically learn continuous embeddings of the data. To avoid a costly subsequent binarization step, a common solution is to employ loss functions that combine a similarity learning term (to ensure similar images are mapped to nearby embeddings) and a quantization penalty term (to ensure that the embedding entries are close to binary values, e.g., -1 or 1). Still, the interaction between these two terms can make learning harder and the embeddings worse. We propose an alternative quantization strategy that decomposes the learning problem into two stages: first, perform similarity learning over the embedding space with no quantization; second, find an optimal orthogonal transformation of the embeddings so that each coordinate of the embedding is close to its sign, and then quantize the transformed embedding through the sign function. In the second step, we parametrize orthogonal transformations using Householder matrices so that stochastic gradient descent can be applied efficiently. Since similarity measures are usually invariant under orthogonal transformations, this quantization strategy comes at no cost in terms of performance. The resulting algorithm is unsupervised, fast, hyperparameter-free, and can be run on top of any existing deep hashing or metric learning algorithm. We provide extensive experimental results showing that this approach leads to state-of-the-art performance on widely used image datasets and, unlike other quantization strategies, brings consistent improvements in performance to existing deep hashing algorithms.
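As a rough illustration of the second stage, the sketch below builds an orthogonal matrix as a product of Householder reflections, applies it to embeddings, and quantizes with the sign function. It is a minimal NumPy sketch, not the authors' implementation: the function names (`householder_orthogonal`, `quantization_loss`, `binarize`), the random stand-in embeddings, and the squared-gap form of the loss are all assumptions made for illustration, and the stochastic gradient descent loop over the reflection vectors is omitted.

```python
import numpy as np


def householder_orthogonal(vs):
    """Product of Householder reflections H(v) = I - 2 v v^T / (v^T v).

    vs has shape (k, d); each row is one unconstrained reflection vector.
    (Hypothetical helper, named here for illustration only.)
    """
    d = vs.shape[1]
    u = np.eye(d)
    for v in vs:
        v = v / np.linalg.norm(v)
        # Right-multiply u by H(v) without forming H explicitly.
        u = u - 2.0 * np.outer(u @ v, v)
    return u


def quantization_loss(z, u):
    """Mean squared gap between rotated embeddings and their signs.

    Assumed loss form; the abstract only says each coordinate should be
    close to its sign.
    """
    r = z @ u
    return np.mean((r - np.sign(r)) ** 2)


def binarize(z, u):
    """Rotate the embeddings, then quantize each coordinate to -1 or +1."""
    return np.where(z @ u >= 0, 1, -1)


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))    # stand-in for embeddings learned in stage one
vs = rng.normal(size=(4, 4))   # d reflection vectors for a d x d matrix
u = householder_orthogonal(vs)
codes = binarize(z, u)
```

Because `u` is orthogonal, the inner products `z @ z.T` are unchanged by the rotation, which is the invariance the abstract relies on. In practice the rows of `vs` would be the trainable parameters, updated to reduce `quantization_loss`.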