Orthonormal Product Quantization Network for Scalable Face Image Retrieval

Existing deep quantization methods provide an efficient solution for large-scale image retrieval. However, the significant intra-class variations in face images, such as pose, illumination, and expression, still pose a challenge for face image retrieval. Face image retrieval therefore requires sufficiently powerful learning metrics, which are absent from current deep quantization work. Moreover, because the number of unseen identities grows at the query stage, face image retrieval places greater demands on model generalization and system scalability than general image retrieval tasks do. This paper integrates product quantization with orthonormal constraints into an end-to-end deep learning framework to retrieve face images effectively. Specifically, a novel scheme that uses predefined orthonormal vectors as codewords is proposed to make the quantization more informative and reduce redundancy among codewords. A tailored loss function maximizes discriminability among identities in each quantization subspace for both the quantized and the original features. An entropy-based regularization term is imposed to reduce the quantization error. Experiments are conducted on four commonly used face datasets under both seen-identity and unseen-identity retrieval settings. Our method outperforms all the compared state-of-the-art deep hashing and quantization methods under both settings. The results validate the effectiveness of the proposed orthonormal codewords in improving both the standard retrieval performance and the generalization ability of the models. Combined with further experiments on two general image datasets, these results demonstrate the broad superiority of our method for scalable image retrieval.
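As an illustration of product quantization with predefined orthonormal codewords, the following minimal NumPy sketch splits each feature into subspaces and assigns every sub-vector to one codeword. It is not the paper's implementation: the abstract does not state how the codewords are constructed or how assignment is performed, so the QR-based construction, the hard maximum-inner-product assignment, and all names and sizes (`make_orthonormal_codebook`, `encode`, `decode`, `M`, `K`, `D`) are assumptions made for illustration.

```python
import numpy as np


def make_orthonormal_codebook(num_codewords, dim, rng):
    """Return `num_codewords` mutually orthonormal row vectors of length `dim`.

    Assumption: codewords come from the QR decomposition of a random Gaussian
    matrix. The paper may construct its predefined codewords differently.
    """
    if num_codewords > dim:
        raise ValueError("at most `dim` orthonormal vectors exist in R^dim")
    q, _ = np.linalg.qr(rng.standard_normal((dim, num_codewords)))
    return q.T  # shape (num_codewords, dim), rows are orthonormal


def encode(features, codebooks):
    """Hard-assign each sub-vector to the codeword with the largest inner product."""
    n = features.shape[0]
    m, _, d = codebooks.shape
    subvectors = features.reshape(n, m, d)
    # scores[i, j, k] = inner product of sub-vector j of sample i with codeword k
    scores = np.einsum("nmd,mkd->nmk", subvectors, codebooks)
    return scores.argmax(axis=-1)  # shape (n, m), one index per subspace


def decode(codes, codebooks):
    """Rebuild a quantized feature by concatenating the selected codewords."""
    m = codebooks.shape[0]
    parts = [codebooks[j, codes[:, j]] for j in range(m)]
    return np.concatenate(parts, axis=1)


rng = np.random.default_rng(0)
M, K, D = 4, 8, 16  # subspaces, codewords per subspace, sub-vector dimension
codebooks = np.stack([make_orthonormal_codebook(K, D, rng) for _ in range(M)])
features = rng.standard_normal((5, M * D))  # stand-in for learned face features
codes = encode(features, codebooks)
reconstructed = decode(codes, codebooks)
```

Because the codewords in each subspace are orthonormal, their Gram matrix is the identity, so no two codewords overlap in direction. Each image is stored as `M` small integers, one codeword index per subspace.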
