Deep Metric Learning (DML) models rely on strong representations and similarity-based measures, trained with specific loss functions. Proxy-based losses have been shown to converge faster than pair-based losses. However, proxies assigned to different classes may end up close together in the embedding space, making it hard to distinguish positive from negative items. Alternatively, they may become highly correlated and hence provide redundant information to the model. To address these issues, we propose a novel approach that introduces a Soft Orthogonality (SO) constraint on the proxies. The constraint encourages the proxies to be as orthogonal as possible and thereby controls their positions in the embedding space. Our approach uses a Data-Efficient Image Transformer (DeiT) as the encoder to extract contextual features from images, trained with a DML objective that combines the Proxy Anchor loss with the SO regularization. We evaluate our method on four public benchmarks for category-level image retrieval and demonstrate its effectiveness with comprehensive experimental results and ablation studies. Our evaluations show that the proposed approach outperforms state-of-the-art methods by a significant margin.
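To make the idea of a soft orthogonality constraint on proxies concrete, the sketch below shows one common way such a penalty can be written: the squared Frobenius norm of the difference between the Gram matrix of the L2-normalized proxies and the identity. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact formula, so the penalty form, the function names, and the `so_weight` hyperparameter are all hypothetical. Plain Python is used for readability; a tensor library would be the natural choice in practice.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def soft_orthogonality_penalty(proxies):
    """Squared Frobenius norm of (P P^T - I) for row-normalized proxies P.

    Assumed form of the SO term; the source does not spell out the formula.
    """
    unit = [l2_normalize(p) for p in proxies]
    penalty = 0.0
    for i, p in enumerate(unit):
        for j, q in enumerate(unit):
            gram_ij = sum(a * b for a, b in zip(p, q))  # cosine similarity
            target = 1.0 if i == j else 0.0             # identity matrix entry
            penalty += (gram_ij - target) ** 2
    return penalty


def total_objective(proxy_anchor_loss, proxies, so_weight=0.1):
    """Add the weighted SO penalty to a precomputed Proxy Anchor loss value.

    so_weight is a hypothetical hyperparameter, not a value from the source.
    """
    return proxy_anchor_loss + so_weight * soft_orthogonality_penalty(proxies)


# Orthogonal proxies incur no penalty; proxies that collapse onto the same
# direction are penalized.
orthogonal = [[1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]
print(soft_orthogonality_penalty(orthogonal))  # 0.0
print(soft_orthogonality_penalty(collapsed))   # 2.0
```

Under this assumed form, the penalty is zero only when every pair of distinct proxies has zero cosine similarity, which matches the stated goal of keeping proxies of different classes from lying close together or becoming correlated.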