Hyperbolic geometry methods are increasingly being incorporated into computer vision. Although these methods achieve state-of-the-art performance on various metric learning tasks by using hyperbolic distance measurements, the theoretical analysis behind this superior performance remains underexplored. In this study, we investigate the effects of integrating hyperbolic space into metric learning, particularly when training with a contrastive loss. The existing literature lacks a comprehensive comparison of Euclidean and hyperbolic spaces with respect to the temperature effect in the contrastive loss. To address this gap, we conduct an extensive investigation that benchmarks Vision Transformers (ViTs) trained with a hybrid objective function combining losses from Euclidean and hyperbolic spaces. We also provide a theoretical analysis of the observed performance improvement, and we show that hyperbolic metric learning is closely related to hard negative sampling, which offers insights for future work. This work contributes data points and practical experience toward understanding hyperbolic image embeddings. To encourage further investigation of our approach, our code is available online (https://github.com/YunYunY/HypMix).