Semantic correspondence, the task of determining relationships between different parts of images, underpins various applications including 3D reconstruction, image-to-image translation, object tracking, and visual place recognition. Recent studies have begun to explore representations learned in large generative image models for semantic correspondence, demonstrating promising results. Building on this progress, current state-of-the-art methods combine multiple large models, which incurs high computational demands and reduces efficiency. In this work, we address this challenge with a novel knowledge distillation technique: we show how to distill the capabilities of two complementary large vision foundation models into a single smaller model that maintains high accuracy at reduced computational cost. Furthermore, we demonstrate that incorporating 3D data further improves performance, without the need for human-annotated correspondences. Overall, our empirical results demonstrate that our distilled model with 3D data augmentation outperforms current state-of-the-art methods while significantly reducing computational load, enhancing practicality for real-world applications such as semantic video correspondence. Our code and weights are publicly available on our project page.
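The core idea of distilling two complementary teachers into one student can be sketched as a simple feature-matching objective: the student is trained to reproduce the (normalized) features of both frozen teachers at once. This is a minimal illustrative sketch, not the paper's actual training pipeline; the feature shapes, the normalization, and the plain MSE objective are all assumptions introduced here for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize feature vectors to unit length along the last axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def distillation_loss(student_feats, teacher_a_feats, teacher_b_feats):
    """Hypothetical distillation objective: MSE between the student's
    features and the concatenation of the two frozen teachers'
    L2-normalized features. The real method may use a different
    matching criterion or projection heads."""
    target = np.concatenate(
        [l2_normalize(teacher_a_feats), l2_normalize(teacher_b_feats)],
        axis=-1,
    )
    return float(np.mean((student_feats - target) ** 2))

# Toy usage: 10 patch tokens, 4-dim features per (hypothetical) teacher.
rng = np.random.default_rng(0)
teacher_a = rng.normal(size=(10, 4))
teacher_b = rng.normal(size=(10, 4))
student = rng.normal(size=(10, 8))  # student matches the concatenated width
loss = distillation_loss(student, teacher_a, teacher_b)
```

Minimizing this loss pushes the single student network to carry the information of both teachers, which is what allows one small model to replace the combination of large models at inference time.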