Model inversion, which aims to reconstruct the original training data from pre-trained discriminative models, is especially useful when the original training data is unavailable due to privacy, usage rights, or size constraints. However, existing dense inversion methods attempt to reconstruct the entire image area, making them extremely inefficient when inverting high-resolution images from large-scale Vision Transformers (ViTs). We further identify two underlying causes of this inefficiency: the redundant inversion of noisy backgrounds and the unintended inversion of spurious correlations, a phenomenon we term "hallucination" in model inversion. To address these limitations, we propose a novel sparse model inversion strategy as a plug-and-play extension that accelerates existing dense inversion methods without modifying their original loss functions. Specifically, we selectively invert semantic foregrounds while stopping the inversion of noisy backgrounds and potential spurious correlations. Through both theoretical and empirical studies, we validate the efficacy of our approach in achieving significant inversion acceleration (up to 3.79× faster) while maintaining comparable or even enhanced downstream performance in data-free model quantization and data-free knowledge transfer. Code is available at https://github.com/Egg-Hu/SMI.
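To make the core idea concrete, the following PyTorch sketch shows one way to "stop" inversion on background regions: gradients of the synthesized image are zeroed outside a per-patch foreground mask, so only foreground patches are updated. This is a minimal illustration under stated assumptions, not the paper's actual implementation (see the linked repo); the model handle `vit`, the precomputed `foreground_mask`, and the gradient-masking mechanism are all hypothetical placeholders.

```python
# Minimal sketch of sparse (foreground-only) model inversion for a ViT classifier.
# Assumptions: `vit` maps a (1, 3, H, W) image to class logits, and
# `foreground_mask` is a boolean (H/patch, W/patch) grid marking semantic patches.
import torch
import torch.nn.functional as F

def sparse_invert(vit, target_class, foreground_mask,
                  steps=2000, lr=0.1, img_size=224, patch=16):
    # Start from random noise, as in standard dense inversion.
    x = torch.randn(1, 3, img_size, img_size, requires_grad=True)

    # Upsample the per-patch mask (1 = foreground, 0 = background) to pixels.
    m = foreground_mask.float().view(1, 1, img_size // patch, img_size // patch)
    m = F.interpolate(m, size=(img_size, img_size), mode="nearest")

    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = vit(x)
        # The original dense inversion loss is kept unchanged (plug-and-play).
        loss = F.cross_entropy(logits, torch.tensor([target_class]))
        loss.backward()
        # Stop inversion of background patches by zeroing their gradients.
        x.grad.mul_(m)
        opt.step()
    return x.detach()
```

In this sketch the savings come purely from skipping background updates; the paper's method additionally decides which patches to keep or drop during inversion, which is what yields the reported speed-up.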