Decoupled dataset distillation (DD) compresses large corpora into a few synthetic images by matching a frozen teacher's statistics. However, current residual-matching pipelines rely on static real patches, creating a fit-complexity gap and a pull-to-anchor effect that reduce intra-class diversity and hurt generalization. To address these issues, we introduce RETA -- a Retrieval and Topology Alignment framework for decoupled DD. First, Dynamic Retrieval Connection (DRC) selects a real patch from a prebuilt pool by minimizing a fit-complexity score in teacher feature space; the chosen patch is injected via a residual connection to tighten feature fit while controlling injected complexity. Second, Persistent Topology Alignment (PTA) regularizes synthesis with persistent homology: we build a mutual k-NN feature graph, compute persistence images of connected components and loops, and penalize topology discrepancies between the real and synthetic sets, mitigating the pull-to-anchor effect. Across CIFAR-100, Tiny-ImageNet, ImageNet-1K, and multiple ImageNet subsets, RETA consistently outperforms various baselines under comparable time and memory budgets, notably reaching 64.3% top-1 accuracy on ImageNet-1K with ResNet-18 at 50 images per class, +3.1% over the best prior method.
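To make the PTA idea concrete, here is a minimal, hypothetical sketch of the graph-and-persistence step: it builds a mutual k-NN graph over feature vectors, extracts 0-dimensional persistence (component death times) with a Kruskal-style union-find, and compares sorted death times between two feature sets as a toy stand-in for the paper's persistence-image penalty. All function names, the choice of k, and the L2 gap on matched death times are illustrative assumptions, not the paper's actual formulation (which also covers loops, i.e. 1-dimensional features).

```python
import numpy as np

def mutual_knn_edges(feats, k=3):
    """Weighted edges (dist, i, j) of the mutual k-NN graph, sorted by distance."""
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]  # indices of each point's k nearest neighbors
    edges = []
    for i in range(len(feats)):
        for j in nn[i]:
            if i in nn[j] and i < j:   # keep edge only if the k-NN relation is mutual
                edges.append((d[i, j], int(i), int(j)))
    return sorted(edges)

def h0_lifetimes(feats, k=3):
    """Death times of connected components (H0) via union-find over sorted edges."""
    n = len(feats)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    deaths = []
    for w, i, j in mutual_knn_edges(feats, k):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(w)  # a component born at scale 0 dies when merged at scale w
    return np.array(deaths)

def topology_penalty(real_feats, syn_feats, k=3):
    """Toy topology discrepancy: L2 gap between matched sorted H0 death times."""
    a = np.sort(h0_lifetimes(real_feats, k))
    b = np.sort(h0_lifetimes(syn_feats, k))
    m = min(len(a), len(b))
    if m == 0:
        return 0.0
    return float(np.mean((a[:m] - b[:m]) ** 2))
```

Identical feature sets yield a zero penalty, while a synthetic set that collapses onto a few anchors changes its component death profile and is penalized, which is the intuition behind mitigating the pull-to-anchor effect.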