We propose a new dataset distillation algorithm using reparameterization and convexification of implicit gradients (RCIG) that substantially improves the state-of-the-art. To this end, we first formulate dataset distillation as a bi-level optimization problem. Then, we show how implicit gradients can be effectively used to compute meta-gradient updates. We further equip the algorithm with a convexified approximation that corresponds to learning on top of a frozen finite-width neural tangent kernel. Finally, we reduce the bias in implicit gradients by parameterizing the neural network so that the final-layer parameters can be computed analytically given the body parameters. RCIG establishes the new state-of-the-art on a diverse series of dataset distillation tasks. Notably, with one image per class on resized ImageNet, RCIG achieves on average a 108\% improvement over the previous state-of-the-art distillation algorithm. Similarly, we observe a 66\% gain over the previous state-of-the-art on Tiny-ImageNet and 37\% on CIFAR-100.
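For orientation, the bi-level formulation and the implicit meta-gradient mentioned above can be sketched in generic notation. The symbols are assumptions introduced here and do not come from the abstract: $S$ is the distilled dataset, $\theta$ the network parameters, $\mathcal{L}_i$ the inner (training) loss on $S$, and $\mathcal{L}_o$ the outer loss on real data. The expression is the standard implicit-function-theorem identity, not necessarily the exact estimator used by RCIG.

```latex
% Bi-level problem (notation assumed for illustration):
\begin{aligned}
S^\star &= \arg\min_{S}\; \mathcal{L}_o\bigl(\theta^\star(S)\bigr)
\quad \text{s.t.} \quad
\theta^\star(S) = \arg\min_{\theta}\; \mathcal{L}_i(\theta, S), \\[4pt]
% Implicit meta-gradient, assuming L_o depends on S only through theta^*(S)
% and the inner Hessian H is invertible at theta^*:
\frac{d\mathcal{L}_o}{dS}
&= -\left(\frac{\partial^2 \mathcal{L}_i}{\partial \theta\,\partial S}\right)^{\!\top}
   H^{-1}\,\frac{\partial \mathcal{L}_o}{\partial \theta}
   \Bigg|_{\theta = \theta^\star(S)},
\qquad
H = \frac{\partial^2 \mathcal{L}_i}{\partial \theta\,\partial \theta^{\top}}.
\end{aligned}
```

The identity holds exactly only when $\theta^\star(S)$ is a true inner minimizer. With an approximate inner solution the resulting gradient is biased, which is the kind of bias that an analytically computed final layer is meant to reduce.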