ATOM: Attention Mixer for Efficient Dataset Distillation

Recent work in dataset distillation seeks to reduce training cost by generating a small synthetic dataset that encapsulates the information in a larger real dataset. The goal is for models trained on the synthetic set to reach test accuracy close to that of models trained on the full original dataset. Previous feature- and distribution-matching methods have achieved significant results without incurring the cost of bi-level optimization during distillation. Despite their efficiency, many of these methods suffer from marginal downstream performance improvements, limited distillation of contextual information, and subpar cross-architecture generalization. To address these challenges, we propose the ATtentiOn Mixer (ATOM) module, which efficiently distills large datasets by using a mixture of channel-wise and spatial-wise attention in the feature matching process. Spatial-wise attention guides learning based on the consistent localization of classes in their respective images, allowing for distillation from a broader receptive field. Meanwhile, channel-wise attention captures the contextual information associated with the class itself, making the synthetic images more informative for training. By integrating both types of attention, our ATOM module demonstrates superior performance across various computer vision datasets, including CIFAR10/100 and TinyImagenet. Notably, our method significantly improves performance in scenarios with a low number of images per class. Furthermore, the improvement carries over to cross-architecture evaluation and to applications such as neural architecture search.
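
The abstract does not state formulas, so the following is only a minimal sketch of how a mixture of spatial-wise and channel-wise attention could enter a feature matching loss. It assumes feature maps of shape (batch, channels, height, width), attention computed by summing powered absolute activations, L2 normalization, and a squared-error match between the mean attention of real and synthetic batches. The function names, the exponent `p`, and the mixing `weight` are hypothetical and are not taken from the paper.

```python
import numpy as np

def spatial_attention(feats, p=4):
    """Collapse channels: one attention value per spatial location.

    feats has shape (B, C, H, W); the result has shape (B, H*W).
    """
    attn = np.sum(np.abs(feats) ** p, axis=1)            # (B, H, W)
    flat = attn.reshape(attn.shape[0], -1)                # (B, H*W)
    return flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)

def channel_attention(feats, p=4):
    """Collapse spatial positions: one attention value per channel.

    feats has shape (B, C, H, W); the result has shape (B, C).
    """
    attn = np.sum(np.abs(feats) ** p, axis=(2, 3))        # (B, C)
    return attn / (np.linalg.norm(attn, axis=1, keepdims=True) + 1e-12)

def attention_mixing_loss(real_feats, syn_feats, weight=0.5, p=4):
    """Weighted sum of spatial and channel attention matching terms.

    Each term is the squared distance between the batch-mean attention
    of the real features and that of the synthetic features.
    `weight` (assumed here) balances the two terms.
    """
    ds = spatial_attention(real_feats, p).mean(axis=0) \
        - spatial_attention(syn_feats, p).mean(axis=0)
    dc = channel_attention(real_feats, p).mean(axis=0) \
        - channel_attention(syn_feats, p).mean(axis=0)
    return weight * np.sum(ds ** 2) + (1.0 - weight) * np.sum(dc ** 2)

# Toy features standing in for one layer's activations of a real batch
# and a much smaller synthetic batch of the same class.
rng = np.random.default_rng(0)
real = rng.normal(size=(8, 16, 4, 4))
syn = rng.normal(size=(2, 16, 4, 4))
loss = attention_mixing_loss(real, syn)
print(loss)
```

In a distillation loop, a loss of this kind would be minimized with respect to the synthetic images through a differentiable feature extractor; the NumPy version above only illustrates the forward computation.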
