Dataset distillation has recently paved the way toward efficient machine learning, especially for image datasets. However, distillation for videos, which have a temporal dimension that images lack, remains underexplored. In this work, we provide the first systematic study of video distillation and introduce a taxonomy to categorize temporal compression. Our investigation reveals that temporal information is usually not well learned during distillation and that the temporal dimension of synthetic data contributes little. These observations motivate our unified framework, which disentangles the dynamic and static information in videos. It first distills the videos into still images as static memory and then compensates for the missing dynamic and motion information with a learnable dynamic memory block. Our method achieves state-of-the-art performance on video datasets at different scales, with a notably smaller memory storage budget. Our code is available at https://github.com/yuz1wan/video_distillation.
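To make the static/dynamic split concrete, the following is a minimal sketch and not the implementation from the repository above. It assumes, purely for illustration, that a clip is rebuilt by adding per-frame offsets from a dynamic memory to one still image, and that a single dynamic block is shared by all clips. The names `compose_video` and `stored_values`, the additive rule, and the sharing scheme are all hypothetical.

```python
# Hypothetical sketch: the additive combination and the shared dynamic block
# are illustrative assumptions, not the method's actual design.

def compose_video(static_image, dynamic_memory):
    """Rebuild a clip by adding each per-frame offset to one still image.

    static_image: flat list of pixel values (the static memory for one clip).
    dynamic_memory: list of per-frame offsets, each a flat list of equal length.
    """
    return [
        [pixel + offset for pixel, offset in zip(static_image, frame_offsets)]
        for frame_offsets in dynamic_memory
    ]


def stored_values(num_clips, frames, pixels):
    """Count stored numbers for full clips vs. stills plus one shared dynamic block."""
    full = num_clips * frames * pixels                 # every frame of every clip
    disentangled = num_clips * pixels + frames * pixels  # one still per clip + shared block
    return full, disentangled


static = [0.5, 0.2, 0.9, 0.1]            # a 4-pixel "still image"
dynamic = [
    [0.0, 0.0, 0.0, 0.0],                # frame 0: no change from the still
    [0.1, -0.1, 0.0, 0.0],               # frame 1: small motion offset
]
video = compose_video(static, dynamic)
full, disentangled = stored_values(num_clips=10, frames=8, pixels=4)
```

Under these toy assumptions, storing ten 8-frame clips in full takes 320 values, while ten still images plus one shared dynamic block take 72, which illustrates how a disentangled representation can fit a smaller storage budget.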