Knowledge distillation (KD) is a well-known technique for compressing a large network (teacher) into a smaller network (student) with little loss in performance. However, most KD methods require a large training set and internal access to the teacher, which are rarely available due to various restrictions. These challenges have given rise to a more practical setting known as black-box few-shot KD, where the student is trained with few images and a black-box teacher. Recent approaches typically generate additional synthetic images but lack an active strategy to promote their diversity, a crucial factor for student learning. To address this problem, we propose a novel training scheme for generative adversarial networks, in which we adaptively select high-confidence images under the teacher's supervision and introduce them into the adversarial learning on the fly. Our approach expands the distillation set and improves its diversity, significantly boosting student accuracy. Through extensive experiments, we achieve state-of-the-art results among few-shot KD methods on seven image datasets. The code is available at https://github.com/votrinhan88/divbfkd.
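The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the black-box teacher returns only class probabilities, and that an image counts as "high-confidence" when its largest probability reaches a threshold. The names `select_high_confidence` and `expand_pool` and the fixed `threshold` value are hypothetical; the abstract does not say how the selection adapts during training.

```python
from typing import List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")


def select_high_confidence(
    images: Sequence[Image],
    teacher_probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed fixed value, for illustration only
) -> List[Tuple[Image, int]]:
    """Return (image, teacher_label) pairs whose top probability >= threshold."""
    selected = []
    for image, probs in zip(images, teacher_probs):
        confidence = max(probs)
        if confidence >= threshold:
            # The teacher's predicted class is the index of the top probability.
            selected.append((image, list(probs).index(confidence)))
    return selected


def expand_pool(
    pool: List[Image],
    images: Sequence[Image],
    teacher_probs: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> List[Image]:
    """Return a new pool: the old pool plus the selected synthetic images."""
    selected = select_high_confidence(images, teacher_probs, threshold)
    return pool + [image for image, _ in selected]


# Toy example: string identifiers stand in for image tensors.
real_pool = ["real_0", "real_1", "real_2", "real_3"]
synthetic = ["syn_0", "syn_1", "syn_2"]
teacher_probs = [
    [0.95, 0.03, 0.02],  # confident, kept
    [0.40, 0.35, 0.25],  # uncertain, dropped
    [0.05, 0.92, 0.03],  # confident, kept
]
picked = select_high_confidence(synthetic, teacher_probs)
new_pool = expand_pool(real_pool, synthetic, teacher_probs)
print(picked)
print(len(new_pool))
```

In a full training loop, a step like this would run on each generated batch, so that the set of images entering the adversarial learning grows while the generator is still being trained.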