Previous knowledge-distillation-based methods for efficient image retrieval employ a lightweight network as the student model for fast inference. However, a lightweight student lacks the representation capacity needed to imitate the teacher's knowledge effectively during the most critical early training period, which degrades final performance. To tackle this issue, we propose a Capacity Dynamic Distillation framework, which constructs a student model with editable representation capacity. Specifically, the student is initially a heavy model, so that it can absorb distilled knowledge effectively in the early training epochs, and it is gradually compressed as training proceeds. To adjust the model capacity dynamically, our framework inserts a learnable convolutional layer into each residual block of the student model as a channel importance indicator. The indicator is optimized simultaneously by the image retrieval loss and the compression loss, and a retrieval-guided gradient resetting mechanism is proposed to relieve the conflict between their gradients. Extensive experiments show that our method achieves superior inference speed and accuracy. For example, on the VeRi-776 dataset, with ResNet101 as the teacher, our method saves 67.13% of model parameters and 65.67% of FLOPs (around 24.13% and 21.94% higher than state-of-the-art methods) without sacrificing accuracy (around 2.11% higher mAP than state-of-the-art methods).
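The interaction between the two losses on the channel importance indicator can be illustrated with a toy sketch. The abstract does not give the exact resetting rule, so everything below is an assumption made for illustration: each channel is reduced to one scalar indicator weight, the compression loss is taken to be an L1 penalty, and "resetting" is taken to mean zeroing the retrieval gradient of the channels with the smallest indicator magnitude so that only the compression gradient acts on them. The names `combine_gradients`, `sgd_step`, `surviving_channels`, `n_keep`, and `penalty_strength` are hypothetical and do not come from the paper.

```python
from typing import List


def l1_subgradient(w: float) -> float:
    """Subgradient of |w|; stands in for the (assumed) compression loss."""
    if w > 0:
        return 1.0
    if w < 0:
        return -1.0
    return 0.0


def combine_gradients(
    indicator: List[float],
    retrieval_grad: List[float],
    n_keep: int,
    penalty_strength: float,
) -> List[float]:
    """Return the per-channel gradient applied to the indicator weights.

    Assumed rule: the n_keep channels with the largest |indicator| keep
    their retrieval gradient; for all other channels it is reset to zero,
    so the compression penalty is free to shrink them.
    """
    order = sorted(range(len(indicator)),
                   key=lambda c: abs(indicator[c]), reverse=True)
    keep = set(order[:n_keep])
    combined = []
    for c, w in enumerate(indicator):
        g_retrieval = retrieval_grad[c] if c in keep else 0.0
        combined.append(g_retrieval + penalty_strength * l1_subgradient(w))
    return combined


def sgd_step(indicator: List[float], grad: List[float], lr: float) -> List[float]:
    """One plain gradient-descent update of the indicator weights."""
    return [w - lr * g for w, g in zip(indicator, grad)]


def surviving_channels(indicator: List[float], threshold: float) -> List[int]:
    """Indices of channels whose indicator magnitude is still at or above threshold."""
    return [c for c, w in enumerate(indicator) if abs(w) >= threshold]


if __name__ == "__main__":
    indicator = [0.9, 0.05, -0.6, 0.02]        # made-up indicator weights
    retrieval_grad = [-0.2, -0.3, 0.1, -0.4]   # made-up retrieval gradients
    grad = combine_gradients(indicator, retrieval_grad,
                             n_keep=2, penalty_strength=0.1)
    indicator = sgd_step(indicator, grad, lr=0.1)
    print(grad)
    print(surviving_channels(indicator, threshold=0.03))
```

In this toy setting, the retrieval gradients of channels 1 and 3 would otherwise push their weights away from zero; after resetting, only the penalty acts on them, which is one plausible reading of how the conflict between the two objectives is relieved. In an actual network the indicator would be a convolutional layer inside each residual block and the gradients would come from backpropagation rather than hand-written lists.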