State-of-the-art logit distillation methods exhibit versatility, simplicity, and efficiency. Despite these advances, existing studies have not thoroughly explored the fine-grained relationships within logit knowledge. In this paper, we propose Local Dense Relational Logit Distillation (LDRLD), a novel method that captures inter-class relationships by recursively decoupling and recombining logit information, thereby providing more detailed and clearer guidance for student learning. To further optimize performance, we introduce an Adaptive Decay Weight (ADW) strategy, which dynamically adjusts the weights of critical category pairs using Inverse Rank Weighting (IRW) and Exponential Rank Decay (ERD). Specifically, IRW assigns weights inversely proportional to the rank differences between pairs, while ERD adaptively controls weight decay based on the total ranking scores of category pairs. Furthermore, after the recursive decoupling, we distill the remaining non-target knowledge to ensure knowledge completeness and enhance performance. Ultimately, our method improves the student's performance by transferring fine-grained knowledge and emphasizing the most critical relationships. Extensive experiments on datasets such as CIFAR-100, ImageNet-1K, and Tiny-ImageNet demonstrate that our method compares favorably with state-of-the-art logit-based distillation approaches. The code will be made publicly available.
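To make the pairwise weighting idea concrete, the following is a minimal sketch, based only on the abstract's description, of how an ADW-style weight might be computed for a single category pair from teacher logits. The function name, the smoothing constant, and the exact IRW/ERD formulas are assumptions for illustration, not the paper's actual implementation.

```python
import torch

def adaptive_decay_weight(teacher_logits, i, j, tau=1.0):
    """Hypothetical sketch of an Adaptive Decay Weight (ADW) for a category pair (i, j).

    Assumes: class ranks are derived from the teacher's logits (1 = highest),
    IRW ~ 1 / (|rank_i - rank_j| + 1) and ERD ~ exp(-(rank_i + rank_j) / tau).
    The exact formulas in the paper may differ.
    """
    # Double argsort: rank position of each class under the teacher (1-based).
    ranks = teacher_logits.argsort(descending=True).argsort() + 1
    r_i, r_j = ranks[i].float(), ranks[j].float()
    irw = 1.0 / (torch.abs(r_i - r_j) + 1.0)   # Inverse Rank Weighting (smoothed to avoid division by zero)
    erd = torch.exp(-(r_i + r_j) / tau)        # Exponential Rank Decay on the pair's total ranking score
    return irw * erd

# Usage: weight the distillation loss for the pair of classes (3, 7)
# under a toy 10-class teacher output.
w = adaptive_decay_weight(torch.randn(10), 3, 7)
```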