Various visual foundation models have distinct strengths and weaknesses, both of which can be improved through heterogeneous multi-teacher knowledge distillation without labels, an approach whose student models are termed "agglomerative models." We build upon this body of work by studying the effect of the teachers' activation statistics, particularly the impact of the loss function on the resulting student model quality. We explore a standard toolkit of statistical normalization techniques for aligning the teachers' differing distributions and assess their effects. We further examine the impact on downstream teacher-matching metrics, which motivates the use of Hadamard matrices. We demonstrate useful properties of these matrices and show how they can be used for isotropic standardization, in which every dimension of a multivariate distribution is standardized using the same scale. We call this technique "PHI Standardization" (PHI-S) and empirically demonstrate that it produces the best student model across the suite of methods studied.
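To illustrate why a Hadamard matrix suits isotropic standardization, the following is a minimal sketch rather than the paper's implementation. It assumes the distribution has already been rotated onto its principal axes, so that its covariance is diagonal with the given eigenvalues, and that the dimension is a power of two so that Sylvester's construction applies. The function names are hypothetical. Because every entry of the normalized Hadamard matrix has the same magnitude, each output dimension receives an equal share of every eigenvalue, and a single scalar then standardizes all dimensions at once.

```python
import math


def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix with entries +1/-1.

    Uses Sylvester's construction, so n must be a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Block form [[H, H], [H, -H]] doubles the size at each step.
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def rotated_variances(eigenvalues):
    """Per-dimension variances after rotating by R = H / sqrt(n).

    Assumes the input covariance is diag(eigenvalues); returns the
    diagonal of R @ diag(eigenvalues) @ R.T.
    """
    n = len(eigenvalues)
    h = sylvester_hadamard(n)
    return [
        sum((h[i][j] ** 2) * eigenvalues[j] for j in range(n)) / n
        for i in range(n)
    ]


def isotropic_scale(eigenvalues):
    """Single scalar that gives every rotated dimension unit variance."""
    return 1.0 / math.sqrt(sum(eigenvalues) / len(eigenvalues))


# Illustrative variances along the principal axes (made-up values).
eigenvalues = [9.0, 4.0, 2.0, 1.0]
variances = rotated_variances(eigenvalues)
alpha = isotropic_scale(eigenvalues)
standardized = [alpha ** 2 * v for v in variances]
```

Here every entry of `variances` equals the mean of `eigenvalues`, so multiplying by the one scalar `alpha` standardizes all dimensions while preserving the relative geometry of the distribution, unlike per-dimension standardization, which rescales each axis differently.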