We generalize the class vectors found in neural networks to linear subspaces (i.e., points in the Grassmann manifold) and show that the Grassmann Class Representation (GCR) improves accuracy and feature transferability simultaneously. In GCR, each class is a subspace, and the logit is defined as the norm of the projection of a feature onto the class subspace. We integrate Riemannian SGD into deep learning frameworks so that class subspaces in a Grassmannian are jointly optimized with the rest of the model parameters. Subspaces have greater representational power than single vectors. We show that on ImageNet-1K, the top-1 errors of ResNet50-D, ResNeXt50, Swin-T and Deit3-S are reduced by 5.6%, 4.5%, 3.0% and 3.5%, respectively. Subspaces also give features freedom to vary, and we observe that intra-class feature variability grows as the subspace dimension increases. Consequently, we find that GCR features are of higher quality for downstream tasks. For ResNet50-D, the average linear transfer accuracy across 6 datasets improves from 77.98% to 79.70% compared to the strong baseline of vanilla softmax. For Swin-T, it improves from 81.5% to 83.4%, and for Deit3, it improves from 73.8% to 81.4%. With these encouraging results, we believe that more applications could benefit from the Grassmann class representation. Code is released at https://github.com/innerlee/GCR.
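The logit definition and the constrained update described above can be illustrated with a minimal NumPy sketch. It is not taken from the released code: the function names (`orthonormal_basis`, `gcr_logits`, `riemannian_sgd_step`) are hypothetical, each class subspace is assumed to be stored as an n-by-k matrix with orthonormal columns, and the update uses a tangent-space projection followed by a QR retraction, which is one common way to stay on the Grassmannian and may differ from the update rule the authors actually use.

```python
import numpy as np


def orthonormal_basis(rng, n, k):
    """Random n x k matrix with orthonormal columns (hypothetical helper)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q


def gcr_logits(x, bases):
    """Logit per class = norm of the projection of x onto the class subspace.

    x:     feature vector, shape (n,)
    bases: orthonormal bases, shape (num_classes, n, k)

    For an orthonormal basis S, ||S S^T x|| equals ||S^T x||, so only the
    k coordinates of x in each subspace are needed.
    """
    coords = np.einsum("cnk,n->ck", bases, x)  # S_c^T x for every class c
    return np.linalg.norm(coords, axis=1)


def riemannian_sgd_step(S, euclid_grad, lr):
    """One illustrative update of a subspace basis S (n x k).

    Assumption: projection onto the tangent space plus QR retraction,
    used here as a stand-in for whatever scheme the paper implements.
    """
    # Remove the gradient component that lies inside the current subspace.
    riem_grad = euclid_grad - S @ (S.T @ euclid_grad)
    # Take a step, then re-orthonormalize the columns.
    q, _ = np.linalg.qr(S - lr * riem_grad)
    return q
```

With k = 1 the logit reduces to the absolute value of the inner product between the feature and a unit class vector, which is the sense in which subspaces generalize class vectors. A feature lying entirely inside a class subspace gets a logit equal to its own norm, and a feature orthogonal to the subspace gets a logit of zero.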