In this paper, we use information-theoretic metrics, namely matrix entropy and matrix mutual information, to analyze supervised learning. We study the information content of data representations and classification-head weights, and how the two interact during supervised training. Experiments show that matrix entropy alone cannot describe this interaction, but it does effectively capture the similarity and clustering behavior of the data. Motivated by this observation, we propose a cross-modal alignment loss that improves the alignment between same-class representations from different modalities. Furthermore, to assess the interaction between the information content of data representations and classification-head weights more accurately, we introduce two new metrics: the matrix mutual information ratio (MIR) and the matrix information entropy difference ratio (HDR). Through both theory and experiment, we show that HDR and MIR not only effectively describe the information interplay of supervised training but also improve the performance of supervised and semi-supervised learning.
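The abstract does not spell out the definitions of these quantities. Below is a minimal sketch of matrix entropy and a ratio-style matrix mutual information, under the common von-Neumann-entropy convention (eigenvalue entropy of the trace-normalized Gram matrix of L2-normalized features). The Hadamard-product joint matrix and the `mutual_information_ratio` normalization are illustrative assumptions, not necessarily the paper's exact formulas.

```python
import numpy as np

def _gram(X, eps=1e-12):
    """Trace-normalized Gram matrix of L2-normalized rows (n samples x d dims)."""
    Z = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    K = Z @ Z.T
    return K / np.trace(K)

def _entropy(K, eps=1e-12):
    """Von Neumann entropy H(K) = -sum_i lam_i log lam_i over positive eigenvalues."""
    lam = np.linalg.eigvalsh(K)
    lam = lam[lam > eps]
    return float(-np.sum(lam * np.log(lam)))

def matrix_entropy(X):
    """Matrix entropy of a feature matrix X under the convention above."""
    return _entropy(_gram(X))

def matrix_mutual_information(X, Y):
    """One common matrix MI: H(X) + H(Y) - H(joint), with the joint Gram matrix
    taken as the renormalized Hadamard product (PSD by the Schur product theorem).
    This joint construction is an assumption, not the paper's stated definition."""
    Kx, Ky = _gram(X), _gram(Y)
    Kxy = Kx * Ky
    Kxy = Kxy / np.trace(Kxy)
    return _entropy(Kx) + _entropy(Ky) - _entropy(Kxy)

def mutual_information_ratio(X, Y):
    """Hypothetical MIR-style ratio: matrix MI normalized by the smaller entropy.
    The paper's exact normalization for MIR (and HDR) may differ."""
    return matrix_mutual_information(X, Y) / min(matrix_entropy(X), matrix_entropy(Y))
```

For orthonormal representations (e.g. `np.eye(n)`) the Gram matrix is `I/n`, so the matrix entropy is `log n`, its maximum for n samples; identical inputs then give a ratio of 1, matching the intuition that the metric measures how much information two representation sets share.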