Supervised learning has become a cornerstone of modern machine learning, yet a comprehensive theory explaining its effectiveness remains elusive. Empirical phenomena, such as neural analogy-making and the linear representation hypothesis, suggest that supervised models can learn interpretable factors of variation in a linear fashion. Recent advances in self-supervised learning, particularly nonlinear Independent Component Analysis, have shown that these methods can recover latent structures by inverting the data generating process. We extend these identifiability results to parametric instance discrimination, then show how insights transfer to the ubiquitous setting of supervised learning with cross-entropy minimization. We prove that even in standard classification tasks, models learn representations of ground-truth factors of variation up to a linear transformation. We corroborate our theoretical contribution with a series of empirical studies. First, using simulated data matching our theoretical assumptions, we demonstrate successful disentanglement of latent factors. Second, we show that on DisLib, a widely-used disentanglement benchmark, simple classification tasks recover latent structures up to linear transformations. Finally, we reveal that models trained on ImageNet encode representations that permit linear decoding of proxy factors of variation. Together, our theoretical findings and experiments offer a compelling explanation for recent observations of linear representations, such as superposition in neural networks. This work takes a significant step toward a cohesive theory that accounts for the unreasonable effectiveness of supervised deep learning.
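To make the central claim concrete, the sketch below illustrates how "identifiability up to a linear transformation" is commonly quantified in this literature: fit a linear probe from a model's learned representations to the ground-truth latent factors and report the R² of the fit. This is a minimal illustration under assumed synthetic data, not the paper's exact evaluation protocol; the variable names (`z`, `h`, `probe`) and the use of scikit-learn's `LinearRegression` are our own choices.

```python
# Minimal sketch: measuring linear identifiability of learned representations.
# Assumption: representations h are a linear mixture of ground-truth latents z,
# as predicted by the theory; a linear probe from h to z should then reach R^2 ~ 1.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Hypothetical ground-truth latents z (n samples, d factors) and "learned"
# representations h = A z + noise for an unknown invertible mixing matrix A.
n, d = 1000, 5
z = rng.normal(size=(n, d))
A = rng.normal(size=(d, d))
h = z @ A.T + 0.01 * rng.normal(size=(n, d))

# Linear probe from representations back to the latent factors.
probe = LinearRegression().fit(h, z)
print("linear identifiability (R^2):", r2_score(z, probe.predict(h)))
```

In practice the same probe-and-score procedure is applied to real encoders (e.g., a classifier trained with cross-entropy on DisLib or ImageNet): one regresses the ground-truth or proxy factors of variation on the frozen features and reads off how much of them is linearly decodable.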