Autoencoders are a prominent model in many empirical branches of machine learning and lossy data compression. However, basic theoretical questions remain unanswered even in a shallow two-layer setting. In particular, to what degree does a shallow autoencoder capture the structure of the underlying data distribution? For the prototypical case of the 1-bit compression of sparse Gaussian data, we prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. Namely, the performance of the algorithm is the same as if it were compressing a Gaussian source with no sparsity. For general data distributions, we give evidence of a phase transition phenomenon in the shape of the gradient descent minimizer, as a function of the data sparsity: below the critical sparsity level, the minimizer is a rotation taken uniformly at random (just like in the compression of non-sparse data); above the critical sparsity level, the minimizer is the identity (up to a permutation). Finally, by exploiting a connection with approximate message passing algorithms, we show how to improve upon Gaussian performance for the compression of sparse data: adding a denoising function to a shallow architecture already provably reduces the loss, and a suitable multi-layer decoder leads to a further improvement. We validate our findings on image datasets, such as CIFAR-10 and MNIST.
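To make the 1-bit compression setting concrete, the following is a minimal NumPy sketch, not the authors' method. It assumes a Bernoulli-Gaussian model for the sparse data (each coordinate is nonzero with probability `p` and then drawn from N(0, 1/p)), an encoder of the form sign(Bx) with B a random rotation, and a linear decoder fitted by least squares rather than by gradient descent. All function names, the square shape of B, and the parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def sample_sparse_gaussian(n_samples, dim, p, rng):
    """Assumed data model: each coordinate is nonzero with probability p,
    and nonzero entries are N(0, 1/p), so every coordinate has unit variance."""
    mask = rng.random((n_samples, dim)) < p
    values = rng.normal(0.0, 1.0 / np.sqrt(p), size=(n_samples, dim))
    return mask * values


def random_rotation(dim, rng):
    """Draw an orthogonal matrix via QR of a Gaussian matrix.
    The sign correction makes the draw uniform over orthogonal matrices."""
    q, r = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q * np.sign(np.diag(r))


def one_bit_encode(x, b):
    """1-bit compression: keep only the sign of each coordinate of B x."""
    return np.where(x @ b.T >= 0, 1.0, -1.0)


def fit_linear_decoder(z, x):
    """Least-squares linear decoder (a stand-in for a trained decoder).
    Returns a matrix a_t such that z @ a_t approximates x."""
    a_t, *_ = np.linalg.lstsq(z, x, rcond=None)
    return a_t


def mse(x, x_hat):
    """Mean squared reconstruction error per coordinate."""
    return float(np.mean((x - x_hat) ** 2))


def run_demo(p=0.3, dim=32, n_samples=5000, seed=0):
    """Compress sparse Gaussian samples to one bit per coordinate and
    return the reconstruction error of the fitted linear decoder."""
    rng = np.random.default_rng(seed)
    x = sample_sparse_gaussian(n_samples, dim, p, rng)
    b = random_rotation(dim, rng)
    z = one_bit_encode(x, b)
    a_t = fit_linear_decoder(z, x)
    return mse(x, z @ a_t)
```

Calling `run_demo` with different values of `p` gives a rough way to explore how the reconstruction error of this simple linear pipeline behaves as the sparsity changes. It does not reproduce the gradient descent analysis, the denoising function, or the multi-layer decoder discussed in the abstract.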