We address the problem of denoising data from a Gaussian mixture using a two-layer non-linear autoencoder with tied weights and a skip connection. We consider the high-dimensional limit where the number of training samples and the input dimension jointly tend to infinity while the number of hidden units remains bounded. We provide closed-form expressions for the denoising mean-squared test error. Building on this result, we quantitatively characterize the advantage of the considered architecture over the autoencoder without the skip connection, which is closely related to principal component analysis. We further show that our results accurately capture the learning curves on a range of real data sets.
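To make the setting concrete, the following is a minimal NumPy sketch of a tied-weight two-layer autoencoder with a skip connection, evaluated on noisy samples from a Gaussian mixture. The abstract does not fix the details, so several choices here are assumptions made for illustration only: the skip connection is a single scalar `b`, the activation is `tanh`, the weights are scaled by `1/sqrt(d)`, the mixture has two equal-weight clusters with identity covariance, and the names `sample_mixture`, `dae_forward`, and `denoising_mse` are hypothetical. The weights are random rather than trained, so the sketch shows only how the error is measured, not the learning curves.

```python
import numpy as np


def sample_mixture(n, means, rng):
    """Draw n points from an equal-weight Gaussian mixture (identity covariance, assumed)."""
    k, d = means.shape
    labels = rng.integers(k, size=n)          # cluster index of each sample
    return means[labels] + rng.standard_normal((n, d))


def dae_forward(x_noisy, w, b):
    """Tied-weight two-layer autoencoder with a scalar skip connection (assumed form).

    x_noisy: (n, d) noisy inputs; w: (p, d) weights shared by encoder and decoder;
    b: scalar skip strength. Setting b = 0 removes the skip connection.
    """
    d = x_noisy.shape[1]
    hidden = np.tanh(x_noisy @ w.T / np.sqrt(d))   # (n, p) hidden activations
    return b * x_noisy + hidden @ w / np.sqrt(d)   # decoder reuses w (tied weights)


def denoising_mse(x_clean, x_noisy, w, b):
    """Mean squared error between clean data and the reconstruction, per coordinate."""
    return np.mean((x_clean - dae_forward(x_noisy, w, b)) ** 2)


rng = np.random.default_rng(0)
d, p, n, noise_std = 200, 2, 1000, 0.8            # illustrative sizes, not from the paper
means = np.stack([np.ones(d), -np.ones(d)])       # two clusters at +mu and -mu
x_clean = sample_mixture(n, means, rng)
x_noisy = x_clean + noise_std * rng.standard_normal((n, d))
w = rng.standard_normal((p, d))                   # untrained weights, for illustration

mse_with_skip = denoising_mse(x_clean, x_noisy, w, b=0.5)
mse_no_skip = denoising_mse(x_clean, x_noisy, w, b=0.0)
```

Comparing `mse_with_skip` and `mse_no_skip` after training both `w` and `b` on noisy/clean pairs would give an empirical counterpart to the comparison between the two architectures described above.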