We formulate a data-independent latent space regularisation constraint for general unsupervised autoencoders. The regularisation rests on sampling the autoencoder Jacobian at Legendre nodes, the nodes at the core of Gauss-Legendre quadrature. Revisiting this classic technique enables us to prove that regularised autoencoders ensure a one-to-one re-embedding of the initial data manifold into its latent representation. Demonstrations show that previously proposed regularisation strategies, such as contractive autoencoding, cause topological defects even for simple examples, as do convolution-based (variational) autoencoders. In contrast, standard multilayer perceptron neural networks already preserve topology when regularised with our method. This observation extends from the classic FashionMNIST dataset to real-world encoding problems for MRI brain scans, suggesting that, across disciplines, this regularisation technique can deliver reliable low-dimensional representations of complex high-dimensional datasets.
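To make the idea of sampling the Jacobian at Legendre nodes concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that the latent domain is the hypercube [-1, 1]^m and that the regulariser penalises the deviation of the Jacobian of the encoder-decoder round trip from the identity; the abstract does not state the loss in this form. The helpers `legendre_grid`, `numerical_jacobian` and `jacobian_penalty`, as well as the toy linear maps, are hypothetical names introduced for illustration, and the Jacobian is approximated by central finite differences instead of automatic differentiation.

```python
import itertools
import numpy as np


def legendre_grid(n_nodes, dim):
    """Tensor-product grid of Gauss-Legendre nodes in [-1, 1]^dim."""
    nodes, _weights = np.polynomial.legendre.leggauss(n_nodes)
    return np.array(list(itertools.product(nodes, repeat=dim)))


def numerical_jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f at x (sketch only)."""
    x = np.asarray(x, dtype=float)
    columns = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        columns.append((f(x + step) - f(x - step)) / (2.0 * eps))
    return np.stack(columns, axis=1)


def jacobian_penalty(encoder, decoder, grid):
    """Mean squared Frobenius distance between the round-trip Jacobian
    and the identity, sampled at the grid points.
    Assumption: this form of the penalty is illustrative only."""
    identity = np.eye(grid.shape[1])

    def roundtrip(z):
        return encoder(decoder(z))

    terms = [
        np.sum((numerical_jacobian(roundtrip, z) - identity) ** 2)
        for z in grid
    ]
    return float(np.mean(terms))


# Hypothetical toy maps: a linear decoder from 2 to 3 dimensions.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A_pinv = np.linalg.pinv(A)


def decoder(z):
    return A @ z


def good_encoder(x):
    # Exact left inverse of the decoder, so the round trip is the identity.
    return A_pinv @ x


def bad_encoder(x):
    # Collapses the second latent direction, so the round trip is not injective.
    return np.array([x[0], 0.0])


grid = legendre_grid(5, 2)
penalty_good = jacobian_penalty(good_encoder, decoder, grid)
penalty_bad = jacobian_penalty(bad_encoder, decoder, grid)
print(penalty_good, penalty_bad)
```

In this toy setting the penalty is close to zero for the encoder that inverts the decoder and close to one for the encoder that collapses a latent direction. The grid depends only on the latent dimension and the number of nodes, not on any training data, which is what data independence refers to here.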