Autoencoders are widely used for dimensionality reduction, based on the assumption that high-dimensional data lies on low-dimensional manifolds. Regularized autoencoders aim to preserve manifold geometry during dimensionality reduction, but existing approaches often suffer from non-injective mappings and overly rigid constraints that limit their effectiveness and robustness. In this work, we identify encoder non-injectivity as a core bottleneck that leads to poor convergence and distorted latent representations. To ensure robustness across data distributions, we formalize the concept of admissible regularization and provide sufficient conditions for a regularizer to satisfy it. Building on this analysis, we propose the Bi-Lipschitz Autoencoder (BLAE), which introduces two key innovations: (1) an injective regularization scheme based on a separation criterion to eliminate pathological local minima, and (2) a bi-Lipschitz relaxation that preserves geometry and is robust to data distribution drift. Empirical results on diverse datasets show that BLAE consistently outperforms existing methods in preserving manifold structure while remaining resilient to sampling sparsity and distribution shifts. Code is available at https://github.com/qipengz/BLAE.
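To make the bi-Lipschitz idea concrete, the following is a minimal, generic sketch of a penalty that is zero whenever every pairwise distance ratio between latent codes and inputs stays inside an interval [lower, upper]. It is not the BLAE objective from the paper or its repository: the function names, the bounds `lower` and `upper`, the squared hinge form, and the use of Euclidean distances in input space (rather than any manifold or geodesic distance) are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(points):
    """Return the Euclidean distance matrix for an (n, d) array."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def bi_lipschitz_penalty(x, z, lower=0.5, upper=2.0):
    """Squared hinge penalty on pairs violating lower*dx <= dz <= upper*dx.

    x: (n, d_in) inputs, z: (n, d_latent) latent codes.
    `lower` and `upper` are hypothetical bounds chosen for illustration.
    """
    dx = pairwise_distances(x)
    dz = pairwise_distances(z)
    # Keep each unordered pair once and skip the zero diagonal.
    iu = np.triu_indices(len(x), k=1)
    dx, dz = dx[iu], dz[iu]
    # Pairs mapped too close together (collapse, the non-injective direction).
    too_contracted = np.maximum(0.0, lower * dx - dz)
    # Pairs pushed too far apart (excessive expansion).
    too_expanded = np.maximum(0.0, dz - upper * dx)
    return float(np.mean(too_contracted ** 2 + too_expanded ** 2))
```

Under these assumptions, an encoder that collapses distinct inputs onto the same latent point is penalized through the lower bound, while the interval between the two bounds leaves room for distortion that a strict isometry constraint would forbid.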