Bi-Lipschitz Autoencoder With Injectivity Guarantee

Autoencoders are widely used for dimensionality reduction, based on the assumption that high-dimensional data lies on low-dimensional manifolds. Regularized autoencoders aim to preserve manifold geometry during dimensionality reduction, but existing approaches often suffer from non-injective mappings and overly rigid constraints that limit their effectiveness and robustness. In this work, we identify encoder non-injectivity as a core bottleneck that leads to poor convergence and distorted latent representations. To ensure robustness across data distributions, we formalize the concept of admissible regularization and provide sufficient conditions under which it holds. Building on this, we propose the Bi-Lipschitz Autoencoder (BLAE), which introduces two key innovations: (1) an injective regularization scheme based on a separation criterion to eliminate pathological local minima, and (2) a bi-Lipschitz relaxation that preserves geometry and is robust to data distribution drift. Empirical results on diverse datasets show that BLAE consistently outperforms existing methods in preserving manifold structure while remaining resilient to sampling sparsity and distribution shifts. Code is available at https://github.com/qipengz/BLAE.
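To make the bi-Lipschitz idea concrete, the following is a minimal sketch and not the BLAE loss from the paper or its repository. It relies only on the standard definition: an encoder f is bi-Lipschitz with constants 0 < a <= b if a * d(x, y) <= d(f(x), f(y)) <= b * d(x, y) for all pairs x, y, and a positive lower constant a forces f to be injective. The function name `bilipschitz_penalty`, the default constants, the squared-hinge form, and the use of plain Euclidean distance in input space (rather than a manifold distance) are all assumptions made for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def bilipschitz_penalty(xs, zs, lower=0.5, upper=2.0, eps=1e-12):
    """Mean squared hinge violation of lower*d(x,y) <= d(z,z') <= upper*d(x,y).

    xs: input points; zs: their latent codes (same order).
    Hypothetical illustration only; not the authors' regularizer.
    """
    total, count = 0.0, 0
    for i, j in combinations(range(len(xs)), 2):
        dx = euclidean(xs[i], xs[j])
        if dx < eps:
            continue  # identical inputs carry no distortion information
        ratio = euclidean(zs[i], zs[j]) / dx
        too_small = max(0.0, lower - ratio)  # pair contracted too much
        too_large = max(0.0, ratio - upper)  # pair stretched too much
        total += too_small ** 2 + too_large ** 2
        count += 1
    return total / count if count else 0.0


# Three collinear points in 3-D and two candidate 1-D encodings.
xs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
isometric = [(0.0,), (1.0,), (2.0,)]   # distances preserved exactly
collapsed = [(0.0,), (0.0,), (0.0,)]   # non-injective: all points merge

penalty_isometric = bilipschitz_penalty(xs, isometric)
penalty_collapsed = bilipschitz_penalty(xs, collapsed)
```

In this toy setting the distance-preserving encoding incurs no penalty, while the collapsed encoding violates the lower bound on every pair. That is the sense in which a lower Lipschitz bound penalizes non-injective encoders.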


