Autoencoders (AE) are a simple yet powerful class of neural networks that compress data by projecting the input into a low-dimensional latent space (LS). Although the LS is shaped by minimizing the loss function during training, its properties and topology are not controlled directly. In this paper we focus on the properties of the AE LS and propose two methods for obtaining an LS with a desired topology, which we call LS configuration. The proposed methods are loss configuration, which uses a geometric loss term that acts directly in the LS, and encoder configuration. We show that the former makes it possible to reliably obtain an LS with a desired configuration by defining the positions and shapes of the LS clusters for a supervised AE (SAE). Knowing the LS configuration makes it possible to define a similarity measure in the LS that predicts labels or estimates similarity for multiple inputs without using decoders or classifiers. We also show that this leads to more stable and interpretable training. We show that an SAE trained for clothes texture classification with the proposed method generalizes well to unseen data from the LIP, Market1501, and WildTrack datasets without fine-tuning, and even makes it possible to evaluate similarity for unseen classes. We further illustrate the advantages of similarity estimation in a pre-configured LS with cross-dataset searches and with text-based search that uses a text query without language models.
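To make the idea of loss configuration concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes that each class is assigned a fixed target center in the LS, that the geometric loss term is the mean squared Euclidean distance between a latent vector and the center of its class, and that a label is predicted as the index of the nearest center. The function names `geometric_loss` and `predict_label`, the center coordinates, and the choice of distance are all illustrative assumptions.

```python
import math


def geometric_loss(latents, labels, centers):
    """Mean squared Euclidean distance from each latent vector to the
    preset center of its class (an assumed form of the geometric term)."""
    total = 0.0
    for z, y in zip(latents, labels):
        total += math.dist(z, centers[y]) ** 2
    return total / len(latents)


def predict_label(z, centers):
    """Predict a label as the index of the nearest preset center,
    using only the latent vector (no decoder or classifier)."""
    return min(range(len(centers)), key=lambda k: math.dist(z, centers[k]))


# Hypothetical 2-D latent space with three preset cluster centers.
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]

# Hypothetical encoder outputs for three labeled inputs.
latents = [(0.1, -0.2), (3.8, 0.3), (0.2, 4.1)]
labels = [0, 1, 2]

loss = geometric_loss(latents, labels, centers)
predictions = [predict_label(z, centers) for z in latents]
```

In a training setup such a term would presumably be added to the reconstruction loss with some weight; that weight, and the way cluster shapes are constrained, are not specified in the abstract and are left out of the sketch.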