Autoencoders (AE) are a simple yet powerful class of neural networks that compress data by projecting the input into a low-dimensional latent space (LS). Although the LS is formed by minimizing the loss function during training, its properties and topology are not controlled directly. In this paper we focus on the properties of the AE LS and propose two methods for obtaining an LS with a desired topology, which we call LS configuration. The proposed methods are loss configuration, which uses a geometric loss term that acts directly in the LS, and encoder configuration. We show that the former makes it possible to reliably obtain an LS with the desired configuration by defining the positions and shapes of the LS clusters for a supervised AE (SAE). Knowing the LS configuration makes it possible to define a similarity measure in the LS and use it to predict labels or estimate similarity for multiple inputs without decoders or classifiers. We also show that this leads to more stable and interpretable training. We show that an SAE trained for clothing texture classification with the proposed method generalizes well to unseen data from the LIP, Market1501, and WildTrack datasets without fine-tuning, and even makes it possible to evaluate similarity for unseen classes. We further illustrate the advantages of similarity estimation in a pre-configured LS with cross-dataset searches and with text-based search from a text query, without language models.
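To make the idea of a geometric loss term and decoder-free label prediction concrete, the following is a minimal NumPy sketch. It is not the loss used in the paper, whose exact form the abstract does not give. It assumes that each class is assigned a fixed target center in the LS, that the geometric term is the mean squared Euclidean distance of each latent code to its class center, and that similarity is measured by negative Euclidean distance. The names `make_centers`, `geometric_loss`, `predict_labels`, and `similarity` are hypothetical and introduced only for illustration.

```python
import numpy as np


def make_centers(num_classes, latent_dim, radius=1.0):
    """Assumed target layout: one cluster center per class, each on its own axis."""
    if num_classes > latent_dim:
        raise ValueError("this simple layout needs latent_dim >= num_classes")
    return radius * np.eye(num_classes, latent_dim)


def geometric_loss(z, labels, centers):
    """Mean squared Euclidean distance between latent codes and their class centers.

    z: (batch, latent_dim) latent codes produced by an encoder.
    labels: (batch,) integer class labels.
    centers: (num_classes, latent_dim) predefined cluster centers.
    """
    diff = z - centers[labels]
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def predict_labels(z, centers):
    """Predict labels as the index of the nearest predefined center (no classifier)."""
    dists = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


def similarity(z_a, z_b):
    """One possible similarity measure: negative Euclidean distance in the LS."""
    return -float(np.linalg.norm(z_a - z_b))


if __name__ == "__main__":
    centers = make_centers(num_classes=3, latent_dim=4)
    labels = np.array([0, 2, 1])
    # Stand-in for encoder outputs: class centers shifted by a small offset.
    z = centers[labels] + 0.1
    print("geometric loss:", geometric_loss(z, labels, centers))
    print("predicted labels:", predict_labels(z, centers))
    print("similarity of first two codes:", similarity(z[0], z[1]))
```

In a training loop, a term of this kind would be added to the reconstruction loss with some weight. Because the centers are known in advance, prediction and similarity search need only the encoder output.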