Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which can harm performance on downstream tasks that require such information. We propose 2D strUctured and EquivarianT representations (coined DUET): 2D representations organized in a matrix structure and equivariant with respect to transformations acting on the input data. DUET representations retain information about the input transformation while remaining semantically expressive. Compared to SimCLR (Chen et al., 2020) (unstructured and invariant) and ESSL (Dangovski et al., 2022) (unstructured and equivariant), the structured and equivariant nature of DUET representations enables controlled generation with lower reconstruction error, whereas controllability is not possible with SimCLR or ESSL. DUET also achieves higher accuracy on several discriminative tasks and improves transfer learning.
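To make the distinction between invariant and structured, equivariant representations concrete, the following is a minimal toy sketch. It is not the DUET architecture or its training objective: the "input" is a short integer sequence, the "transformation" is a cyclic shift, and the helper names (`shift`, `structured_features`, `invariant_features`, `recover_shift`) and the hand-picked filters are assumptions made purely for illustration. The feature matrix has one row per filter (content axis) and one column per shift (transformation axis), so shifting the input shifts the columns, while pooling over columns yields an invariant vector that no longer reveals the shift.

```python
def shift(x, g):
    """Cyclically shift the sequence x by g positions (the toy transformation)."""
    n = len(x)
    return [x[(i - g) % n] for i in range(n)]


def structured_features(x, filters):
    """Return a C x G matrix (list of rows) of circular correlations.

    Row c corresponds to filter c; column g corresponds to position g
    along the transformation axis.
    """
    n = len(x)
    return [
        [sum(w[k] * x[(g + k) % n] for k in range(len(w))) for g in range(n)]
        for w in filters
    ]


def invariant_features(x, filters):
    """Sum each row over the transformation axis, discarding the shift."""
    return [sum(row) for row in structured_features(x, filters)]


def recover_shift(z, z_transformed):
    """Return the smallest column shift mapping z onto z_transformed, or None."""
    n = len(z[0])
    for g in range(n):
        if all(shift(row, g) == row_t for row, row_t in zip(z, z_transformed)):
            return g
    return None


# Hypothetical toy data: an integer sequence and two hand-picked filters.
x = [0, 1, 3, 1, 0, 0, 2, 0]
filters = [[1, -1], [1, 2, 1]]
s = 3

z = structured_features(x, filters)
z_shifted = structured_features(shift(x, s), filters)

# Equivariance: transforming the input shifts the columns of the matrix.
assert z_shifted == [shift(row, s) for row in z]

# Invariance: the pooled features are unchanged by the transformation.
assert invariant_features(shift(x, s), filters) == invariant_features(x, filters)

# Only the structured representation allows the shift to be read back.
assert recover_shift(z, z_shifted) == s
```

In this toy setting the pooled vector is identical for every shifted copy of the input, whereas the matrix form keeps the transformation recoverable along a dedicated axis, which is the property the abstract associates with controllability.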