Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which can harm performance on downstream tasks that require such information. We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2D representations organized in a matrix structure and equivariant with respect to transformations acting on the input data. DUET representations retain information about an input transformation while remaining semantically expressive. Compared to SimCLR (Chen et al., 2020) (unstructured and invariant) and ESSL (Dangovski et al., 2022) (unstructured and equivariant), the structured and equivariant nature of DUET representations enables controlled generation with lower reconstruction error, whereas controllability is not possible with SimCLR or ESSL. DUET also achieves higher accuracy on several discriminative tasks and improves transfer learning.
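The distinction between invariant and equivariant representations can be illustrated with a toy sketch. This is not the DUET method or any of the cited models: the signal `x`, the cyclic-shift transformation, and the helpers `roll`, `invariant_features`, `equivariant_features`, and `recover_shift` are all hypothetical names chosen for illustration. The sketch only shows that an invariant feature map discards the transformation parameter, while an equivariant one keeps it recoverable.

```python
def roll(x, s):
    """Cyclically shift list x by s positions (element i moves to i + s)."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)

def invariant_features(x):
    # Toy invariant map: sorting ignores where each value sits,
    # so any cyclic shift of x gives the same output.
    return sorted(x)

def equivariant_features(x):
    # Toy equivariant map: a circular 3-point moving average.
    # Shifting the input shifts the output by the same amount.
    n = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]

def recover_shift(z, z_shifted):
    # Find the shift that best aligns z with z_shifted (circular correlation).
    scores = [sum(a * b for a, b in zip(roll(z, s), z_shifted))
              for s in range(len(z))]
    return scores.index(max(scores))

# Assumed toy input and transformation parameter.
x = [0.5, -1.0, 2.0, 0.0, 3.5, -2.5, 1.0, 4.0]
shift = 3
x_shifted = roll(x, shift)

# Invariance: f(t(x)) == f(x), so the shift is no longer visible.
is_invariant = invariant_features(x_shifted) == invariant_features(x)

# Equivariance: f(t(x)) == t(f(x)), so the shift is still encoded.
is_equivariant = (equivariant_features(x_shifted)
                  == roll(equivariant_features(x), shift))

# The shift can be read back from the equivariant features only.
estimated_shift = recover_shift(equivariant_features(x),
                                equivariant_features(x_shifted))
```

In this toy setting, a downstream task that needs to know the shift could solve it from the equivariant features but not from the invariant ones, which is the kind of information loss the abstract attributes to invariance-based learning.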