Modern learning frameworks often train deep neural networks on massive amounts of unlabeled data to learn representations by solving simple pretext tasks, then use these representations as foundations for downstream tasks. These networks are empirically designed; as such, they are usually not interpretable, their representations are not structured, and their designs are potentially redundant. White-box deep networks, in which each layer explicitly identifies and transforms structures in the data, present a promising alternative. However, existing white-box architectures have only been shown to work at scale in supervised settings with labeled data, such as classification. In this work, we provide the first instantiation of the white-box design paradigm that can be applied to large-scale unsupervised representation learning. We do this by exploiting a fundamental connection between diffusion, compression, and (masked) completion, deriving a deep transformer-like masked autoencoder architecture, called CRATE-MAE, in which the role of each layer is mathematically fully interpretable: each layer transforms the data distribution to or from a structured representation. Extensive empirical evaluations confirm our analytical insights. CRATE-MAE demonstrates highly promising performance on large-scale imagery datasets while using only ~30% of the parameters of a standard masked autoencoder with the same model configuration. The representations learned by CRATE-MAE have explicit structure and also contain semantic meaning. Code is available at https://github.com/Ma-Lab-Berkeley/CRATE.