Disentangled Representation Learning (DRL) aims to learn a model capable of identifying and disentangling the underlying factors hidden in observable data in representation form. Separating the underlying factors of variation into variables with semantic meaning helps in learning explainable representations of data, imitating the meaningful understanding process of humans when observing an object or relation. As a general learning strategy, DRL has demonstrated its power in improving model explainability, controllability, robustness, and generalization capacity in a wide range of scenarios such as computer vision, natural language processing, and data mining. In this article, we comprehensively investigate DRL from various aspects, including motivations, definitions, methodologies, evaluations, applications, and model designs. We first present two well-recognized definitions, i.e., the Intuitive Definition and the Group Theory Definition, of disentangled representation learning. We then categorize the methodologies for DRL into four groups from the following perspectives: model type, representation structure, supervision signal, and independence assumption. We also analyze principles for designing different DRL models that may benefit different tasks in practical applications. Finally, we point out challenges in DRL as well as potential research directions deserving future investigation. We believe this work may provide insights for promoting DRL research in the community.