In disentangled representation learning, a model is asked to tease apart a dataset's underlying sources of variation and represent them independently of one another. Since the model is provided with no ground truth information about these sources, inductive biases take a paramount role in enabling disentanglement. In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space. Concretely, we do this by (i) quantizing the latent space into discrete code vectors with a separate learnable scalar codebook per dimension and (ii) applying strong model regularization via an unusually high weight decay. Intuitively, the latent space design forces the encoder to combinatorially construct codes from a small number of distinct scalar values, which in turn enables the decoder to assign a consistent meaning to each value. Regularization then serves to drive the model towards this parsimonious strategy. We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models. For reliable evaluation, we also propose InfoMEC, a new set of metrics for disentanglement that is cohesively grounded in information theory and fixes well-established shortcomings in previous metrics. Together with regularization, latent quantization dramatically improves the modularity and explicitness of learned representations on a representative suite of benchmark datasets. In particular, our quantized-latent autoencoder (QLAE) consistently outperforms strong methods from prior work in these key disentanglement properties without compromising data reconstruction.
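As a rough illustration of step (i), the sketch below quantizes each latent dimension against its own scalar codebook. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the assignment rule, so snapping to the nearest codebook value is assumed here, and the function name `quantize_latents`, the array shapes, and the example codebook values are all hypothetical. Training-time details (how gradients pass through the discrete step, and the high weight decay of step (ii)) are left out.

```python
import numpy as np

def quantize_latents(z, codebooks):
    """Snap each latent dimension to the nearest value in its own scalar codebook.

    z:         (batch, n_latents) continuous encoder outputs
    codebooks: (n_latents, n_values) one row of scalar values per latent dimension
    Returns the quantized latents and the chosen codebook indices.
    Assumption: nearest-value assignment; the abstract does not state the rule.
    """
    # Distance from every latent scalar to every value in that dimension's codebook:
    # shape (batch, n_latents, n_values).
    dists = np.abs(z[:, :, None] - codebooks[None, :, :])
    # Index of the closest codebook value for each (sample, dimension) pair.
    idx = np.argmin(dists, axis=-1)
    # Look up codebooks[j, idx[b, j]] for every sample b and dimension j.
    dims = np.arange(codebooks.shape[0])[None, :]
    z_q = codebooks[dims, idx]
    return z_q, idx

# Hypothetical example: 2 latent dimensions, 3 scalar values each.
codebooks = np.array([[-1.0, 0.0, 1.0],
                      [-0.5, 0.5, 2.0]])
z = np.array([[0.9, 0.4],
              [-0.2, 1.9]])
z_q, idx = quantize_latents(z, codebooks)
print(z_q)  # each entry is one of the three values in its dimension's codebook
print(idx)
```

With `n_values` scalars per dimension and `n_latents` dimensions, the decoder sees at most `n_values ** n_latents` distinct codes, each assembled combinatorially from a small set of reusable scalar values, which is the property the abstract credits with letting the decoder assign a consistent meaning to each value.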