Disentanglement via Latent Quantization

In disentangled representation learning, a model is asked to tease apart a dataset's underlying sources of variation and represent them independently of one another. Since the model is provided with no ground truth information about these sources, inductive biases play a paramount role in enabling disentanglement. In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space. Concretely, we do this by (i) quantizing the latent space into discrete code vectors with a separate learnable scalar codebook per dimension and (ii) applying strong model regularization via an unusually high weight decay. Intuitively, the latent space design forces the encoder to combinatorially construct codes from a small number of distinct scalar values, which in turn enables the decoder to assign a consistent meaning to each value. Regularization then serves to drive the model towards this parsimonious strategy. We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models. For reliable evaluation, we also propose InfoMEC, a new set of metrics for disentanglement that is cohesively grounded in information theory and fixes well-established shortcomings in previous metrics. Together with regularization, latent quantization dramatically improves the modularity and explicitness of learned representations on a representative suite of benchmark datasets. In particular, our quantized-latent autoencoder (QLAE) consistently outperforms strong methods from prior work in these key disentanglement properties without compromising data reconstruction.
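
To make the quantization step in (i) concrete, the following is a minimal sketch of the forward lookup only, not the authors' implementation. It assumes that each latent dimension is snapped to the nearest value in its own scalar codebook. The function name `quantize_latent`, the codebook values, and the sizes (3 dimensions, 4 values each) are hypothetical choices for illustration. Training details, including how gradients pass through the non-differentiable lookup, how the codebooks are learned, and the weight decay in (ii), are deliberately left out.

```python
from typing import List, Sequence


def quantize_latent(
    z: Sequence[float], codebooks: Sequence[Sequence[float]]
) -> List[float]:
    """Snap each latent dimension to the nearest value in that
    dimension's own scalar codebook (illustrative sketch only)."""
    if len(z) != len(codebooks):
        raise ValueError("need exactly one codebook per latent dimension")
    quantized = []
    for z_j, values_j in zip(z, codebooks):
        # Nearest-neighbour lookup restricted to dimension j's codebook.
        nearest = min(values_j, key=lambda v: abs(v - z_j))
        quantized.append(nearest)
    return quantized


# Hypothetical setup: 3 latent dimensions, 4 scalar values per dimension.
# In a real model these values would be learnable parameters.
codebooks = [
    [-1.0, -0.5, 0.5, 1.0],
    [-1.0, -0.5, 0.5, 1.0],
    [-1.0, -0.5, 0.5, 1.0],
]

z_continuous = [0.37, -0.92, 0.8]  # stand-in for an encoder output
z_quantized = quantize_latent(z_continuous, codebooks)
print(z_quantized)  # [0.5, -1.0, 1.0]
```

With these hypothetical sizes the model stores only 3 × 4 = 12 scalars, yet it can express 4³ = 64 distinct code vectors. This is the sense in which codes are constructed combinatorially from a small number of distinct scalar values.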
