In disentangled representation learning, a model is asked to tease apart a dataset's underlying sources of variation and represent them independently of one another. Since the model is provided with no ground truth information about these sources, inductive biases play a paramount role in enabling disentanglement. In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space. Concretely, we do this by (i) quantizing the latent space into discrete code vectors with a separate learnable scalar codebook per dimension and (ii) applying strong model regularization via an unusually high weight decay. Intuitively, the latent space design forces the encoder to combinatorially construct codes from a small number of distinct scalar values, which in turn enables the decoder to assign a consistent meaning to each value. Regularization then serves to drive the model towards this parsimonious strategy. We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models. For reliable evaluation, we also propose InfoMEC, a new set of metrics for disentanglement that is cohesively grounded in information theory and addresses well-established shortcomings of previous metrics. Together with regularization, latent quantization dramatically improves the modularity and explicitness of learned representations on a representative suite of benchmark datasets. In particular, our quantized-latent autoencoder (QLAE) consistently outperforms strong methods from prior work in these key disentanglement properties without compromising data reconstruction.
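To make the latent-space design concrete, the following is a minimal NumPy sketch of the forward quantization step only. It assumes each continuous encoder output is snapped to the nearest value in its own dimension's scalar codebook. The function name `quantize_latents` and the codebook values are hypothetical and chosen only for illustration. The abstract does not specify how gradients pass through the discrete step or how the high weight decay is applied, so both are left out here.

```python
import numpy as np


def quantize_latents(z, codebooks):
    """Snap each latent dimension to the nearest value in its own scalar codebook.

    Hypothetical helper for illustration, not the authors' implementation.

    z:         array-like of shape (batch, d) holding continuous encoder outputs.
    codebooks: list of d 1-D arrays; codebooks[j] holds the (learnable) scalar
               values available to latent dimension j. Sizes may differ per dimension.
    Returns the quantized latents (batch, d) and the chosen value indices (batch, d).
    """
    z = np.asarray(z, dtype=float)
    batch, d = z.shape
    assert len(codebooks) == d, "need exactly one scalar codebook per latent dimension"

    quantized = np.empty_like(z)
    indices = np.empty((batch, d), dtype=int)
    for j, values in enumerate(codebooks):
        values = np.asarray(values, dtype=float)
        # Distance from every sample's j-th coordinate to every value in codebook j.
        dist = np.abs(z[:, j:j + 1] - values[None, :])
        idx = np.argmin(dist, axis=1)
        indices[:, j] = idx
        quantized[:, j] = values[idx]
    # In an autodiff framework this non-differentiable step would need a gradient
    # workaround (a straight-through estimator is a common choice); that is an
    # assumption here, not something the abstract states.
    return quantized, indices


# Usage: two latent dimensions with 3 and 2 scalar values respectively.
codebooks = [np.array([-1.0, 0.0, 1.0]), np.array([-0.5, 0.5])]
q, idx = quantize_latents([[0.2, 0.9], [-0.8, -0.1]], codebooks)
# q   -> [[0.0, 0.5], [-1.0, -0.5]]
# idx -> [[1, 1], [0, 0]]
```

The example shows the combinatorial effect described above. Only 3 + 2 = 5 distinct scalar values are stored, yet they can form 3 × 2 = 6 distinct code vectors. Because each value belongs to a single dimension, the decoder can in principle learn one consistent meaning per value.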