Maximum likelihood training has favorable statistical properties and is popular for generative modeling, especially with normalizing flows. Generative autoencoders, on the other hand, promise to be more efficient than normalizing flows because of the manifold hypothesis. In this work, we achieve successful maximum likelihood training of unconstrained autoencoders for the first time, bringing the two paradigms together. To do so, we identify and overcome two challenges. First, existing maximum likelihood estimators for free-form networks are unacceptably slow because they rely on iteration schemes whose cost scales linearly with the latent dimension. We introduce an improved estimator that eliminates iteration, so its cost is independent of the latent dimension (roughly double the runtime per batch of a vanilla autoencoder). Second, we demonstrate that naively applying maximum likelihood to autoencoders can lead to divergent solutions, and we use this insight to motivate a stable maximum likelihood training objective. We perform extensive experiments on toy, tabular, and image data, demonstrating the competitive performance of the resulting model, which we call the maximum likelihood autoencoder (MLAE).