The variational autoencoder (VAE) is an established generative model, but it is notorious for blurry outputs. In this work, we investigate the blurry output problem of the VAE and resolve it by exploiting the variance of the Gaussian decoder and the $\beta$ of beta-VAE. Specifically, we reveal that the indistinguishability of the decoder variance and $\beta$ hinders proper analysis of the model, because it makes the likelihood value arbitrary, and limits performance improvement by omitting the gain from $\beta$. To address the problem, we propose Beta-Sigma VAE (BS-VAE), which explicitly separates $\beta$ and the decoder variance $\sigma^2_x$ in the model. Compared to the conventional VAE, our method demonstrates not only superior performance in natural image synthesis but also controllable parameters and predictable analysis. In our experimental evaluation, we analyze rate-distortion curves and proxy metrics on computer vision datasets. The code is available at https://github.com/overnap/BS-VAE
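The indistinguishability mentioned in the abstract can be illustrated with a small numerical sketch. It assumes a standard beta-VAE objective with a fixed-variance Gaussian decoder and a standard normal prior; the function names (`gaussian_kl`, `neg_elbo`, `normalizer`) and the random stand-in tensors are hypothetical and are not taken from the BS-VAE repository. Under these assumptions, two settings with the same product $\beta \sigma^2_x$ yield objectives that differ only by a positive scale and an additive constant, so they share the same optimum while reporting different likelihood values.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def normalizer(d, sigma2):
    """Log-normalizer of a d-dimensional isotropic Gaussian with variance sigma2."""
    return 0.5 * d * np.log(2.0 * np.pi * sigma2)

def neg_elbo(x, x_hat, mu, logvar, beta, sigma2):
    """Negative beta-weighted ELBO with a fixed decoder variance sigma2 (assumed form)."""
    sq_err = np.sum((x - x_hat) ** 2)
    nll = sq_err / (2.0 * sigma2) + normalizer(x.size, sigma2)
    return nll + beta * gaussian_kl(mu, logvar)

# Random stand-ins for an input, its reconstruction, and the encoder outputs.
rng = np.random.default_rng(0)
x, x_hat = rng.normal(size=16), rng.normal(size=16)
mu, logvar = rng.normal(size=4), rng.normal(size=4)

# Two settings with the same product beta * sigma2 = 1.
loss_a = neg_elbo(x, x_hat, mu, logvar, beta=1.0, sigma2=1.0)
loss_b = neg_elbo(x, x_hat, mu, logvar, beta=2.0, sigma2=0.5)

# After removing the constant normalizers, loss_b is exactly 2 * loss_a,
# so both settings have the same gradient direction and the same optimum.
core_a = loss_a - normalizer(x.size, 1.0)
core_b = loss_b - normalizer(x.size, 0.5)
same_optimum = np.isclose(core_b, 2.0 * core_a)
```

The two raw loss values differ even though the optimization problem is the same, which is one way to read the abstract's point that the likelihood value becomes arbitrary when $\beta$ and $\sigma^2_x$ are not separated. How BS-VAE performs that separation is described in the paper itself and is not reproduced here.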