Deep Variational Multivariate Information Bottleneck -- A Framework for Variational Losses

Variational dimensionality reduction methods are known for their high accuracy, generative abilities, and robustness. These methods have many theoretical justifications. Here we introduce a unifying principle rooted in information theory to rederive and generalize existing variational methods and to design new ones. We base our framework on an interpretation of the multivariate information bottleneck, in which two Bayesian networks are traded off against one another. We interpret the first network as an encoder graph, which specifies what information to keep when compressing the data. We interpret the second network as a decoder graph, which specifies a generative model for the data. Using this framework, we rederive existing dimensionality reduction methods such as the deep variational information bottleneck (DVIB), beta variational auto-encoders (beta-VAE), and deep variational canonical correlation analysis (DVCCA). The framework naturally introduces a trade-off parameter between compression and reconstruction in the DVCCA family of algorithms, resulting in the new beta-DVCCA family. In addition, we derive a new variational dimensionality reduction method, the deep variational symmetric information bottleneck (DVSIB), which simultaneously compresses two variables to preserve information between their compressed representations. We implement all of these algorithms and evaluate their ability to produce shared low-dimensional latent spaces on a modified noisy MNIST dataset. We show that algorithms that are better matched to the structure of the data (beta-DVCCA and DVSIB) produce better latent spaces, as measured by classification accuracy and the dimensionality of the latent variables. We believe that this framework can be used to unify other multi-view representation learning algorithms. Additionally, it provides a straightforward way to derive problem-specific loss functions.
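
To make the trade-off between compression and reconstruction concrete, the following is a minimal sketch of a beta-VAE-style loss for a single sample. It is not taken from the paper: it assumes a diagonal Gaussian encoder, a standard normal prior, a unit-variance Gaussian decoder, and the common convention in which beta weights the compression (KL) term. The framework described above may place the trade-off parameter differently. The function names `gaussian_kl`, `squared_error`, and `beta_vae_loss` are hypothetical.

```python
import math


def gaussian_kl(mu, log_var):
    """Compression term: KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def squared_error(x, x_recon):
    """Reconstruction term: Gaussian negative log-likelihood with unit variance, up to a constant."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))


def beta_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Assumed convention: reconstruction error plus beta times the compression cost."""
    return squared_error(x, x_recon) + beta * gaussian_kl(mu, log_var)
```

With this convention, a larger beta penalizes the information kept in the latent variables more strongly, while beta = 1 gives the usual VAE objective under the stated assumptions.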
