A tutorial on multi-view autoencoders using the multi-view-AE library

There has been growing interest in recent years in modelling multiple modalities (or views) of data to, for example, understand the relationship between modalities or generate missing data. Multi-view autoencoders have gained significant traction for their adaptability and versatility in modelling multi-modal data, demonstrating an ability to tailor their approach to the characteristics of the data at hand. However, most multi-view autoencoders use inconsistent notation and are often implemented in different coding frameworks. To address this, we present a unified mathematical framework for multi-view autoencoders, consolidating their formulations. Moreover, we offer insights into the motivation and theoretical advantages of each model. To facilitate accessibility and practical use, we extend the documentation and functionality of the previously introduced multi-view-AE library. This library offers Python implementations of numerous multi-view autoencoder models, presented within a user-friendly framework. Through benchmarking experiments, we evaluate our implementations against previous ones, demonstrating comparable or superior performance. This work aims to establish a cohesive foundation for multi-modal modelling, serving as a valuable educational resource in the field.
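As a rough illustration of what a multi-view autoencoder does, the following minimal NumPy sketch trains a linear two-view autoencoder on toy data. It is not the multi-view-AE API and not one of the models covered by the tutorial: the function names, the toy data, the squared-error loss, and the choice to average the per-view encodings into one joint latent are all assumptions made for this example.

```python
import numpy as np


def make_two_view_data(n_samples=200, latent_dim=2, seed=0):
    """Toy data (hypothetical): two views that are noisy linear maps of one shared latent."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, latent_dim))
    views = []
    for n_features in (5, 4):
        x = z @ rng.normal(size=(latent_dim, n_features))
        x += 0.1 * rng.normal(size=x.shape)
        # Standardise each feature so both views are on the same scale.
        views.append((x - x.mean(axis=0)) / x.std(axis=0))
    return views


def train_linear_multiview_ae(views, latent_dim=2, lr=0.01, n_steps=500, seed=0):
    """Gradient descent on the summed squared reconstruction error of all views."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    # One linear encoder and one linear decoder per view.
    enc = [0.1 * rng.normal(size=(x.shape[1], latent_dim)) for x in views]
    dec = [0.1 * rng.normal(size=(latent_dim, x.shape[1])) for x in views]
    losses = []
    for _ in range(n_steps):
        # Joint latent (assumption): the average of the per-view encodings.
        z = sum(x @ w for x, w in zip(views, enc)) / len(views)
        # Every view is reconstructed from the same joint latent.
        residuals = [z @ d - x for x, d in zip(views, dec)]
        losses.append(sum((r ** 2).sum() for r in residuals) / n)
        # Analytic gradients of the loss with respect to z, decoders, and encoders.
        grad_z = (2.0 / n) * sum(r @ d.T for r, d in zip(residuals, dec))
        grad_dec = [(2.0 / n) * z.T @ r for r in residuals]
        grad_enc = [x.T @ grad_z / len(views) for x in views]
        enc = [w - lr * g for w, g in zip(enc, grad_enc)]
        dec = [d - lr * g for d, g in zip(dec, grad_dec)]
    return enc, dec, losses


views = make_two_view_data()
enc, dec, losses = train_linear_multiview_ae(views)
# Joint latent representation of all samples under the trained encoders.
shared_latent = sum(x @ w for x, w in zip(views, enc)) / len(views)
print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because both decoders read from the same latent, the model has to encode whatever structure the two views share. Practical multi-view autoencoders replace the linear maps with neural networks and often use a probabilistic way of combining per-view encodings rather than a plain average.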


