Recent years have seen growing interest in modelling multiple modalities (or views) of data, for example to understand the relationships between modalities or to generate missing data. Multi-view autoencoders have gained significant traction because they are versatile and can be tailored to the characteristics of the data at hand. However, multi-view autoencoders are mostly described with inconsistent notation and are often implemented using different coding frameworks. To address this, we present a unified mathematical framework for multi-view autoencoders, consolidating their formulations. Moreover, we offer insights into the motivation and theoretical advantages of each model. To facilitate accessibility and practical use, we extend the documentation and functionality of the previously introduced \texttt{multi-view-AE} library. This library offers Python implementations of numerous multi-view autoencoder models within a user-friendly framework. Through benchmarking experiments, we evaluate our implementations against previous ones, demonstrating comparable or superior performance. This work aims to establish a cohesive foundation for multi-modal modelling and to serve as a valuable educational resource in the field.