Geometric transformations have been widely used to augment the size of training image datasets. Existing methods often assume a unimodal distribution of the underlying transformations between images, which limits their power when the data follow a multimodal distribution. In this paper, we propose a novel model, Multimodal Geometric Augmentation (MGAug), that for the first time generates augmenting transformations in a multimodal latent space of geometric deformations. To achieve this, we first develop a deep network that embeds the learning of latent geometric spaces of diffeomorphic transformations (a.k.a. diffeomorphisms) in a variational autoencoder (VAE). A mixture of multivariate Gaussians is formulated in the tangent space of diffeomorphisms and serves as a prior to approximate the hidden distribution of image transformations. We then augment the original training dataset by deforming images using transformations randomly sampled from the learned multimodal latent space of the VAE. To validate the effectiveness of our model, we jointly learn the augmentation strategy with two distinct domain-specific tasks: multi-class classification on 2D synthetic datasets and segmentation on real 3D brain magnetic resonance images (MRIs). We also compare MGAug with state-of-the-art transformation-based image augmentation algorithms. Experimental results show that our proposed approach outperforms all baselines, with significantly improved prediction accuracy. Our code is publicly available at https://github.com/tonmoy-hossain/MGAug.
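The sampling-and-deforming step described above can be illustrated with a small NumPy sketch. It is not the MGAug implementation from the repository. The function names, the mixture parameters, and `toy_decoder` are hypothetical, and the trained VAE decoder is replaced by a fixed upsampling of the latent code. A stationary velocity field integrated by scaling and squaring is used here as a simple stand-in for however MGAug maps tangent-space vectors to diffeomorphisms.

```python
import numpy as np

def sample_gmm(rng, weights, means, stds, n):
    """Draw n latent codes from a diagonal-covariance Gaussian mixture."""
    comps = rng.choice(len(weights), size=n, p=weights)  # pick a mode per sample
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comps] + stds[comps] * noise, comps

def bilinear_sample(img, ys, xs):
    """Sample a 2D array at fractional coordinates, clamping at the borders."""
    h, w = img.shape
    ys, xs = np.clip(ys, 0, h - 1), np.clip(xs, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def pixel_grid(h, w):
    """Identity coordinates of an h-by-w image."""
    return np.meshgrid(np.arange(h, dtype=float),
                       np.arange(w, dtype=float), indexing="ij")

def toy_decoder(z, shape, coarse=4):
    """Hypothetical stand-in for a trained decoder: reshape z into a coarse
    2-channel velocity grid and upsample it smoothly to image size."""
    h, w = shape
    grid = z.reshape(2, coarse, coarse)
    ys, xs = np.meshgrid(np.linspace(0, coarse - 1, h),
                         np.linspace(0, coarse - 1, w), indexing="ij")
    return np.stack([bilinear_sample(grid[c], ys, xs) for c in range(2)])

def integrate(velocity, steps=6):
    """Scaling and squaring: approximate the flow of a stationary velocity
    field by repeatedly composing a small displacement with itself."""
    disp = velocity / (2 ** steps)
    ys, xs = pixel_grid(*disp.shape[1:])
    for _ in range(steps):
        moved = [bilinear_sample(disp[c], ys + disp[0], xs + disp[1])
                 for c in range(2)]
        disp = disp + np.stack(moved)
    return disp

def warp(img, disp):
    """Deform an image by resampling it at displaced coordinates."""
    ys, xs = pixel_grid(*img.shape)
    return bilinear_sample(img, ys + disp[0], xs + disp[1])

# Assumed toy setup: 3 mixture components over a 32-dimensional latent space.
rng = np.random.default_rng(0)
K, D = 3, 2 * 4 * 4
weights = np.array([0.5, 0.3, 0.2])
means = rng.normal(0.0, 2.0, size=(K, D))
stds = np.full((K, D), 0.5)

image = np.zeros((32, 32))
image[8:24, 8:24] = 1.0  # a square to deform

z, comps = sample_gmm(rng, weights, means, stds, n=5)
augmented = np.stack([warp(image, integrate(toy_decoder(zi, image.shape)))
                      for zi in z])
```

Each row of `augmented` is a deformed copy of `image`, and `comps` records which mixture component produced it. In the setting the abstract describes, the mixture parameters and the decoder would be learned jointly with the downstream task rather than fixed by hand as they are here.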