We propose a composable framework for latent space image augmentation that allows multiple augmentations to be combined easily. Image augmentation has been shown to be an effective technique for improving performance on a wide variety of image classification and generation tasks. Our framework is based on the Variational Autoencoder (VAE) architecture and uses a novel approach that performs augmentation via linear transformations within the latent space itself. We explore losses and augmentation latent geometry that encourage the transformations to be composable and involutory, so that they can be readily combined or inverted. Finally, we show that these properties hold better for certain pairs of augmentations, but that the latent space can be transferred to other sets of augmentations to modify performance, effectively constraining the VAE's bottleneck to preserve the variance of the specific augmentations and image features we care about. We demonstrate the effectiveness of our approach with initial results on the MNIST dataset, comparing against both a standard VAE and a Conditional VAE. This latent augmentation method allows much greater control over, and geometric interpretability of, the latent space, making it a valuable tool for researchers and practitioners in the field.
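As a rough illustration of what an involutory, composable linear augmentation in latent space could look like, the sketch below models two augmentations as Householder reflections acting on a latent vector. Everything here is an assumption made for illustration: the abstract does not specify the form of the matrices, the losses, or the latent size, and the names `householder`, `involution_penalty`, `A_hflip`, and `A_vflip` are hypothetical. The random vector `z` stands in for an encoder output.

```python
import numpy as np


def householder(v):
    """Return the reflection I - 2 v v^T / (v^T v), which is its own inverse."""
    v = np.asarray(v, dtype=float)
    return np.eye(v.size) - 2.0 * np.outer(v, v) / np.dot(v, v)


def involution_penalty(A):
    """Squared Frobenius distance between A @ A and the identity.

    A hypothetical regulariser: it is zero exactly when A is involutory.
    """
    return float(np.sum((A @ A - np.eye(A.shape[0])) ** 2))


rng = np.random.default_rng(0)
latent_dim = 8  # assumed latent size, not taken from the source

# Two hypothetical augmentations (for example, two kinds of flip), each
# modelled as a reflection along a different, orthogonal latent direction.
e0 = np.zeros(latent_dim)
e0[0] = 1.0
e1 = np.zeros(latent_dim)
e1[1] = 1.0
A_hflip = householder(e0)
A_vflip = householder(e1)

z = rng.normal(size=latent_dim)  # stand-in for an encoder output

# Involution: applying the same augmentation twice recovers the original code.
involution_error = np.linalg.norm(A_hflip @ (A_hflip @ z) - z)

# Composition: reflections along orthogonal directions commute, so the two
# augmentations can be applied in either order with the same result.
z_hv = A_vflip @ (A_hflip @ z)
z_vh = A_hflip @ (A_vflip @ z)
composition_gap = np.linalg.norm(z_hv - z_vh)
```

In a trained model the matrices would presumably be learned or constrained rather than fixed by hand; a penalty such as `involution_penalty` is one plausible way to push a learned matrix toward being its own inverse, though it is not claimed to be the loss used in this work.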