Recent label mix-based augmentation methods generalize well despite their simplicity, and their favorable effects are often attributed to semantic-level augmentation. However, we find that they are vulnerable to highly skewed class distributions, because data-scarce classes are rarely sampled for inter-class perturbation. We propose TextManiA, a text-driven manifold augmentation method that semantically enriches visual feature spaces regardless of data distribution. TextManiA augments visual data with intra-class semantic perturbation by exploiting easy-to-understand, visually mimetic words, i.e., attributes. To this end, we bridge the text representation space and a target visual feature space, and propose an efficient vector augmentation. To empirically support the validity of our design, we devise two visualization-based analyses and show the plausibility of the bridge between the two modality spaces. Our experiments demonstrate that TextManiA is effective when samples are scarce, under both class-imbalanced and evenly distributed settings. We also show that it is compatible with label mix-based approaches on evenly distributed scarce data.
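As a rough illustration of the idea of intra-class perturbation driven by attribute words, the following minimal sketch shifts a visual feature by a text-derived difference vector while leaving its class label unchanged. It is not the authors' implementation. Everything named here is an assumption made for illustration: `toy_text_encoder` is a hypothetical stand-in for a pretrained text encoder, the prompt template "a photo of a ..." is invented, and `projection` stands in for whatever mapping bridges the text space and the visual feature space (random here, presumably learned in practice).

```python
import hashlib

import numpy as np

TEXT_DIM = 16    # hypothetical text-embedding size
VISUAL_DIM = 8   # hypothetical visual-feature size


def toy_text_encoder(prompt: str) -> np.ndarray:
    """Hypothetical stand-in for a pretrained text encoder.

    Maps a prompt to a deterministic pseudo-random unit vector so the
    sketch runs without downloading any model.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "little")
    vec = np.random.default_rng(seed).standard_normal(TEXT_DIM)
    return vec / np.linalg.norm(vec)


def attribute_difference(class_name: str, attribute: str) -> np.ndarray:
    """Text-space shift obtained by adding an attribute word to a class prompt."""
    plain = toy_text_encoder(f"a photo of a {class_name}")
    attributed = toy_text_encoder(f"a photo of a {attribute} {class_name}")
    return attributed - plain


def augment_feature(
    visual_feature: np.ndarray,
    class_name: str,
    attribute: str,
    projection: np.ndarray,
    scale: float = 1.0,
) -> np.ndarray:
    """Add the projected attribute shift to a visual feature.

    The class label is untouched, so the perturbation is intra-class and
    does not require sampling a partner from another class.
    """
    delta = projection @ attribute_difference(class_name, attribute)
    return visual_feature + scale * delta


# Example: perturb one feature of the class "bull" with the attribute "red".
rng = np.random.default_rng(0)
projection = 0.1 * rng.standard_normal((VISUAL_DIM, TEXT_DIM))  # assumed bridge
feature = rng.standard_normal(VISUAL_DIM)
augmented = augment_feature(feature, "bull", "red", projection)
```

Because the perturbation depends only on the class name and an attribute word, it can be applied to a class with a single training sample, which is the property the abstract contrasts with label mix-based methods.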