Recent label mix-based augmentation methods have shown their effectiveness in generalization despite their simplicity, and their favorable effects are often attributed to semantic-level augmentation. However, we found that they are vulnerable to highly skewed class distributions, because classes with scarce data are rarely sampled for inter-class perturbation. We propose TextManiA, a text-driven manifold augmentation method that semantically enriches visual feature spaces regardless of the data distribution. TextManiA augments visual data with intra-class semantic perturbation by exploiting easy-to-understand, visually mimetic words, i.e., attributes. To this end, we bridge the text representation and a target visual feature space, and propose an efficient vector augmentation. To empirically support the validity of our design, we devise two visualization-based analyses and show the plausibility of the bridge between the two modality spaces. Our experiments demonstrate that TextManiA is effective on scarce samples under both class imbalance and even class distribution. We also show that it is compatible with label mix-based approaches on evenly distributed scarce data.
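To make the idea of intra-class, attribute-driven vector augmentation concrete, here is a minimal sketch of one plausible reading of the abstract; it is not TextManiA's actual implementation. It assumes that an attribute direction is the difference between two text embeddings (e.g., "red car" minus "car"), and that a linear map (`bridge`, hypothetical) carries that direction into the visual feature space. The text embeddings and the bridge are random placeholders standing in for a pretrained text encoder and a learned or fixed projection. Because each sample is perturbed only with directions from its own class, a class with a single sample is augmented just as often as a frequent one, unlike pairwise mixing.

```python
import numpy as np

# Toy dimensions; a real system would use a pretrained text encoder (assumption).
TEXT_DIM, VIS_DIM = 8, 6


def attribute_direction(emb_with_attr: np.ndarray, emb_plain: np.ndarray) -> np.ndarray:
    """Unit vector for an attribute, e.g. embedding('red car') - embedding('car')."""
    diff = emb_with_attr - emb_plain
    return diff / (np.linalg.norm(diff) + 1e-8)


def augment(feats, labels, class_dirs, bridge, scale, rng):
    """Perturb each feature along a randomly chosen attribute direction of its OWN class.

    feats: (N, VIS_DIM); labels: (N,) ints; class_dirs: {class: (A, TEXT_DIM)}
    bridge: (TEXT_DIM, VIS_DIM) hypothetical linear map from text space to visual space.
    Labels are not modified, so the perturbation is intra-class.
    """
    out = feats.astype(float).copy()
    for i, y in enumerate(labels):
        dirs = class_dirs[int(y)]
        d_text = dirs[rng.integers(len(dirs))]   # pick one attribute of this class
        out[i] += scale * (d_text @ bridge)      # map to visual space and add
    return out


rng = np.random.default_rng(0)
# Placeholder "text embeddings": random vectors stand in for encoder outputs.
plain = {c: rng.normal(size=TEXT_DIM) for c in (0, 1)}
class_dirs = {
    c: np.stack([attribute_direction(plain[c] + rng.normal(size=TEXT_DIM), plain[c])
                 for _ in range(3)])
    for c in (0, 1)
}
bridge = rng.normal(size=(TEXT_DIM, VIS_DIM)) / np.sqrt(TEXT_DIM)

# Imbalanced batch: class 1 has a single sample, yet it is still augmented.
labels = np.array([0, 0, 0, 0, 1])
feats = rng.normal(size=(len(labels), VIS_DIM))
aug = augment(feats, labels, class_dirs, bridge, scale=0.5, rng=rng)
```

The additive form keeps the augmentation cheap (one vector addition per sample), which matches the spirit of an "efficient vector augmentation", but the choice of a linear bridge and a fixed `scale` is an assumption made for illustration.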