Data mixing, or mixup, is a data-dependent augmentation technique that has substantially improved the generalization of modern deep neural networks. However, a full understanding of mixup methodology requires a top-down, hierarchical view built on systematic, impartial evaluation and empirical analysis, both of which the community currently lacks. In this paper, we present OpenMixup, the first comprehensive mixup benchmarking study for supervised visual classification. OpenMixup offers a unified framework for mixup-based model design and training, encompassing a wide collection of data mixing algorithms, a diverse range of widely used backbones and modules, and a set of model analysis toolkits. To ensure fair and complete comparisons, we conduct large-scale standard evaluations of various mixup baselines across 12 diverse image datasets, carefully controlling confounders and tuning settings with our modular and extensible codebase. Detailed empirical analysis of how mixup policies, network architectures, and dataset properties affect mixup classification performance yields a number of observations and insights. We hope that OpenMixup can bolster the reproducibility of previously reported findings and facilitate a better understanding of mixup properties, thereby giving the community a kick-start for developing and evaluating new mixup methods. The source code and user documentation are available at \url{https://github.com/Westlake-AI/openmixup}.
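For readers unfamiliar with the basic operation that the benchmarked methods build on, the following is a minimal sketch of the original (vanilla) mixup rule, which blends two training samples and their one-hot labels using a weight drawn from a Beta distribution. It is an illustration only and is not taken from the OpenMixup codebase: the function name `mixup_pair`, the parameter `alpha`, and the toy inputs are all assumptions made for this example.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Blend two (input, one-hot label) pairs with one Beta-sampled weight.

    Hypothetical helper for illustration; inputs are flat lists of floats.
    """
    # lam ~ Beta(alpha, alpha) controls how much of sample A is kept.
    lam = rng.betavariate(alpha, alpha)
    # The same weight is applied to the inputs and to the labels.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Toy usage: two 3-"pixel" inputs from two different classes.
rng = random.Random(0)  # seeded only to make the example repeatable
x_mix, y_mix, lam = mixup_pair(
    [0.0, 1.0, 0.5], [1.0, 0.0],
    [1.0, 0.0, 0.5], [0.0, 1.0],
    alpha=1.0, rng=rng,
)
```

Because the labels are mixed with the same weight as the inputs, the mixed label remains a valid probability vector. Other mixing policies differ mainly in how the inputs are combined, for example by pasting a region of one image onto another instead of interpolating every pixel.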