Training machine learning models for medical imaging requires a large and varied dataset to ensure robustness and interoperability. Acquiring such heterogeneous data is difficult, however, because each image needs expert labeling and medical data raise privacy concerns. Data augmentation has emerged as a promising and cost-effective way to increase the size and diversity of the training dataset. In this study, we provide a comprehensive review of the data augmentation techniques employed in medical imaging and explore their benefits. We conducted an in-depth study of data augmentation techniques used in medical imaging, identifying 11 different purposes and collecting 65 distinct techniques. We grouped the techniques into nine categories: spatial transformation-based, color and contrast adjustment-based, noise-based, deformation-based, data mixing-based, filters and mask-based, division-based, multi-scale and multi-view-based, and meta-learning-based. Some techniques require all parameters to be specified manually, while others automatically adjust the type and magnitude of augmentation to the task requirements. These techniques enable the development of more robust models that can be applied in domains where data are limited or difficult to obtain. We expect the list of available techniques to expand in the future, giving researchers additional options to consider.
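To make three of the categories above concrete, the following is a minimal NumPy sketch of one spatial transformation, one noise-based augmentation, and one data mixing augmentation, each with manually specified parameters. The abstract does not name specific techniques, so the choices here (flip plus 90-degree rotation, additive Gaussian noise, and mixup) and every function name and default value (`sigma`, `alpha`) are illustrative assumptions rather than the methods catalogued in the review. Images are assumed to be 2D arrays scaled to [0, 1].

```python
import numpy as np


def random_flip_rot90(image, rng):
    """Spatial transformation: random horizontal flip, then a random multiple of 90 degrees."""
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)
    k = int(rng.integers(0, 4))  # 0, 1, 2, or 3 quarter turns
    return np.rot90(image, k=k, axes=(0, 1))


def add_gaussian_noise(image, rng, sigma=0.05):
    """Noise-based: additive zero-mean Gaussian noise, clipped back to [0, 1]."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)


def mixup(image_a, label_a, image_b, label_b, rng, alpha=0.4):
    """Data mixing: convex combination of two images and their one-hot labels."""
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    image = lam * image_a + (1.0 - lam) * image_b
    label = lam * label_a + (1.0 - lam) * label_b
    return image, label


# Synthetic stand-ins for two grayscale scans (hypothetical data, not real images).
rng = np.random.default_rng(0)
scan_a = rng.random((64, 64))
scan_b = rng.random((64, 64))
label_a = np.array([1.0, 0.0])
label_b = np.array([0.0, 1.0])

augmented = add_gaussian_noise(random_flip_rot90(scan_a, rng), rng, sigma=0.05)
mixed_image, mixed_label = mixup(scan_a, label_a, scan_b, label_b, rng, alpha=0.4)
```

All three functions here take hand-picked parameters. An automated approach of the kind the abstract mentions would instead search over or learn the choice of operation and its magnitude for the task at hand.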