Medical image data are often limited because acquisition and annotation are expensive. Hence, training a deep-learning model on raw data alone can easily lead to overfitting. One solution is to augment the raw data with various transformations, which improves the model's ability to generalize to new data. However, manually configuring a generic combination of augmentations and their parameters for different datasets is non-trivial because acquisition approaches and data distributions are inconsistent. Automatic data augmentation has therefore been proposed to learn favorable augmentation strategies for different datasets, but it incurs a large GPU overhead. To this end, we present a novel method, called Dynamic Data Augmentation (DDAug), which is efficient and has negligible computation cost. DDAug builds a hierarchical tree structure to represent various augmentations and uses an efficient Monte Carlo tree search algorithm to update, prune, and sample the tree. As a result, the augmentation pipeline can be optimized for each dataset automatically. Experiments on multiple prostate MRI datasets show that our method outperforms current state-of-the-art data augmentation strategies.
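To make the tree-based idea concrete, the following is a minimal sketch of how a tree of augmentation operations could be sampled, updated, and pruned with a UCB-style Monte Carlo tree search. It is an illustration under stated assumptions, not the DDAug implementation: the names `Node`, `build_tree`, `sample_path`, `update`, `prune`, and `search`, the UCB1 scoring rule, the pruning thresholds, and the `reward_fn` callback (a stand-in for something like a validation score) are all hypothetical and do not come from the abstract.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One augmentation operation in the tree (hypothetical structure)."""
    op: str
    children: Dict[str, "Node"] = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0

    def mean(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def build_tree(ops: List[str], depth: int) -> Node:
    """Build a full tree in which every level offers every operation."""
    root = Node("root")
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for op in ops:
                child = Node(op)
                node.children[op] = child
                next_frontier.append(child)
        frontier = next_frontier
    return root


def ucb_score(child: Node, parent_visits: int, c: float) -> float:
    """UCB1 score; unvisited children are always tried first."""
    if child.visits == 0:
        return math.inf
    return child.mean() + c * math.sqrt(math.log(parent_visits) / child.visits)


def sample_path(root: Node, rng: random.Random, c: float = 1.4) -> List[Node]:
    """Walk from the root to a leaf, picking the best-scoring child per level."""
    path, node = [], root
    while node.children:
        parent_visits = max(node.visits, 1)
        scores = {op: ucb_score(ch, parent_visits, c) for op, ch in node.children.items()}
        best = max(scores.values())
        node = node.children[rng.choice([op for op, s in scores.items() if s == best])]
        path.append(node)
    return path


def update(root: Node, path: List[Node], reward: float) -> None:
    """Back up the observed reward along the sampled path."""
    for node in [root, *path]:
        node.visits += 1
        node.value_sum += reward


def prune(node: Node, min_visits: int, keep: int) -> None:
    """Once all children are well explored, keep only the `keep` best by mean."""
    kids = list(node.children.values())
    if len(kids) > keep and all(k.visits >= min_visits for k in kids):
        kids.sort(key=Node.mean, reverse=True)
        node.children = {k.op: k for k in kids[:keep]}
    for k in node.children.values():
        prune(k, min_visits, keep)


def search(ops: List[str], depth: int, reward_fn: Callable[[List[str]], float],
           iterations: int = 300, seed: int = 0,
           min_visits: int = 5, keep: int = 2) -> List[str]:
    """Run the sample/update/prune loop and return the greedy best pipeline."""
    rng = random.Random(seed)
    root = build_tree(ops, depth)
    for _ in range(iterations):
        path = sample_path(root, rng)
        update(root, path, reward_fn([n.op for n in path]))
        prune(root, min_visits, keep)
    best, node = [], root
    while node.children:
        node = max(node.children.values(), key=Node.mean)
        best.append(node.op)
    return best
```

In a real training loop, `reward_fn` would apply the sampled operations to a batch and return a measured quality signal. A toy call such as `search(["flip", "noise", "blur"], depth=2, reward_fn=lambda p: p.count("flip") / 2)` only exercises the search mechanics.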