In the medical field, the limited availability of large-scale datasets and the labor-intensive annotation process hinder the performance of deep models. Diffusion-based generative augmentation approaches present a promising solution to this issue, having proven effective in advancing downstream medical recognition tasks. Nevertheless, existing works lack sufficient semantic and sequential steerability for challenging video/3D sequence generation, and neglect quality control of noisy synthesized samples, resulting in unreliable synthetic databases that severely limit the performance of downstream tasks. In this work, we present Ctrl-GenAug, a novel and general generative augmentation framework that enables highly semantic- and sequential-customized sequence synthesis and suppresses incorrectly synthesized samples, to aid medical sequence classification. Specifically, we first design a multimodal conditions-guided sequence generator for controllably synthesizing diagnosis-promotive samples, into which a sequential augmentation module is integrated to enhance the temporal/stereoscopic coherence of generated samples. We then propose a noisy synthetic data filter to suppress unreliable cases at both the semantic and sequential levels. Extensive experiments on 3 medical datasets, using 11 networks trained under 3 paradigms, comprehensively demonstrate the effectiveness and generality of Ctrl-GenAug, particularly for underrepresented high-risk populations and out-of-domain conditions.