Despite recent advances in deep learning, its application in real-world medical settings, such as phonocardiogram (PCG) classification, remains limited. A significant barrier is the lack of high-quality annotated datasets, which hampers the development of robust, generalizable models that perform well on newly collected, out-of-distribution (OOD) data. Self-supervised learning (SSL), particularly contrastive learning, has shown promise in mitigating data scarcity by using unlabeled data to enhance model robustness. Although SSL methods have been proposed and studied in other domains, work examining the impact of data augmentations on model robustness for PCG classification remains limited. In particular, while augmentations are a key component of SSL, selecting the most suitable policy during training is highly challenging: improper augmentations can substantially degrade performance and even prevent a network from learning meaningful representations. Addressing this gap, our research explores and evaluates a wide range of audio-based augmentations and uncovers combinations that enhance SSL model performance in PCG classification. We conduct a comprehensive comparative analysis across multiple datasets, assessing the impact of various augmentations on model performance. Our findings reveal that, depending on the training distribution, augmentation choice significantly influences model robustness: fully supervised models experience up to a 32\% drop in effectiveness when evaluated on unseen data, while SSL models demonstrate greater resilience, losing only 10\% or even improving in some cases. This study also highlights the most promising and appropriate augmentations for PCG signal processing by calculating their effect size on training. These insights equip researchers with practical guidelines for developing reliable models in PCG signal processing.