A Comprehensive Evaluation of Augmentations for Robust OOD Self-Supervised Contrastive Phonocardiogram Representation Learning

Despite the recent increase in research activity, deep-learning models have not yet been widely accepted in several real-world settings, such as medicine. The shortage of high-quality annotated data often hinders the development of robust and generalizable models, that is, models whose effectiveness does not degrade when they are presented with newly collected, out-of-distribution (OOD) datasets. Contrastive Self-Supervised Learning (SSL) offers a potential solution to labeled-data scarcity, as it takes advantage of unlabeled data to increase model effectiveness and robustness. In this research, we propose applying contrastive SSL to detect abnormalities in 1D phonocardiogram (PCG) samples by learning a generalized representation of the signal. Specifically, we perform an extensive comparative evaluation of a wide range of audio-based augmentations, evaluate trained classifiers on multiple datasets across different downstream tasks, and report the impact of each augmentation on model training. We experimentally demonstrate that, depending on its training distribution, the effectiveness of a fully-supervised model can degrade by up to 32% when evaluated on unseen data, while SSL models lose only up to 10% or even improve in some cases. We argue and experimentally demonstrate that contrastive SSL pretraining can help produce robust classifiers that generalize to unseen, OOD data without relying on time- and labor-intensive annotation by medical experts. Furthermore, the proposed evaluation protocol sheds light on the most promising and appropriate augmentations for robust PCG signal processing by calculating their effect size on model training. Finally, we provide researchers and practitioners with a roadmap towards producing robust models for PCG classification, in addition to an open-source codebase for developing novel approaches.
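
To make the contrastive setup concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes an NT-Xent (SimCLR-style) objective, which the abstract does not specify, and the `augment` function (random circular shift, gain, and additive noise) together with all of its parameter values is a hypothetical stand-in for the augmentations under evaluation. Two augmented views of each signal form a positive pair, and all other views in the batch act as negatives.

```python
import numpy as np


def augment(signal, rng, noise_std=0.05, max_shift=200):
    """Hypothetical augmentation: random circular shift, random gain, Gaussian noise."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    gain = rng.uniform(0.8, 1.2)
    shifted = np.roll(signal, shift) * gain
    return shifted + rng.normal(0.0, noise_std, size=signal.shape)


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for N positive pairs; z_a and z_b have shape (N, D)."""
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    sim = (z @ z.T) / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # Row i (view a) is paired with row i + n (view b), and vice versa.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm
    return float(-log_prob[np.arange(2 * n), positives].mean())


rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 2000))  # random stand-in for 8 PCG segments
view_a = np.stack([augment(x, rng) for x in batch])
view_b = np.stack([augment(x, rng) for x in batch])

# Assumption: a fixed random projection stands in for a trained encoder.
projection = rng.normal(size=(2000, 32))
loss = nt_xent_loss(view_a @ projection, view_b @ projection)
```

In an actual pipeline the random projection would be replaced by a trainable encoder, and the loss would be minimized over unlabeled PCG recordings before fine-tuning on a labeled downstream task.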
