This paper presents a deep learning system for detecting anomalies in respiratory sound recordings. The system first extracts audio features using the Gammatone and Continuous Wavelet transformations, which convert the respiratory sound input into two-dimensional spectrograms that present both spectral and temporal features. The system then classifies respiratory anomalies with Inception-residual-based backbone models combined with multi-head attention and a multi-objective loss. Instead of simply concatenating the results from the different spectrograms, we propose a Linear combination that regulates the contribution of each individual spectrogram equally throughout the training process. To evaluate performance, we conducted experiments on the SPRSound (The Open-Source SJTU Paediatric Respiratory Sound) benchmark dataset proposed by the IEEE BioCAS 2022 challenge. In terms of the Score, computed as the mean of the average score and the harmonic score, our proposed system improves on the challenge baseline system by 9.7%, 15.8%, 17.8%, and 16.1% in Task 1-1, Task 1-2, Task 2-1, and Task 2-2, respectively. Notably, we achieved the Top-1 performance in Task 2-1 and Task 2-2, with the highest Scores of 74.5% and 53.9%, respectively.
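The Score used above can be illustrated with a short sketch. The abstract states only that the Score is the mean of the average score and the harmonic score; the sketch below additionally assumes that both are computed from sensitivity (SE) and specificity (SP), as the arithmetic mean and harmonic mean respectively. The function and parameter names are hypothetical and chosen for illustration.

```python
def average_score(se: float, sp: float) -> float:
    """Arithmetic mean of sensitivity and specificity (assumed definition)."""
    return (se + sp) / 2.0


def harmonic_score(se: float, sp: float) -> float:
    """Harmonic mean of sensitivity and specificity (assumed definition)."""
    if se + sp == 0:
        # Avoid division by zero when both rates are zero.
        return 0.0
    return 2.0 * se * sp / (se + sp)


def total_score(se: float, sp: float) -> float:
    """Score as described in the abstract: mean of average and harmonic scores."""
    return (average_score(se, sp) + harmonic_score(se, sp)) / 2.0


if __name__ == "__main__":
    # Illustrative inputs only; these are not results from the paper.
    print(round(total_score(0.8, 0.6), 4))
```

Because the harmonic mean penalizes an imbalance between the two rates, a system with high specificity but low sensitivity scores lower under this metric than its arithmetic mean alone would suggest.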