Monitoring of prevalent airborne diseases such as COVID-19 characteristically involves respiratory assessments. While auscultation is a mainstream method for preliminary screening of disease symptoms, its utility is hampered by the need for dedicated hospital visits. Remote monitoring based on recordings of respiratory sounds on portable devices is a promising alternative that can assist in early assessment of COVID-19, which primarily affects the lower respiratory tract. In this study, we introduce a deep learning approach to distinguish patients with COVID-19 from healthy controls given audio recordings of cough or breathing sounds. The proposed approach applies a novel hierarchical spectrogram transformer (HST) to spectrogram representations of respiratory sounds. HST applies self-attention over local windows in spectrograms, and the window size is progressively increased across model stages to capture context from local to global scales. HST is compared against state-of-the-art conventional and deep-learning baselines. Demonstrations on crowd-sourced multi-national datasets indicate that HST outperforms competing methods, achieving an area under the receiver operating characteristic curve (AUC) of over 83% in detecting COVID-19 cases.
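The idea of self-attention restricted to local spectrogram windows, with the window size grown from stage to stage, can be illustrated with a minimal NumPy sketch. This is not the HST implementation: the window sizes `(2, 4, 8)`, the channel count, the random untrained projection weights, and all function names below are assumptions chosen for illustration, and components a real model would need (learned weights, multi-head attention, normalization, feed-forward layers, a classifier head) are omitted.

```python
import numpy as np

def window_partition(x, win):
    """Split a (freq, time, channels) map into non-overlapping win x win windows."""
    f, t, c = x.shape
    assert f % win == 0 and t % win == 0, "map size must be divisible by window size"
    x = x.reshape(f // win, win, t // win, win, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, win * win, c)  # (num_windows, tokens_per_window, c)

def window_merge(windows, win, f, t):
    """Inverse of window_partition: reassemble windows into a (f, t, c) map."""
    c = windows.shape[-1]
    x = windows.reshape(f // win, t // win, win, win, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(f, t, c)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, win, rng):
    """Single-head self-attention computed independently inside each window.

    Projection weights are random placeholders (hypothetical, untrained).
    """
    f, t, c = x.shape
    wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
    tokens = window_partition(x, win)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(c))  # (windows, n, n)
    return window_merge(attn @ v, win, f, t)

def hierarchical_forward(spec, window_sizes=(2, 4, 8), channels=8, seed=0):
    """Stack attention stages whose window size grows, then pool to one vector."""
    rng = np.random.default_rng(seed)
    # Hypothetical embedding: lift each spectrogram bin to `channels` features.
    x = spec[..., None] * rng.standard_normal((1, 1, channels))
    for win in window_sizes:  # local context first, wider context later
        x = x + window_self_attention(x, win, rng)  # residual connection
    return x.mean(axis=(0, 1))  # global average pooling over freq and time

if __name__ == "__main__":
    spec = np.random.default_rng(1).random((16, 16))  # stand-in for a spectrogram
    print(hierarchical_forward(spec).shape)
```

In this sketch a token in an early stage only attends to its immediate neighbours in time and frequency, while later stages mix information over larger regions, which is one simple way to move from local to global context.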