Respiratory disease, the third leading cause of death globally, is a high-priority condition that calls for substantial research on identification and treatment. Stethoscope-recorded lung sounds, combined with artificial-intelligence-powered devices, have been used to identify lung disorders and help specialists make accurate diagnoses. This study develops the audio-spectrogram vision transformer (AS-ViT), a new approach for identifying abnormal respiratory sounds. Lung sounds are first converted into spectrograms, which are visual time-frequency representations, using the short-time Fourier transform (STFT). A vision transformer then analyzes these images to classify different types of respiratory sounds. Classification was carried out on the ICBHI 2017 database, which includes lung sounds with varying frequencies, noise levels, and backgrounds. The proposed AS-ViT method was evaluated using three metrics. For respiratory sound detection, it achieved an unweighted average recall of 79.1% and an overall score of 59.8% with a 60:40 split ratio, and 86.4% and 69.3%, respectively, with an 80:20 split ratio, surpassing previous state-of-the-art results.
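The STFT step described above can be sketched as follows. This is a minimal illustration using NumPy only, not the authors' preprocessing pipeline: the function name `stft_spectrogram`, the frame length, the hop size, the Hann window, and the synthetic 500 Hz test tone are all assumptions chosen for the example, since the abstract does not state the actual settings.

```python
import numpy as np


def stft_spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        # Zero-pad very short recordings so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))

    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop

    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )

    # One real FFT per frame gives the short-time spectrum.
    spectrum = np.fft.rfft(frames, axis=1)

    # Transpose so rows are frequency bins and columns are time frames,
    # then convert magnitude to decibels (small offset avoids log of zero).
    magnitude = np.abs(spectrum).T
    return 20.0 * np.log10(magnitude + 1e-10)


# Synthetic stand-in for a lung-sound recording: one second of a 500 Hz tone.
sample_rate = 4000  # hypothetical sampling rate in Hz
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 500 * t)

spec = stft_spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256  # bin index to frequency in Hz
```

The resulting 2-D array can be rendered or resized as an image and passed to an image classifier. How AS-ViT scales, crops, or patches the spectrogram is not specified in the abstract, so that stage is left out here.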
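Unweighted average recall, one of the reported metrics, is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The sketch below shows that computation in plain Python. The function name and the toy labels are hypothetical, and the abstract does not define the "overall score", so it is not implemented here.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, with each class weighted equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy imbalanced example (labels are illustrative, not taken from the dataset).
y_true = ["normal"] * 8 + ["crackle"] * 2
y_pred = ["normal"] * 8 + ["normal", "crackle"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the plain accuracy is 0.9 while the unweighted average recall is 0.75, because the minority class is recognized only half the time. That gap is why a class-balanced metric is informative on imbalanced sound datasets.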