Early Detection of Tuberculosis with Machine Learning Cough Audio Analysis: Towards More Accessible Global Triaging Usage

Tuberculosis (TB), a bacterial disease that mainly affects the lungs, is one of the leading infectious causes of death worldwide. Timely and effective anti-TB treatment is crucial to prevent TB from spreading within the body and causing life-threatening complications. Cough is an objective biomarker for TB: it can serve as a triage tool and as a means of monitoring treatment response, since it regresses with successful therapy. Current gold standards for TB diagnosis are slow or inaccessible, especially in rural areas where TB is most prevalent. In addition, current machine learning (ML) diagnostic approaches, such as those based on chest radiographs, are ineffective and do not monitor treatment progression. To enable effective diagnosis, an ensemble model with a novel ML architecture was developed that detects TB by analyzing the acoustic characteristics of coughs recorded with smartphone microphones. The architecture combines a 2D-CNN and XGBoost, trained on 724,964 cough audio samples and demographic data from 7 countries. After Mel-spectrogram feature extraction and data augmentation via impulse-response (IR) convolution, the model achieved an AUROC (area under the receiver operating characteristic curve) of 88%, surpassing the WHO's requirements for screening tests. Results are available within 15 seconds and can be easily accessed via a mobile app. This research helps improve TB diagnosis by providing a promising, accurate, quick, and accessible triage tool.
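The abstract names its preprocessing and evaluation steps (IR-convolution augmentation, Mel-spectrogram features, AUROC) without giving parameters. The following is a minimal NumPy sketch of what these steps generally involve, under stated assumptions; it is not the authors' pipeline. The 16 kHz sample rate, FFT size, hop length, 64 Mel bands, the HTK-style Mel formula, the peak renormalization after convolution, the synthetic signals, and all function names (`ir_augment`, `log_mel_spectrogram`, `auroc`, etc.) are hypothetical choices for illustration. AUROC is computed here through its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one.

```python
import numpy as np


def ir_augment(cough, impulse_response):
    # Assumed augmentation: convolve the dry cough with an impulse response (IR)
    # to simulate a different room/microphone, trim to the original length,
    # and rescale so the augmented clip keeps the original peak amplitude.
    wet = np.convolve(cough, impulse_response)[: len(cough)]
    peak_out = np.max(np.abs(wet))
    if peak_out > 0:
        wet = wet * (np.max(np.abs(cough)) / peak_out)
    return wet


def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel formulas exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are evenly spaced on the Mel scale
    # between 0 Hz and the Nyquist frequency.
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr=16000, n_fft=512, hop=160, n_mels=64):
    # Hann-windowed frames -> power spectrum -> Mel projection -> log.
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (frames, n_mels)
    return np.log(mel + 1e-10).T  # (n_mels, frames): a 2D "image"


def auroc(labels, scores):
    # Probability that a random positive (label 1) outscores a random
    # negative (label 0); ties count as half a win.
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical inputs: a synthetic 1-second "cough" (decaying noise burst)
# and a synthetic 50 ms IR, both generated rather than recorded.
rng = np.random.default_rng(0)
sr = 16000
cough = rng.standard_normal(sr) * np.exp(-np.linspace(0.0, 8.0, sr))
room_ir = rng.standard_normal(800) * np.exp(-np.linspace(0.0, 10.0, 800))
augmented = ir_augment(cough, room_ir)
features = log_mel_spectrogram(augmented, sr=sr)
print(features.shape)  # (64, 97)
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Log-Mel matrices like `features` are the kind of image-like input a 2D-CNN consumes. How the CNN outputs were combined with XGBoost and the demographic data is not described in the abstract, so that step is deliberately left out of this sketch.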

