An AI-enabled Bias-Free Respiratory Disease Diagnosis Model using Cough Audio: A Case Study for COVID-19

Cough-based diagnosis of Respiratory Diseases (RDs) using Artificial Intelligence (AI) has attracted considerable attention, yet many existing studies overlook confounding variables in their predictive models. These variables can distort the relationship between cough recordings (input data) and RD status (output variable), leading to biased associations and unrealistic model performance. To address this gap, we propose the Bias-Free Network (RBFNet), an end-to-end solution that effectively mitigates the impact of confounders in the training data distribution. RBFNet is designed to produce accurate, unbiased features for RD diagnosis, and its relevance is demonstrated on a COVID-19 dataset in this study. The approach aims to make AI-based RD diagnosis models more reliable by addressing the challenges posed by confounding variables. The feature encoder module of RBFNet is a hybrid of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks. An additional bias predictor is incorporated into the classification scheme to form a conditional Generative Adversarial Network (cGAN), which helps decorrelate the confounding variables from the RD prediction. The merit of RBFNet is demonstrated by comparing its classification performance with a State-of-the-Art (SoTA) Deep Learning (DL) model (CNN-LSTM), after training both on different imbalanced COVID-19 datasets created from a large-scale proprietary cough dataset. RBFNet proved robust against extremely biased training scenarios, achieving test set accuracies of 84.1%, 84.6%, and 80.5% when the confounding variable was gender, age, and smoking status, respectively. These accuracies exceed those of the CNN-LSTM model by 5.5%, 7.7%, and 8.2%, respectively.
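The abstract does not say exactly how the "extremely biased" training sets were built. The sketch below is a hypothetical construction that only illustrates why such splits inflate performance. In the training set the confounder (for example gender, encoded here as a binary value) always equals the COVID-19 label, while the test set is balanced across every (label, confounder) combination. The names `Record`, `make_pool`, `biased_split` and `accuracy` and the binary encodings are illustrative assumptions, not part of RBFNet.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    covid: int       # 1 = COVID-19 positive, 0 = negative
    confounder: int  # hypothetical binary encoding, e.g. 1 = male, 0 = female


def make_pool(n_per_cell):
    """Hypothetical balanced pool: n_per_cell records for each (covid, confounder) pair."""
    return [Record(c, z) for c in (0, 1) for z in (0, 1) for _ in range(n_per_cell)]


def biased_split(pool, n_train_per_class, seed=0):
    """Return (train, test). In train the confounder always equals the label;
    test holds an equal number of records from each of the four cells."""
    rng = random.Random(seed)
    records = list(pool)
    rng.shuffle(records)
    train, rest = [], []
    quota = {0: n_train_per_class, 1: n_train_per_class}
    for r in records:
        # Only "aligned" records (confounder == label) may enter the training set.
        if r.confounder == r.covid and quota[r.covid] > 0:
            train.append(r)
            quota[r.covid] -= 1
        else:
            rest.append(r)
    # Group the remaining records by (label, confounder) cell.
    cells = {}
    for r in rest:
        cells.setdefault((r.covid, r.confounder), []).append(r)
    keys = [(0, 0), (0, 1), (1, 0), (1, 1)]
    k = min(len(cells.get(key, [])) for key in keys)  # size of the smallest cell
    test = [r for key in keys for r in cells.get(key, [])[:k]]
    return train, test


def accuracy(records, predict):
    """Fraction of records whose predicted label matches the COVID-19 label."""
    return sum(predict(r) == r.covid for r in records) / len(records)


pool = make_pool(100)
train, test = biased_split(pool, n_train_per_class=50)
shortcut = lambda r: r.confounder  # "classifier" that reads only the confounder
print(accuracy(train, shortcut))  # 1.0
print(accuracy(test, shortcut))   # 0.5
```

A model that latches onto the confounder looks perfect on such a training split but drops to chance on a balanced test set. An adversarially trained bias predictor, like the one RBFNet adds, is intended to penalise features that carry this kind of shortcut.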
