Cough-based diagnosis of Respiratory Diseases (RDs) using Artificial Intelligence (AI) has attracted considerable attention, yet many existing studies overlook confounding variables in their predictive models. These variables can distort the relationship between cough recordings (input data) and RD status (output variable), leading to biased associations and unrealistic model performance. To address this gap, we propose the Bias Free Network (RBF-Net), an end-to-end solution that mitigates the impact of confounders in the training data distribution. RBF-Net is designed to learn accurate and unbiased RD diagnosis features, and we demonstrate its relevance on a COVID-19 dataset. The approach aims to improve the reliability of AI-based RD diagnosis models in the presence of confounding variables. The feature encoder module of RBF-Net is a hybrid of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. An additional bias predictor is incorporated into the classification scheme to formulate a conditional Generative Adversarial Network (cGAN), which helps decorrelate the confounding variables from the RD prediction. The merit of RBF-Net is demonstrated by comparing its classification performance with a State-of-the-Art (SoTA) Deep Learning (DL) model (CNN-LSTM) after training on different unbalanced COVID-19 datasets created from a large-scale proprietary cough dataset. RBF-Net proved robust in extremely biased training scenarios, achieving test set accuracies of 84.1%, 84.6%, and 80.5% when the confounding variable was gender, age, and smoking status, respectively. RBF-Net outperforms the CNN-LSTM model's test set accuracies by 5.5%, 7.7%, and 8.2%, respectively.
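The adversarial idea behind the bias predictor can be illustrated with a minimal sketch. The abstract does not specify the loss functions, so the form below (binary cross-entropy for both heads, combined with a hypothetical weight `lam`) is an assumption chosen for illustration, not the formulation used in RBF-Net. The function names are likewise hypothetical. The sketch shows only the opposing objectives: the bias predictor tries to recover the confounder from the encoded features, while the encoder is rewarded when the RD classifier succeeds and the bias predictor fails.

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def adversarial_losses(rd_probs, rd_labels, bias_probs, bias_labels, lam=1.0):
    """Return (encoder_loss, bias_predictor_loss) for one batch.

    Assumed setup (not taken from the paper):
      - rd_probs:   classifier outputs for RD status, computed from encoder features
      - bias_probs: bias predictor outputs for a binary confounder (e.g. gender)
      - lam:        hypothetical trade-off weight between the two terms
    """
    cls_loss = binary_cross_entropy(rd_probs, rd_labels)
    bias_loss = binary_cross_entropy(bias_probs, bias_labels)
    # The bias predictor minimizes bias_loss; the encoder minimizes the
    # classification loss while maximizing bias_loss (hence the minus sign).
    encoder_loss = cls_loss - lam * bias_loss
    return encoder_loss, bias_loss

# Bias predictor at chance level: features carry no confounder information.
enc_unbiased, bias_at_chance = adversarial_losses(
    rd_probs=[0.9, 0.1], rd_labels=[1, 0],
    bias_probs=[0.5, 0.5], bias_labels=[1, 0],
)

# Bias predictor confident and correct: features leak the confounder.
enc_leaky, bias_confident = adversarial_losses(
    rd_probs=[0.9, 0.1], rd_labels=[1, 0],
    bias_probs=[0.9, 0.1], bias_labels=[1, 0],
)
```

With identical classification quality, the encoder loss is lower when the bias predictor is at chance than when it recovers the confounder, which is the pressure that pushes the learned features away from the confounding variable. In a real training loop the two losses would be minimized in alternating steps over the respective network parameters.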