Respiratory illnesses are a significant global health burden. Chief among them, chronic obstructive pulmonary disease (COPD) is the seventh leading cause of poor health and the third leading cause of death worldwide, causing 3.23 million deaths in 2019. Early identification and diagnosis are therefore essential for effective mitigation. Among the available diagnostic tools, spirometry plays a crucial role in detecting respiratory abnormalities. However, conventional clinical spirometry often entails considerable cost and practical limitations: it requires specialized equipment, trained personnel, and a dedicated clinical setting, all of which make it less accessible. To address these challenges, wearable spirometry technologies have emerged as promising alternatives that offer accurate, cost-effective, and convenient measurement. Developing machine learning models for wearable spirometry, however, relies heavily on high-quality ground-truth spirometry data, which is laborious and expensive to collect. In this research, we propose using active learning, a subfield of machine learning, to reduce the burden of data collection and labeling. By strategically selecting which samples to measure with the ground-truth spirometer, we can reduce the need for resource-intensive data collection. We present evidence that models trained on small subsets selected through active learning achieve results comparable to or better than those of models trained on the complete dataset.
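The abstract does not specify which query strategy is used, so the following is only a minimal sketch of pool-based active learning under stated assumptions. It uses query-by-committee: a committee of linear regressors is fit on bootstrap resamples of the labeled set, and the unlabeled sample on which their predictions disagree most (highest variance) is sent to the ground-truth spirometer next. The single numeric feature per sample, the `oracle` callback standing in for a spirometer measurement, and all function names are hypothetical illustrations, not the authors' method.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b for 1-D data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # All x identical (possible in a bootstrap resample): predict the mean.
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def active_learning(pool_x, oracle, n_initial, budget, n_committee=5, seed=0):
    """Query-by-committee active learning (hypothetical sketch).

    pool_x: one feature value per unlabeled sample (assumed 1-D for brevity).
    oracle(i): returns the ground-truth label for sample i; here it stands in
        for an expensive reference-spirometer measurement.
    Returns the indices that were labeled and the final (a, b) line fit.
    """
    rng = random.Random(seed)
    unlabeled = list(range(len(pool_x)))
    rng.shuffle(unlabeled)
    # Seed the labeled set with a few random samples.
    labeled, unlabeled = unlabeled[:n_initial], unlabeled[n_initial:]
    labels = {i: oracle(i) for i in labeled}

    while len(labeled) < budget and unlabeled:
        # Train each committee member on a bootstrap resample of the labeled set.
        committee = []
        for _ in range(n_committee):
            sample = [rng.choice(labeled) for _ in labeled]
            committee.append(fit_line([pool_x[i] for i in sample],
                                      [labels[i] for i in sample]))

        def disagreement(i):
            # Variance of the committee's predictions for candidate sample i.
            return statistics.pvariance([a * pool_x[i] + b for a, b in committee])

        # Query the label of the sample the committee disagrees on most.
        best = max(unlabeled, key=disagreement)
        unlabeled.remove(best)
        labeled.append(best)
        labels[best] = oracle(best)

    final = fit_line([pool_x[i] for i in labeled], [labels[i] for i in labeled])
    return labeled, final
```

In use, `oracle` would be replaced by the step that actually records a reference measurement, and `budget` caps how many such measurements are taken; a real wearable-spirometry model would use multi-dimensional sensor features and a stronger regressor than a single line.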