Accurate classification of sleep disorders, particularly insomnia and sleep apnea, is important for reducing long-term health risks and improving patient quality of life. However, clinical sleep studies are resource-intensive and difficult to scale to population-level screening. This paper presents a Dual-Pipeline Machine Learning Framework for multiclass sleep disorder screening using the Sleep Health and Lifestyle dataset. The framework consists of two parallel processing streams: a statistical pipeline that targets linear separability using Mutual Information and Linear Discriminant Analysis, and a wrapper-based pipeline that combines Boruta feature selection with an autoencoder for non-linear representation learning. To address class imbalance, we use the hybrid SMOTETomek resampling strategy. In experiments, Extra Trees and K-Nearest Neighbors achieved an accuracy of 98.67%, outperforming recent baselines on the same dataset. A Wilcoxon signed-rank test indicates that the improvement over baseline configurations is statistically significant, and inference latency remains below 400 milliseconds. These results suggest that the proposed dual-pipeline design supports accurate, efficient, and non-invasive automated screening for sleep disorder risk stratification.
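As an illustration of the Mutual Information scoring used in the statistical pipeline, the following minimal sketch ranks discrete features by their mutual information with the class label. It is not the implementation used in this work: the function names `mutual_information` and `rank_features`, the feature names, and the six toy records are all hypothetical and are not drawn from the Sleep Health and Lifestyle dataset.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in nats between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def rank_features(features, labels, k):
    """Return the k feature names with the highest MI, plus all scores."""
    scores = {name: mutual_information(col, labels)
              for name, col in features.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return top, scores


# Toy data (hypothetical, for illustration only).
labels = ["none", "none", "insomnia", "insomnia", "apnea", "apnea"]
features = {
    "bmi_category": ["normal", "normal", "normal", "normal", "obese", "obese"],
    "stress_level": ["low", "low", "high", "high", "low", "high"],
    "noise":        ["a", "b", "a", "b", "a", "b"],
}

top, scores = rank_features(features, labels, k=2)
print(top)
```

In this toy example the `noise` column is statistically independent of the label, so its score is zero and it is discarded. A score-and-select step of this kind would precede the Linear Discriminant Analysis projection in a pipeline like the one described above.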