Early detection of oral cancer and potentially malignant disorders is a major challenge in low-resource settings, where annotated data are scarce. We present a unified approach to four-class oral lesion classification that integrates deep learning, spectral analysis, and demographic data. A pathologist-verified subset of oral cavity images was curated from a publicly available dataset. The images were passed through a fine-tuned ConvNeXt-v2 network to obtain deep embeddings and then converted into the hyperspectral domain using a reconstruction algorithm. Haemoglobin-sensitive, textural, and spectral descriptors were extracted from the reconstructed hyperspectral cubes and combined with demographic data. Multiple machine-learning models were evaluated using patient-specific validation. Finally, we developed an incremental heuristic meta-learner (IHML) that merges calibrated base classifiers through probabilistic feature stacking and uncertainty-aware abstraction of multimodal representations, together with patient-level smoothing. By decoupling evidence extraction from decision fusion, IHML stabilizes predictions on heterogeneous, small-sample medical datasets. On an unseen test set, the proposed model achieved a macro F1 of 66.23% and an overall accuracy of 64.56%. These findings indicate that RGB-to-hyperspectral reconstruction and ensemble meta-learning can improve diagnostic robustness in real-world oral lesion screening.
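The abstract names patient-level smoothing and macro F1 but does not spell out how either is computed. The following is a minimal sketch of one plausible reading, not the authors' implementation: each image's class probabilities are blended with the mean probabilities of all images from the same patient, and macro F1 is the unweighted mean of per-class F1 scores. The function names `smooth_by_patient` and `macro_f1`, the blend weight `alpha`, and the toy inputs are all assumptions introduced for illustration.

```python
from collections import defaultdict


def smooth_by_patient(probs, patient_ids, alpha=0.5):
    """Blend each image's class probabilities with its patient's mean.

    probs       -- list of per-image probability vectors (one float per class)
    patient_ids -- list of patient identifiers, aligned with probs
    alpha       -- hypothetical blend weight: 0 keeps the image-level
                   prediction, 1 replaces it with the patient mean
    """
    sums = {}
    counts = defaultdict(int)
    for p, pid in zip(probs, patient_ids):
        if pid not in sums:
            sums[pid] = [0.0] * len(p)
        for k, value in enumerate(p):
            sums[pid][k] += value
        counts[pid] += 1

    smoothed = []
    for p, pid in zip(probs, patient_ids):
        mean = [s / counts[pid] for s in sums[pid]]
        smoothed.append([(1 - alpha) * v + alpha * m for v, m in zip(p, mean)])
    return smoothed


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1; a class with no support and no
    predictions contributes 0.0 (an assumption of this sketch)."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


if __name__ == "__main__":
    # Toy four-class example: two images from patient "a", one from "b".
    probs = [
        [0.6, 0.2, 0.1, 0.1],
        [0.2, 0.6, 0.1, 0.1],
        [0.1, 0.1, 0.7, 0.1],
    ]
    smoothed = smooth_by_patient(probs, ["a", "a", "b"], alpha=0.5)
    preds = [row.index(max(row)) for row in smoothed]
    print(preds, macro_f1([0, 1, 2], preds, n_classes=4))
```

Because the blend is a convex combination of probability vectors, each smoothed row still sums to one, so the output can be passed to any downstream step that expects probabilities. Averaging F1 over classes rather than over samples gives rare lesion classes the same weight as common ones, which is why macro F1 and overall accuracy can differ on an imbalanced test set.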