Melanoma detection is vital for early diagnosis and effective treatment. Deep learning models trained on dermoscopic images have shown promise, but they require specialized equipment, which limits their use in broader clinical settings. This study introduces a multi-modal melanoma detection system that uses conventional photo images, making it more accessible and versatile. The system integrates image data with tabular metadata, such as patient demographics and lesion characteristics, to improve detection accuracy. It employs a multi-modal neural network that combines image and metadata processing, and it supports a two-step model for cases with or without metadata. A three-stage pipeline then refines the predictions using boosting algorithms to further improve performance. To address the challenges of a highly imbalanced dataset, specific techniques were implemented to ensure robust training. An ablation study evaluated recent vision architectures, boosting algorithms, and loss functions; the best configuration achieved a partial ROC AUC of 0.18068 (out of a maximum of 0.2) and a top-15 retrieval sensitivity of 0.78371. The results demonstrate that integrating photo images with metadata in a structured, multi-stage pipeline yields significant performance improvements. The system advances melanoma detection by providing a scalable, equipment-independent solution suitable for diverse healthcare environments, bridging the gap between specialized and general clinical practice.
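To make the reported metric concrete, the following is a minimal sketch of a partial ROC AUC whose maximum is 0.2. It assumes the metric is the area under the ROC curve that lies above a true positive rate of 0.8; the abstract states only the 0.2 maximum, so this threshold is an assumption. The function names `roc_points` and `partial_auc_above_tpr` are hypothetical and are not taken from the study.

```python
import numpy as np


def roc_points(y_true, y_score):
    """Return (fpr, tpr) arrays of the ROC curve, starting at (0, 0).

    Assumes both classes are present in y_true.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind="stable")  # highest score first
    y_sorted = y_true[order]
    s_sorted = y_score[order]
    # Keep only the last index of each run of tied scores.
    distinct = np.r_[np.nonzero(np.diff(s_sorted))[0], y_sorted.size - 1]
    tps = np.cumsum(y_sorted)[distinct]  # true positives at each cut
    fps = (distinct + 1) - tps           # false positives at each cut
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr


def partial_auc_above_tpr(y_true, y_score, min_tpr=0.8):
    """Area between the ROC curve and the line TPR = min_tpr.

    The value ranges from 0 to (1 - min_tpr), i.e. at most 0.2
    for the assumed default min_tpr of 0.8.
    """
    fpr, tpr = roc_points(y_true, y_score)
    # Height of each segment end point above the threshold line.
    h0 = np.clip(tpr[:-1] - min_tpr, 0.0, None)
    h1 = np.clip(tpr[1:] - min_tpr, 0.0, None)
    df = np.diff(fpr)
    dt = np.diff(tpr)
    # For a segment that crosses the threshold, only the part of its
    # width that lies above the line contributes to the area.
    frac = np.ones_like(df)
    cross = (tpr[:-1] < min_tpr) & (tpr[1:] > min_tpr)
    frac[cross] = (tpr[1:][cross] - min_tpr) / dt[cross]
    return float(np.sum(0.5 * (h0 + h1) * df * frac))
```

Under this assumed definition, a classifier that ranks every melanoma above every benign lesion scores the full 0.2, a classifier that cannot separate the classes scores 0.02, and a perfectly inverted ranking scores 0.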