Skin lesions are classified as benign or malignant. Among malignant lesions, melanoma is a highly aggressive cancer and the major cause of death from skin cancer, so early diagnosis is highly desirable. In recent years, there has been growing interest in computer-aided diagnosis (CAD), mostly using image and clinical data of the lesion. These sources of information are limited because they cannot provide information about the molecular structure of the lesion. Near-infrared (NIR) spectroscopy may provide an alternative source of information for automated CAD of skin lesions. The techniques and classification algorithms most commonly used in spectroscopy are Principal Component Analysis (PCA), Partial Least Squares Discriminant Analysis (PLS-DA), and Support Vector Machines (SVM). Nonetheless, there is growing interest in applying modern machine and deep learning (MDL) techniques to spectroscopy. One of the main obstacles to applying MDL to spectroscopy is the lack of public datasets. Since, as far as we know, there is no public dataset of NIR spectral data for skin lesions, a new dataset named NIR-SC-UFES has been collected, annotated, and analyzed, establishing a gold standard for classifying NIR spectral data of skin cancer. Next, the machine learning algorithms XGBoost, CatBoost, LightGBM, and a 1D convolutional neural network (1D-CNN) were investigated for classifying cancer and non-cancer skin lesions. Experimental results indicate that the best performance was obtained by LightGBM with standard normal variate (SNV) pre-processing and feature extraction, yielding 0.839 for balanced accuracy, 0.851 for recall, 0.852 for precision, and 0.850 for F-score. These results represent a first step toward CAD of skin lesions aimed at automated in vivo triage of patients using NIR spectral data.
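To make the SNV pre-processing and the balanced accuracy metric concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names `snv` and `balanced_accuracy` and the toy values are hypothetical. The use of the sample standard deviation is an assumption, since the abstract does not state which convention was used.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale ONE spectrum
    by its own mean and standard deviation."""
    mu = mean(spectrum)
    # Assumption: sample standard deviation (n - 1 in the denominator);
    # some implementations use the population standard deviation instead.
    sigma = stdev(spectrum)
    if sigma == 0:
        raise ValueError("constant spectrum: SNV is undefined")
    return [(x - mu) / sigma for x in spectrum]


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, e.g. for labels
    0 (non-cancer) and 1 (cancer)."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return mean(recalls)


# Hypothetical toy values for illustration; not taken from NIR-SC-UFES.
raw = [0.10, 0.20, 0.40, 0.30]
scaled = snv(raw)  # zero mean and unit sample std within this spectrum

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
bal_acc = balanced_accuracy(y_true, y_pred)  # (4/5 + 2/3) / 2
```

SNV is applied to each spectrum independently, so it needs no statistics from the training set. Balanced accuracy averages the recall of each class, which keeps the majority class from dominating the score when cancer and non-cancer samples are unevenly represented.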