This paper investigates machine learning algorithms for predicting cocaine use from magnetic resonance imaging (MRI) connectomic data. The study used functional MRI (fMRI) and diffusion MRI (dMRI) data collected from 275 individuals, parcellated into 246 regions of interest (ROIs) using the Brainnetome atlas. After preprocessing, the data were arranged in tensor form. We developed a tensor-based unsupervised machine learning algorithm that reduces the data tensor from $275$ (individuals) $\times 2$ (fMRI and dMRI) $\times 246$ (ROIs) $\times 246$ (ROIs) to $275$ (individuals) $\times 2$ (fMRI and dMRI) $\times 6$ (clusters) $\times 6$ (clusters). The reduction was achieved by applying the high-order Lloyd algorithm to group the ROIs into 6 clusters. Features were extracted from the reduced tensor and combined with demographic features (age, gender, race, and HIV status). The resulting dataset was used to train a CatBoost model with subsampling and nested cross-validation, which achieved a prediction accuracy of 0.857 for identifying cocaine users. We also compare this model with other models and report its feature importances. Overall, the study highlights the potential of tensor-based machine learning algorithms for predicting cocaine use from MRI connectomic data and presents a promising approach for identifying individuals at risk of substance abuse.
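To make the reduction step concrete, the following is a minimal sketch of how a $246 \times 246$ connectivity matrix per individual and modality could be collapsed to $6 \times 6$ once each ROI has a cluster label. It rests on two assumptions that the abstract does not confirm: the reduced entries are taken to be block means, and the labels are a placeholder assignment rather than the output of the high-order Lloyd algorithm, which is not implemented here. The function name `reduce_by_clusters` and the random data are hypothetical.

```python
import numpy as np

def reduce_by_clusters(X, labels, n_clusters):
    """Average a (subjects, modalities, ROIs, ROIs) tensor over ROI clusters.

    Assumption: the reduced tensor holds block means; the paper may
    summarize each cluster block differently.
    """
    labels = np.asarray(labels)
    # One-hot membership matrix M with shape (ROIs, clusters).
    M = np.eye(n_clusters)[labels]
    sizes = M.sum(axis=0)
    if np.any(sizes == 0):
        raise ValueError("every cluster needs at least one ROI")
    # Sum the entries inside each (cluster a, cluster b) block ...
    block_sums = np.einsum("smij,ia,jb->smab", X, M, M, optimize=True)
    # ... then divide by the number of entries in that block.
    return block_sums / np.outer(sizes, sizes)

rng = np.random.default_rng(0)
# Small random stand-in: 5 subjects instead of 275, 2 modalities, 246 ROIs.
X = rng.standard_normal((5, 2, 246, 246))
# Placeholder labels; in the study these would come from the clustering step.
labels = np.arange(246) % 6
R = reduce_by_clusters(X, labels, 6)
print(R.shape)  # (5, 2, 6, 6)
```

With the full dataset, the same call would map a tensor of shape `(275, 2, 246, 246)` to `(275, 2, 6, 6)`, from which features could then be extracted.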