Phobias are common and impairing, and exposure therapy, which involves confronting patients with fear-provoking visual stimuli, is the most effective treatment. Scalable computerized exposure therapy requires automated prediction of fear directly from image content to adapt stimulus selection and treatment intensity. Whether such predictions can be made reliably and generalize across individuals and stimuli, however, remains unknown. Here we show that pretrained convolutional and transformer vision models, adapted via transfer learning, accurately predict group-level perceived fear for spider-related images, even when evaluated on new people and new images, achieving a mean absolute error (MAE) below 10 units on the 0-100 fear scale. Visual explanation analyses indicate that predictions are driven by spider-specific regions in the images. Learning-curve analyses show that transformer models are data efficient and approach performance saturation with the available data (~300 images). Prediction errors increase for very low and very high fear levels and within specific categories of images. These results establish transparent, data-driven fear estimation from images, laying the groundwork for adaptive digital mental health tools.
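As a minimal sketch of the evaluation metric named above, the following computes a group-level fear target per image (the mean of individual ratings on the 0-100 scale) and the mean absolute error of predictions against those targets. The image identifiers, ratings, predictions, and function names are all hypothetical placeholders invented for illustration; they are not data or code from the study.

```python
from statistics import mean


def group_level_fear(ratings_by_image):
    """Average individual 0-100 ratings per image to get a group-level target."""
    return {image: mean(ratings) for image, ratings in ratings_by_image.items()}


def mean_absolute_error(y_true, y_pred):
    """Mean of absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ratings from held-out participants for held-out images
# (illustrative numbers only, not taken from the paper).
held_out_ratings = {
    "img_a": [80, 90, 70],
    "img_b": [10, 20, 30],
    "img_c": [50, 60, 40],
}

# Hypothetical model outputs for the same images.
predictions = {"img_a": 72.0, "img_b": 26.0, "img_c": 55.0}

targets = group_level_fear(held_out_ratings)
images = sorted(targets)
mae = mean_absolute_error(
    [targets[i] for i in images],
    [predictions[i] for i in images],
)
print(f"MAE on the 0-100 fear scale: {mae:.2f}")
```

With these made-up numbers the error is roughly 6.3 units, which would fall under the 10-unit threshold quoted above. In a real evaluation, both the images and the participants who supplied the ratings would have to be absent from the training data.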