A Picture May Be Worth a Thousand Lives: An Interpretable Artificial Intelligence Strategy for Predictions of Suicide Risk from Social Media Images

Research on the use of artificial intelligence in suicide prevention is promising but has major gaps: black-box methodologies, inadequate outcome measures, and little work on non-verbal inputs such as social media images, despite their ubiquity in the digital era. This study addresses these gaps by combining theory-driven and bottom-up strategies to construct a hybrid, interpretable model that predicts validated suicide risk from images. The lead hypothesis was that images contain valuable information about emotions and interpersonal relationships, two central concepts in suicide-related treatments and theories. The dataset included 177,220 images posted by 841 Facebook users who completed a gold-standard suicide scale. The images were represented with CLIP, a state-of-the-art algorithm, which was used unconventionally to extract predefined features that served as inputs to a simple logistic-regression prediction model (in contrast to complex neural networks). The features described basic and theory-driven visual elements in everyday language (e.g., "bright photo", "photo of sad people"). The hybrid model, which integrated theory-driven and bottom-up methods, achieved high prediction performance that surpassed common bottom-up algorithms, providing a first proof that images alone can be leveraged to predict validated suicide risk. Consistent with the lead hypothesis, at-risk users' images showed increased negative emotions and decreased belongingness. The results are discussed in the context of non-verbal warning signs of suicide. Notably, the study illustrates the advantages of hybrid models in such complicated tasks and provides simple, flexible prediction strategies that could be used to develop real-life suicide-monitoring tools.
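
The pipeline described above (predefined everyday-language features scored against image representations, then fed to a logistic regression) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not the authors' implementation: the 3-dimensional vectors are hypothetical stand-ins for CLIP embeddings, the four toy users and their labels are made up, averaging similarities per user is one plausible aggregation that the abstract does not specify, and the hand-written gradient-descent fit stands in for whatever logistic-regression routine was actually used.

```python
import math

# Hypothetical 3-d stand-ins for CLIP text embeddings of two everyday-language prompts.
PROMPTS = {
    "bright photo": [1.0, 0.0, 0.0],
    "photo of sad people": [0.0, 1.0, 0.0],
}

# Made-up toy users: (stand-in image embeddings, label), where label 1 = at risk.
USERS = [
    ([[0.9, 0.1, 0.2], [0.8, 0.3, 0.1]], 0),
    ([[1.0, 0.2, 0.0], [0.7, 0.1, 0.3]], 0),
    ([[0.2, 0.9, 0.1], [0.1, 0.8, 0.3]], 1),
    ([[0.3, 1.0, 0.2], [0.2, 0.7, 0.1]], 1),
]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def user_features(image_embs):
    """One feature per prompt: mean image-to-prompt similarity (assumed aggregation)."""
    n = len(image_embs)
    return [sum(cosine(img, p) for img in image_embs) / n for p in PROMPTS.values()]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Predicted probability of the at-risk class for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Plain batch gradient descent on the logistic loss (no regularization)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


X = [user_features(imgs) for imgs, _ in USERS]
y = [label for _, label in USERS]
weights, bias = fit_logistic(X, y)

# Each weight is tied to a human-readable prompt, which is what makes the model interpretable.
for name, w in zip(PROMPTS, weights):
    print(f"{name!r}: weight {w:+.2f}")
```

Because each input feature corresponds to a named prompt, the sign and size of its weight can be read directly, which is the interpretability benefit the abstract attributes to a simple logistic regression over a complex neural network. With real data, the stand-in vectors would be replaced by embeddings from an actual CLIP implementation.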
