Men who have sex with men (MSM) are at elevated risk for sexually transmitted infections and harmful drinking compared with heterosexual men. Text data collected from social media and dating applications may provide new opportunities for personalized public health interventions by enabling automatic identification of risk and protective behaviors. In this study, we evaluated whether text from social media and dating apps can be used to predict sexual risk behaviors, alcohol use, and pre-exposure prophylaxis (PrEP) uptake among MSM. With participant consent, we collected textual data and trained machine learning models using features derived from ChatGPT embeddings, BERT embeddings, LIWC, and a dictionary-based risk term approach. The models achieved strong performance in predicting monthly binge drinking and having more than five sexual partners, each with an F1 score of 0.78, and moderate performance in predicting PrEP use and heavy drinking, with F1 scores of 0.64 and 0.63, respectively. These findings demonstrate that social media and dating app text data can provide valuable insights into risk and protective behaviors and highlight the potential of large language model-based methods to support scalable and personalized public health interventions for MSM.
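For readers less familiar with the metric reported above, the following is a minimal sketch of how an F1 score for a single binary outcome (for example, monthly binge drinking: yes or no) can be computed from true and predicted labels. The function name `f1_score_binary` and the toy labels are hypothetical and chosen only for illustration; they are not study data, and the abstract does not state which averaging scheme (binary, macro, or weighted) was used for the reported scores.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute the F1 score for one positive class.

    Uses F1 = 2*TP / (2*TP + FP + FN), which is algebraically the same
    as the harmonic mean of precision and recall.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, F1 is undefined;
    # returning 0.0 here is a convention assumed for this sketch.
    if denominator == 0:
        return 0.0
    return 2 * tp / denominator


# Toy labels invented for illustration only (NOT data from the study).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

score = f1_score_binary(y_true, y_pred)
print(round(score, 2))
```

In this toy case there are four true positives, one false positive, and one false negative, so precision and recall are both 0.8 and the F1 score is 0.8. Because F1 ignores true negatives, it is often preferred over accuracy when the positive class is relatively rare.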