Prior research on Twitter (now X) data has provided positive evidence of its utility in developing supplementary health surveillance systems. In this study, we present a new framework for public health surveillance, focusing on mental health (MH) outcomes. We hypothesize that locally posted tweets are indicative of local MH outcomes and collect tweets posted from 765 neighborhoods (census block groups) in the USA. We pair the tweets from each neighborhood with the corresponding MH outcome reported by the Centers for Disease Control and Prevention (CDC) to create a benchmark dataset, LocalTweets. With LocalTweets, we present the first population-level evaluation task for Twitter-based MH surveillance systems. We then develop an efficient and effective method, LocalHealth, for predicting MH outcomes based on LocalTweets. When used with GPT3.5, LocalHealth achieves the highest F1-score and accuracy, 0.7429 and 79.78\%, respectively, a 59\% improvement in F1-score over GPT3.5 in a zero-shot setting. We also use LocalHealth to extrapolate CDC's estimates as a proxy for unreported neighborhoods, achieving an F1-score of 0.7291. Our work suggests that Twitter data can be effectively leveraged to simulate neighborhood-level MH outcomes.