Recent studies suggest that social media activity, analyzed with natural language processing, can serve as a proxy for state-level public health measures. We apply this approach to estimating homelessness at the state level across the US for 2010-2019 and 2022, using a dataset of roughly 1 million geotagged tweets containing the substring ``homeless.'' Homelessness-related tweet counts correlate with ranked per capita homelessness volume but not with general-population densities, suggesting a relationship between how likely Twitter users are to encounter or observe homelessness in their everyday lives and how likely they are to communicate about it online. An increase in the log-odds of ``homeless'' appearing in an English-language tweet, together with an acceleration in the rise of average tweet sentiment, suggests that tweets about homelessness are also affected by national-scale trends. Additionally, changes in the lexical content of tweets over time suggest that reversals in the polarity of national or state-level trends may be detectable through an increase in political or service-sector language relative to the language of charity or direct appeals. An analysis of user account type also revealed changes in Twitter-use patterns between accounts authored by individuals and those authored by entities, which may provide an additional signal to confirm changes in homelessness density in a given jurisdiction. While a computational approach to social media analysis may provide a low-cost, real-time dataset rich with information about the nationwide and localized impacts of homelessness and homelessness policy, we find that practical issues abound, limiting the potential of social media as a proxy to complement other measures of homelessness.
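The two quantities named in the abstract, the log-odds of a keyword appearing in a tweet and a rank-based correlation between state-level series, can be illustrated with a minimal sketch. It assumes the standard definitions (log-odds as ln(p / (1 - p)), and Spearman's rank correlation as the Pearson correlation of average ranks); the abstract does not say which estimator the authors used. The function names and all input values are hypothetical placeholders, not data or code from the study.

```python
import math


def log_odds(keyword_count, total_count):
    """Log-odds that a tweet contains the keyword: ln(p / (1 - p))."""
    if not 0 < keyword_count < total_count:
        raise ValueError("need 0 < keyword_count < total_count")
    p = keyword_count / total_count
    return math.log(p / (1 - p))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Placeholder inputs only (NOT values from the study).
tweets_per_capita = [0.8, 1.9, 1.1, 3.2]     # hypothetical, one entry per state
homeless_per_capita = [1.0, 2.5, 1.4, 4.0]   # hypothetical, same state order
rho = spearman(tweets_per_capita, homeless_per_capita)
```

Working on ranks rather than raw values matches the abstract's reference to ranked per capita homelessness volume and keeps the statistic insensitive to outlier states, though the study's own procedure may differ.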