From health to education, income impacts a huge range of life choices. Many papers have leveraged data from online social networks to study precisely this. In this paper, we ask the opposite question: do different levels of income result in different online behaviors? We demonstrate that they do. We present the first large-scale study of Nextdoor, a popular location-based social network. We collect 2.6 million posts from 64,283 neighborhoods in the United States and 3,325 neighborhoods in the United Kingdom to examine whether online discourse reflects the income and income inequality of a neighborhood. We show that posts from neighborhoods with different income levels do indeed differ: for example, richer neighborhoods express more positive sentiment and discuss crime more, even though their actual crime rates are much lower. We then show that user-generated content can predict both income and inequality. We train multiple machine learning models and predict both income (R-squared = 0.841) and inequality (R-squared = 0.77).