From health to education, income impacts a huge range of life choices. Earlier research has leveraged data from online social networks to study precisely this impact. In this paper, we ask the opposite question: do different levels of income result in different online behaviors? We demonstrate that they do. We present the first large-scale study of Nextdoor, a popular location-based social network. We collect 2.6 million posts from 64,283 neighborhoods in the United States and 3,325 neighborhoods in the United Kingdom to examine whether online discourse reflects the income and income inequality of a neighborhood. We show that posts from neighborhoods with different incomes do indeed differ: for example, posts from richer neighborhoods have a more positive sentiment and discuss crime more, even though actual crime rates in those neighborhoods are much lower. We then show that user-generated content can predict both income and inequality. We train multiple machine learning models and predict both income (R-squared = 0.841) and inequality (R-squared = 0.77).