Socio-economic indicators provide context for assessing a country's overall condition. These indicators capture information about education, gender, poverty, employment, and other factors, so reliable and accurate data are critical for social research and government policy-making. Most data sources available today, such as censuses, have sparse population coverage or are updated infrequently. Alternative data sources, such as call detail records (CDRs) and mobile app usage, can instead serve as cost-effective and up-to-date sources for identifying socio-economic indicators. This work investigates the use of mobile app data to predict socio-economic features. We present a large-scale study using data that captures the traffic of thousands of mobile applications generated by approximately 30 million users distributed over 550,000 km² and served by over 25,000 base stations. The dataset covers the entire territory of France and spans more than 2.5 months, from 16 March 2019 to 6 June 2019. Using app usage patterns, our best model can estimate socio-economic indicators, attaining an R-squared score of up to 0.66. Furthermore, using model explainability, we find that mobile app usage patterns have the potential to reveal socio-economic disparities at the IRIS level. The insights from this study suggest several avenues for future work, including temporal analysis of user networks to understand evolving network patterns and the exploration of alternative data sources.