In the era of big data, the ubiquity of location-aware portable devices provides an unprecedented opportunity to understand inhabitants' behavior and their interactions with the built environment. Among the widely used data sources, mobile phone data are passively collected and offer the largest coverage of the population. However, mobile operators cannot locate a user to within a few meters, which makes activity inference difficult. To address this, we propose a data analysis framework that identifies users' activities by coupling mobile phone data with location-based social network (LBSN) data. The two datasets are integrated into a Bayesian inference module that accounts for people's circadian rhythms in both time and space. Specifically, the framework considers the pattern of arrival times at each type of facility and the spatial distribution of facilities: the former is observed from the LBSN data, and the latter is provided by a points-of-interest (POI) dataset. Taking Shanghai as an example, we reconstruct the activity chains of 1,000,000 active mobile phone users and analyze the temporal and spatial characteristics of each activity type. We validate the results against official surveys and a real-world check-in dataset collected in Shanghai; the comparison indicates that the proposed method can effectively capture and analyze human activities. Next, we cluster users' inferred activity chains with a topic model to characterize the behavior of different user groups. This framework provides an example of reconstructing and understanding population activity at an urban scale through big data fusion.
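The abstract names two ingredients of the Bayesian module, an arrival-time pattern per facility type (from LBSN check-ins) and the spatial distribution of facilities (from POIs), without giving the exact formulation. The sketch below shows one plausible way to combine them and is not the authors' model. It assumes hypothetical activity categories, hourly time bins, additive smoothing, POI shares around a cell tower as the spatial term P(a | s), and that arrival time t and location s are conditionally independent given the activity a, so that P(a | t, s) ∝ P(t | a) · P(a | s). All names and toy numbers are illustrative.

```python
import numpy as np

# Hypothetical activity categories; the paper's actual taxonomy may differ.
ACTIVITIES = ["home", "work", "dining", "shopping", "leisure"]
N_HOURS = 24  # assumed hourly time bins


def arrival_time_likelihood(checkin_hours, alpha=1.0):
    """Estimate P(arrival hour | activity) from LBSN check-in hours (0-23),
    with additive smoothing so no hour has zero probability."""
    counts = np.bincount(np.asarray(checkin_hours, dtype=int), minlength=N_HOURS)
    counts = counts.astype(float) + alpha
    return counts / counts.sum()


def spatial_prior(poi_counts, alpha=1.0):
    """Assumed spatial term P(activity | cell): smoothed share of POIs of each
    type (ordered as ACTIVITIES) in the cell tower's service area."""
    counts = np.asarray(poi_counts, dtype=float) + alpha
    return counts / counts.sum()


def infer_activity(hour, poi_counts, time_likelihoods):
    """Posterior P(activity | hour, cell), assuming arrival time and location
    are conditionally independent given the activity."""
    prior = spatial_prior(poi_counts)
    likelihood = np.array([time_likelihoods[a][hour] for a in ACTIVITIES])
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()


# Toy check-in hours per activity (made-up values, not data from the paper).
toy_checkins = {
    "home": [19, 20, 21, 22, 23],
    "work": [8, 8, 9, 9, 9],
    "dining": [12, 12, 18, 18, 19],
    "shopping": [14, 15, 16, 19, 20],
    "leisure": [20, 21, 21, 22, 22],
}
likelihoods = {a: arrival_time_likelihood(h) for a, h in toy_checkins.items()}

# A phone record at 09:00 in a cell dominated by office POIs (toy counts).
posterior = infer_activity(9, [5, 40, 10, 3, 2], likelihoods)
print({a: round(float(p), 3) for a, p in zip(ACTIVITIES, posterior)})
print("most likely:", ACTIVITIES[int(np.argmax(posterior))])
```

With these toy inputs, "work" receives about 0.87 of the posterior mass, because both the 09:00 arrival time and the office-heavy POI mix point the same way. Multiplying a temporal likelihood by a spatial prior keeps the two data sources separable, but the conditional-independence assumption is a simplification that a full model might relax.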