Location-Based Service (LBS) Data Quality Metrics and Effects on Mobility Inference

GPS-equipped mobile devices are now ubiquitous, and the Location-Based Service (LBS) data they generate have become a critical resource for understanding human mobility. However, LBS datasets have inherent limitations, chiefly discontinuity and sparsity, that may introduce significant biases when representing individual movement patterns. This study develops data quality metrics for LBS data, examines how they differ across populations, and quantifies their effects on inferred individual movement, particularly stays, in the Boston Metropolitan Area. We find that data from higher-income, more educated, and predominantly white census block groups (CBGs) show higher sampling rates but, paradoxically, lower data quality. This contradiction may stem from greater privacy awareness in these communities. We also propose a new framework that resamples LBS data and quantitatively evaluates the inferential biases associated with data of varying quality. The framework is versatile and can be used to analyze the impacts of different LBS data processing workflows. Using linear regression models with clustered standard errors, we assess the impact of the data quality metrics on the inferred number of stay points. The results show that better data quality, characterized by the number of observations and temporal occupancy, significantly reduces the bias in the number of stay points calculated for an individual. Adding further data quality metrics to the regression model explains more of the bias. Overall, this study provides insights into how data quality can influence our understanding of human mobility patterns and highlights the importance of handling LBS data carefully in research.

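The two data quality metrics named in the abstract, number of observations and temporal occupancy, can be illustrated with a minimal sketch. The abstract does not define temporal occupancy, so the definition used here (the share of fixed-length time bins in an observation window that contain at least one observation) is an assumption, as are the function name `quality_metrics`, the one-hour default bin size, and the sample timestamps.

```python
import math
from datetime import datetime, timedelta


def quality_metrics(timestamps, start, end, bin_minutes=60):
    """Return (n_observations, temporal_occupancy) for one device.

    Assumed definition: temporal occupancy is the fraction of
    fixed-length time bins in [start, end) that hold at least one
    observation. The source abstract does not give its exact formula.
    """
    bin_size = timedelta(minutes=bin_minutes)
    # Number of bins needed to cover the whole window.
    n_bins = math.ceil((end - start) / bin_size)
    # Keep only observations that fall inside the window.
    in_window = [t for t in timestamps if start <= t < end]
    # Index of the bin each observation falls into; a set removes repeats.
    occupied = {(t - start) // bin_size for t in in_window}
    occupancy = len(occupied) / n_bins if n_bins else 0.0
    return len(in_window), occupancy


# Hypothetical device: five pings in one day, three of them in the same hour.
day = datetime(2020, 1, 1)
pings = [
    day + timedelta(hours=h, minutes=m)
    for h, m in [(8, 0), (8, 20), (8, 45), (13, 5), (22, 30)]
]
n_obs, occupancy = quality_metrics(pings, day, day + timedelta(days=1))
print(n_obs, occupancy)  # 5 0.125
```

In this example the device has five observations but occupies only 3 of 24 hourly bins, which shows why the two metrics are worth tracking separately: a burst of pings raises the observation count without improving temporal coverage.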