Accurately detecting home locations from GPS data generated by mobile devices is a foundational step in human mobility research, with significant implications for transportation planning, public health, and emergency response. However, existing home detection algorithms often produce unreliable results on noisy real-world data and are rarely validated because ground-truth benchmarks are lacking. To address these limitations, this study presents the development and validation of a Grid-based home detection via Stay-Time (GHOST) algorithm, implemented as an open-source Python package. The algorithm infers proxy home locations by identifying the grid cells visited most frequently during nighttime or weekend daytime hours, subject to customizable spatial and temporal filters. To test its robustness to noisy data, we use the large-scale BostonWalks dataset, which includes over 155,000 trips from 377 participants in the Boston metropolitan area. Additionally, we collected a ground-truth dataset from ten volunteers across different regions of the U.S., including Florida, Mississippi, and Colorado, along with their self-reported home coordinates, to evaluate GHOST across diverse mobility patterns and sampling conditions. We compared the accuracy of GHOST with that of five well-established home detection algorithms (an all-time clustering method, a stay-point method, DBSCAN, K-MEANS++, and SciKit-Mobility Home Detection) across multiple parameter settings. Results show that GHOST outperforms all five of these algorithms in accuracy and robustness, with average errors as low as 22.3 meters under optimal configurations. Grid size was the most influential parameter during validation. These findings highlight the accuracy and flexibility of the algorithm and demonstrate its potential for real-world mobile location data analysis.
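The grid-based idea described above can be illustrated with a minimal sketch. This is not the GHOST package or its API: the function name `detect_home`, its parameters, the 22:00 to 06:00 night window, the 50 m default cell size, and the use of fix counts as a stand-in for stay time are all assumptions made for illustration. The sketch keeps only nighttime and weekend-daytime fixes, bins them into a regular grid, and returns the center of the most frequently visited cell.

```python
import math
from collections import Counter
from datetime import datetime

M_PER_DEG_LAT = 111_320.0  # approximate meters per degree of latitude


def detect_home(points, cell_m=50.0, night_start=22, night_end=6):
    """Hypothetical helper: points is a sequence of (datetime, lat, lon).

    Returns the (lat, lon) center of the most visited grid cell among
    nighttime and weekend-daytime fixes, or None if no fix qualifies.
    """
    points = list(points)
    if not points:
        return None

    # Convert the cell size from meters to degrees, using the first fix
    # as the reference latitude (an equirectangular approximation).
    ref_lat = points[0][1]
    dlat = cell_m / M_PER_DEG_LAT
    dlon = cell_m / (M_PER_DEG_LAT * math.cos(math.radians(ref_lat)))

    counts = Counter()
    for ts, lat, lon in points:
        is_night = ts.hour >= night_start or ts.hour < night_end
        is_weekend_day = ts.weekday() >= 5 and not is_night
        if is_night or is_weekend_day:
            cell = (math.floor(lat / dlat), math.floor(lon / dlon))
            counts[cell] += 1

    if not counts:
        return None

    # The proxy home is the center of the cell with the most retained fixes.
    (i, j), _ = counts.most_common(1)[0]
    return ((i + 0.5) * dlat, (j + 0.5) * dlon)
```

Counting fixes approximates stay time only when sampling is roughly regular; with irregular sampling, weighting each fix by the time elapsed until the next one would be a closer proxy. The returned location can be off by up to about half a cell diagonal from the true position, which is one way to see why grid size matters for accuracy.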