In recent years, passively collected GPS data have been widely applied in various transportation studies, such as highway performance monitoring, travel behavior analysis, and travel demand estimation. Despite their many advantages, these data suffer from oscillations (also called outliers or data jumps), which are non-negligible because they can distort mobility patterns and lead to wrong or biased conclusions. For transportation studies driven by GPS data, ensuring data quality by removing the noise caused by oscillations is therefore essential. Most GPS-based studies simply remove oscillations by flagging points that imply an abnormally high travel speed. However, this method can mistakenly identify normal points as oscillations. Other studies specifically address outlier removal in GPS data, but these approaches all have limitations and do not fit passively collected GPS data. Many well-developed methods address the ping-pong phenomenon in cellular data (i.e., cell tower data), but oscillations in passively collected GPS data are quite different: their patterns are more diverse and complex, and they are more uncertain. Current methods are therefore insufficient for, and inapplicable to, passively collected GPS data. This paper aims to address oscillated points in passively collected GPS data. A set of heuristics is proposed that identifies oscillations by their abnormal movement patterns. The proposed heuristics fit the features of passively collected GPS data well, are adaptable to studies of different scales, and are more computationally cost-effective than current methods.
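The speed-check baseline mentioned above, and the reason it can flag normal points, can be illustrated with a short sketch. This is not the paper's proposed heuristics: the haversine distance, the 50 m/s threshold, the toy trace, and the names `haversine_m` and `naive_speed_filter` are all assumptions chosen for illustration. In the trace, a single jump makes both the outlier and the following normal point exceed the speed threshold, because the speed into the normal point is computed from the outlier's position.

```python
import math
from typing import List, Tuple

# A GPS point as (timestamp in seconds, latitude in degrees, longitude in degrees).
Point = Tuple[float, float, float]

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two coordinates, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def naive_speed_filter(points: List[Point], max_speed_mps: float = 50.0) -> List[int]:
    """Return indices of points whose implied speed from the previous point
    exceeds max_speed_mps (hypothetical threshold). This is the simple
    speed check, not the heuristics proposed in the paper."""
    flagged = []
    for i in range(1, len(points)):
        t0, lat0, lon0 = points[i - 1]
        t1, lat1, lon1 = points[i]
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = haversine_m(lat0, lon0, lat1, lon1) / dt
        if speed > max_speed_mps:
            flagged.append(i)
    return flagged


# Hypothetical trace of a slowly moving device with one data jump at index 2.
trace: List[Point] = [
    (0, 40.0000, -75.0000),
    (60, 40.0005, -75.0000),   # about 56 m in 60 s: normal
    (120, 40.0500, -75.0000),  # jumps about 5.5 km: oscillation
    (180, 40.0010, -75.0000),  # back on the true path: normal
    (240, 40.0015, -75.0000),  # normal
]

if __name__ == "__main__":
    print(naive_speed_filter(trace))  # prints [2, 3]
```

Index 3 is a normal point, yet it is flagged because the implied speed from the outlier back to the true path (about 91 m/s) also exceeds the threshold. This is one way a pure speed check can mislabel normal points as oscillations.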