Recent statistical methods fitted to large-scale GPS data can accurately estimate the expected travel time between two points. However, little is known about the distribution of travel time, which is key to decision-making in many logistics problems. With sufficient data, the travel time on a single road segment can be well approximated. The challenge lies in aggregating such information over a route to obtain the route-level distribution of travel time. We develop a novel statistical approach to this problem. We show that, under general conditions and without assuming a distribution of speed, travel time divided by route distance follows a Gaussian distribution with route-invariant population mean and variance. We develop efficient inference methods for these parameters and propose asymptotically tight population prediction intervals for travel time. Using traffic flow information, we further develop a trip-specific Gaussian-based predictive distribution, resulting in tight prediction intervals for both short and long trips. Our methods, implemented in an R package, are illustrated in a real-world case study using mobile GPS data, showing that our trip-specific and population intervals both achieve the 95% theoretical coverage level. Compared to alternative approaches, our trip-specific predictive distribution achieves (a) the theoretical coverage at every level of significance, (b) tighter prediction intervals, (c) less predictive bias, and (d) more efficient estimation and prediction procedures. This makes our approach promising for low-latency, large-scale transportation applications.
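To make the population-interval idea concrete, the following is a minimal sketch of a naive plug-in reading of the Gaussian statement in the abstract. It is not the authors' inference procedure, which the abstract does not spell out, and it is written in Python rather than taken from the R package. The function name `population_interval`, the synthetic data, and the chosen units (seconds and kilometres) are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def population_interval(times, distances, new_distance, level=0.95):
    """Naive plug-in prediction interval for the travel time of a new trip.

    Assumption (illustrative only): travel time divided by route distance
    is treated as Gaussian with the same mean and variance on every route,
    and both parameters are estimated by simple sample moments.
    """
    # Per-trip pace: travel time divided by route distance (seconds per km).
    paces = [t / d for t, d in zip(times, distances)]
    mu = mean(paces)
    sigma = stdev(paces)
    # Two-sided Gaussian quantile for the requested coverage level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Interval on the pace scale, rescaled to time by the new trip's distance.
    return new_distance * (mu - z * sigma), new_distance * (mu + z * sigma)


# Synthetic trips (hypothetical data): pace ~ N(90 s/km, 10 s/km).
rng = random.Random(0)
distances = [rng.uniform(2.0, 20.0) for _ in range(5000)]
times = [d * rng.gauss(90.0, 10.0) for d in distances]

low, high = population_interval(times, distances, new_distance=10.0)
print(f"95% interval for a 10 km trip: {low:.0f} s to {high:.0f} s")
```

Because the sketch uses one population-level mean and variance for all trips, it corresponds at best to the population interval; the trip-specific predictive distribution described above additionally uses traffic flow information, which this example does not attempt to model.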