Efficiency of ETA Prediction

Modern mobile applications such as navigation services and ride-sharing platforms rely heavily on geospatial technologies, most critically on predictions of the time required for a vehicle to traverse a particular route, known as the estimated time of arrival (ETA). The methods used in practice differ in the geographic granularity at which the predictive model is trained. Segment-based methods predict travel time at the level of road segments (or combinations of several adjacent road segments) and then aggregate across the route, whereas route-based methods use generic information about the trip, such as origin and destination, to predict travel time. Though various forms of these methods have been developed, there has been no rigorous theoretical comparison of their accuracy, and empirical studies have, in many cases, reached opposite conclusions. We provide the first theoretical analysis of the predictive accuracy of various ETA prediction methods. In a finite-sample setting, we give mild conditions under which a segment-based method is more accurate than a wide variety of route-based methods. We then analyze an asymptotic setting in which the number of trip observations grows with the size of the road network. Under a broad range of trip-generating processes on a grid network, we show that a class of very simple segment-based methods is at least as good, up to a logarithmic factor, as any possible predictor. In other words, segment-based methods are asymptotically optimal up to a logarithmic factor. Our work highlights that the accuracy of ETA prediction is driven not just by the sophistication of the model but also by the spatial granularity at which the model is trained and applied.
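To make the distinction between the two families concrete, the following is a minimal sketch, not the estimators analyzed in this work. It assumes that per-segment traversal times are observed for each trip, uses plain sample means as the predictor, and keys the route-based model on the first and last segment as a stand-in for origin and destination. All names and numbers (`trips`, `fit_segment_means`, `fit_route_means`, and so on) are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: each trip is (route, observed per-segment times in seconds).
# A route is a tuple of segment IDs; the values are illustrative only.
trips = [
    (("a", "b"), {"a": 60.0, "b": 90.0}),
    (("b", "c"), {"b": 110.0, "c": 40.0}),
    (("a", "b", "c"), {"a": 80.0, "b": 100.0, "c": 60.0}),
]


def fit_segment_means(trips):
    """Segment-based fit: average the observed traversal time of each segment."""
    times = defaultdict(list)
    for _route, seg_times in trips:
        for seg, t in seg_times.items():
            times[seg].append(t)
    return {seg: mean(ts) for seg, ts in times.items()}


def predict_segment_based(segment_means, route):
    """Aggregate the per-segment predictions across the route."""
    return sum(segment_means[seg] for seg in route)


def fit_route_means(trips):
    """Route-based fit: average total trip time per (origin, destination) pair."""
    totals = defaultdict(list)
    for route, seg_times in trips:
        totals[(route[0], route[-1])].append(sum(seg_times.values()))
    return {od: mean(ts) for od, ts in totals.items()}


def predict_route_based(route_means, route):
    """Look up the (origin, destination) average; None if the pair was never seen."""
    return route_means.get((route[0], route[-1]))


segment_means = fit_segment_means(trips)
route_means = fit_route_means(trips)

new_route = ("b",)  # a route with no matching origin-destination pair in the data
print(predict_segment_based(segment_means, new_route))
print(predict_route_based(route_means, new_route))
```

In this toy setting the segment-based predictor pools observations of segment `b` from all three trips, so it can still produce an estimate for a route whose origin-destination pair never appears in the data, whereas the route-based lookup has nothing to draw on. This only illustrates the difference in granularity; it says nothing about the conditions or rates established in the analysis.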
