Walking and cycling are known to bring substantial health, environmental, and economic benefits. However, the development of evidence-based active transportation planning and policies has been impeded by significant data limitations, such as biases in crowdsourced data and the limited representativeness of mobile phone data. In this study, we develop and apply a machine-learning-based modeling approach for estimating daily walking and cycling volumes across a large-scale regional network in New South Wales, Australia, comprising 188,999 walking links and 114,885 cycling links. The modeling methodology leverages crowdsourced and mobile phone data together with a range of other datasets on population, land use, topography, and climate. The study discusses the unique challenges and limitations that arise in all three stages of modeling (training, testing, and inference), given the large geographical extent of the modeled networks and the relative scarcity of observed walking and cycling count data. The study also proposes a new technique to identify outliers in the model estimates and to mitigate their impact. Overall, the study provides a valuable resource for transportation modelers, policymakers, and urban planners seeking to enhance active transportation infrastructure planning and policies with emerging data-driven modeling methodologies.