Predicting the duration of traffic incidents for Sydney greater metropolitan area using machine learning methods

This research presents a comprehensive approach to predicting the duration of traffic incidents and classifying them as short-term or long-term across the Sydney Metropolitan Area. Using a dataset that combines detailed traffic incident records, road network characteristics, and socio-economic indicators, we train and evaluate several machine learning models, including Gradient Boosted Decision Trees (GBDT), Random Forest, LightGBM, and XGBoost. The models are assessed using Root Mean Square Error (RMSE) for regression tasks and F1 score for classification tasks. Our experimental results show that XGBoost and LightGBM outperform conventional models, with XGBoost achieving the lowest RMSE of 33.7 for predicting incident duration and the highest classification F1 score of 0.62 at a 30-minute duration threshold. For classification, the 30-minute threshold balances performance across the two classes, giving 70.84% classification accuracy for short-term incidents and 62.72% for long-term incidents. Feature importance analysis, using both tree split counts and SHAP values, identifies the number of affected lanes, traffic volume, and the types of primary and secondary vehicles as the most influential features. The proposed methodology not only achieves high predictive accuracy but also gives stakeholders insight into the factors that contribute to incident durations, enabling more informed decision-making for traffic management and response strategies. The code is available at: https://github.com/Future-Mobility-Lab/SydneyIncidents
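The evaluation described above can be illustrated with a minimal sketch, which is not the authors' implementation. It computes RMSE on durations, then derives short-term and long-term labels at the 30-minute threshold and computes the F1 score and per-class accuracy. Several things here are assumptions made for illustration: the function names, the made-up durations, the choice of long-term as the positive class, treating a duration of exactly 30 minutes as long-term, and obtaining predicted labels by thresholding predicted durations rather than from a separately trained classifier.

```python
import math

THRESHOLD_MIN = 30  # threshold from the abstract; the boundary handling below is an assumption


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted durations."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def to_long_term(durations, threshold=THRESHOLD_MIN):
    """Binary labels: 1 = long-term (duration >= threshold), 0 = short-term."""
    return [1 if d >= threshold else 0 for d in durations]


def f1_score(labels_true, labels_pred, positive=1):
    """F1 score for the class given by `positive` (long-term by assumption)."""
    pairs = list(zip(labels_true, labels_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_class_accuracy(labels_true, labels_pred, cls):
    """Share of incidents whose true class is `cls` that were labelled correctly."""
    hits = [p == cls for t, p in zip(labels_true, labels_pred) if t == cls]
    return sum(hits) / len(hits)


# Made-up durations in minutes, purely illustrative (not taken from the paper's data).
observed = [12, 25, 48, 95, 31, 8]
predicted = [20, 35, 40, 70, 28, 10]

true_labels = to_long_term(observed)
pred_labels = to_long_term(predicted)

print("RMSE:", rmse(observed, predicted))
print("F1 (long-term):", f1_score(true_labels, pred_labels))
print("Short-term accuracy:", per_class_accuracy(true_labels, pred_labels, 0))
print("Long-term accuracy:", per_class_accuracy(true_labels, pred_labels, 1))
```

Reporting accuracy separately for each class, as the abstract does, shows whether a threshold favours one class over the other, which a single aggregate score can hide.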
