Predicting the duration of traffic incidents for Sydney greater metropolitan area using machine learning methods

This research presents a comprehensive approach to predicting the duration of traffic incidents and classifying them as short-term or long-term across the Sydney Metropolitan Area. Leveraging a dataset that combines detailed traffic incident records, road network characteristics, and socio-economic indicators, we train and evaluate several machine learning models, including Gradient Boosted Decision Trees (GBDT), Random Forest, LightGBM, and XGBoost. The models are assessed using Root Mean Square Error (RMSE) for regression tasks and F1 score for classification tasks. Our experimental results show that XGBoost and LightGBM outperform conventional models, with XGBoost achieving the lowest RMSE of 33.7 for predicting incident duration and the highest classification F1 score of 0.62 at a 30-minute duration threshold. For classification, the 30-minute threshold balances performance across the two classes, with 70.84% accuracy on short-term incidents and 62.72% accuracy on long-term incidents. Feature importance analysis, using both tree split counts and SHAP values, identifies the number of affected lanes, traffic volume, and the types of primary and secondary vehicles as the most influential features. The proposed methodology not only achieves high predictive accuracy but also gives stakeholders insight into the factors that contribute to incident duration, enabling more informed decision-making for traffic management and response strategies. The code is available at: https://github.com/Future-Mobility-Lab/SydneyIncidents
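
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes durations are measured in minutes and that an incident counts as long-term when its duration strictly exceeds the 30-minute threshold. The function names and the toy values are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted durations."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def to_long_term(durations, threshold_min=30.0):
    """Label an incident 1 (long-term) if it exceeds the threshold, else 0.

    Assumption: strict inequality; the abstract does not say how a duration
    of exactly 30 minutes is labelled.
    """
    return [1 if d > threshold_min else 0 for d in durations]


def f1_score(labels_true, labels_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(labels_true, labels_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy durations in minutes (not taken from the paper's dataset).
observed = [12.0, 45.0, 28.0, 90.0, 33.0]
predicted = [20.0, 40.0, 35.0, 60.0, 25.0]

regression_error = rmse(observed, predicted)
classification_f1 = f1_score(to_long_term(observed), to_long_term(predicted))

print(f"RMSE: {regression_error:.2f} minutes")
print(f"F1 at 30-minute threshold: {classification_f1:.2f}")
```

Here the classification labels are derived by thresholding regression outputs, which is only one possible setup; a separately trained classifier could be scored with the same `f1_score` function.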
