A Bi-level Framework for Traffic Accident Duration Prediction: Leveraging Weather and Road Condition Data within a Practical Optimum Pipeline

Because traffic incidents are stochastic, predicting their duration is a formidable challenge. Accurate duration estimates help commuters select optimal routes and help traffic management personnel address non-recurring congestion. In this study, we gathered accident duration, road condition, and meteorological data from a database of traffic accidents to test whether an accident duration prediction pipeline is feasible without accident-specific contextual information such as accident severity and textual descriptions. Multiple machine learning models were employed to predict whether an accident's impact on road traffic would be short-term or long-term, and then, using a bimodal approach, to estimate the precise duration of the incident's effect. Our binary classification random forest model distinguished between short-term and long-term effects with 83% accuracy, while the LightGBM regression model outperformed the other machine learning regression models, with mean absolute error (MAE) values of 26.15 and 13.3 and root mean square error (RMSE) values of 32.91 and 28.91 for short-term and long-term accident duration prediction, respectively. Using the best-performing classification and regression models, we then constructed an end-to-end pipeline that incorporates the entire process. The results of both the separate and the combined approaches were comparable with previous work, which suggests that static features alone are sufficient for predicting traffic accident duration. SHAP value analysis identified weather conditions, wind chill, and wind speed as the most influential factors in determining the duration of an accident.
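The bi-level flow described above (classify an accident as short-term or long-term, then hand it to the regressor for that class) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the class name `BiLevelDurationPipeline`, the feature key `wind_speed_mph`, and the three toy stand-in models are hypothetical, and in the study the first stage would be the random forest classifier and the second stage the LightGBM regressors. The `mae` and `rmse` helpers follow the standard definitions of the two reported metrics.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical feature record: static weather/road features keyed by name.
Features = Dict[str, float]


@dataclass
class BiLevelDurationPipeline:
    """Two-stage predictor: a classifier routes each record to one regressor."""

    is_long_term: Callable[[Features], bool]      # stage 1: short vs. long
    short_regressor: Callable[[Features], float]  # stage 2a: short-term duration
    long_regressor: Callable[[Features], float]   # stage 2b: long-term duration

    def predict(self, x: Features) -> Tuple[str, float]:
        # Route the record based on the stage-1 decision.
        if self.is_long_term(x):
            return "long", self.long_regressor(x)
        return "short", self.short_regressor(x)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy stand-ins for trained models (assumptions for illustration only).
toy = BiLevelDurationPipeline(
    is_long_term=lambda x: x["wind_speed_mph"] > 20.0,
    short_regressor=lambda x: 30.0 + x["wind_speed_mph"],
    long_regressor=lambda x: 120.0 + 2.0 * x["wind_speed_mph"],
)

records = [{"wind_speed_mph": 5.0}, {"wind_speed_mph": 30.0}]
observed = [40.0, 170.0]  # made-up durations, not data from the study

predictions = [toy.predict(r) for r in records]
predicted_durations = [duration for _, duration in predictions]

print(predictions)
print(mae(observed, predicted_durations), rmse(observed, predicted_durations))
```

Because the stages are passed in as plain callables, trained models can be swapped in by wrapping their prediction methods, and the end-to-end error can be compared against the per-class errors with the same two metric functions.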
