This study investigates how well environmental, temporal, and spatial factors predict traffic accident severity in the United States. Using a dataset of 500,000 U.S. traffic accidents from 2016 to 2023, we trained an XGBoost classifier, tuned its hyperparameters with randomized-search cross-validation, and corrected for class imbalance through class weighting. The final model achieves an overall accuracy of 78% and performs strongly on the majority class (Severity 2), where precision and recall both reach 87%. Feature importance analysis shows that time of day, geographic location, and weather-related variables, including visibility, temperature, and wind speed, rank among the strongest predictors of accident severity. However, contrary to our initial hypotheses, precipitation and visibility show limited predictive power, which may reflect drivers adapting their behavior under overtly hazardous conditions. Because mid-level severity accidents dominate the dataset, the model has limited capacity to learn meaningful patterns for extreme cases, which highlights the need for alternative sampling strategies, enhanced feature engineering, and integration of external datasets. These findings contribute to evidence-based traffic management and suggest future directions for severity prediction research.
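The class weighting mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common "balanced" heuristic, in which each class receives a weight of n_samples / (n_classes × class count); the abstract does not state which weighting scheme was actually used, and the function name `balanced_sample_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample, inversely proportional to class frequency.

    Assumed scheme (not confirmed by the study):
        weight(c) = n_samples / (n_classes * count(c))
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    class_weight = {
        cls: n_samples / (n_classes * count) for cls, count in counts.items()
    }
    return [class_weight[y] for y in labels]


# Hypothetical toy labels in which Severity 2 dominates.
toy_severity = [2] * 6 + [3] * 2 + [1] + [4]
toy_weights = balanced_sample_weights(toy_severity)

# Under this scheme every class contributes the same total weight,
# so rare severities are not drowned out by the majority class.
for cls in sorted(set(toy_severity)):
    total = sum(w for w, y in zip(toy_weights, toy_severity) if y == cls)
    print(f"Severity {cls}: total weight = {total:.2f}")
```

Weights of this kind are typically supplied to the training routine as per-sample weights, for example through the `sample_weight` argument of `XGBClassifier.fit`. Whether the study weighted samples this way or used another mechanism is not specified.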