Automated vehicles (AVs) have the potential to reduce or eliminate human driving errors, enhance traffic safety, and support sustainable mobility. Recent crash data, however, increasingly show that AV behavior can deviate from expected safety outcomes, raising concerns about the technology's safety and operational reliability in mixed traffic environments. While past research has investigated AV crashes, most studies rely on small, California-centered datasets and give limited attention to crash trends across SAE levels of automation. This study analyzes over 2,500 AV crash records from the United States National Highway Traffic Safety Administration (NHTSA), covering SAE Levels 2 and 4, to uncover underlying crash dynamics. A two-stage data mining framework is developed. First, K-means clustering segments the crash records into four distinct behavioral clusters based on temporal, spatial, and environmental factors. Then, Association Rule Mining (ARM) is applied within each cluster to extract interpretable multivariate relationships between crash patterns and crash contributors, including lighting conditions, road surface condition, vehicle dynamics, and environmental conditions. These insights provide actionable guidance for AV developers, safety regulators, and policymakers in formulating AV deployment strategies and minimizing crash risks.
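The two-stage idea (cluster first, then mine rules inside each cluster) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the study: the eight toy records, the field names (`hour`, `speed_limit`, `items`), the item labels, the choice of k = 2 (the study reports four clusters), and the support and confidence thresholds. The rule miner enumerates itemsets by brute force, standing in for a proper Apriori or FP-Growth implementation, and the features are left unscaled for brevity.

```python
from itertools import combinations

def kmeans(points, k, iters=100):
    """Lloyd's algorithm on tuples of numbers; returns one label per point."""
    centers = list(points[:k])  # naive deterministic init (assumption; k-means++ is more usual)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2
                                                  for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members))
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def mine_rules(transactions, min_support=0.5, min_confidence=0.8, max_len=3):
    """Brute-force frequent itemsets, then rules as
    (antecedent, consequent, support, confidence, lift)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    support = {}
    for size in range(1, max_len + 1):
        for itemset in combinations(items, size):
            s = sum(1 for t in transactions if set(itemset) <= t) / n
            if s >= min_support:
                support[frozenset(itemset)] = s
    rules = []
    for itemset, s in support.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                c = itemset - a
                confidence = s / support[a]  # subsets of a frequent itemset are frequent
                if confidence >= min_confidence:
                    rules.append((a, c, s, confidence, confidence / support[c]))
    return rules

# Hypothetical toy records, NOT NHTSA data: low-speed night crashes interleaved
# with high-speed daytime crashes.
records = [
    {"hour": 22, "speed_limit": 25, "items": {"lighting=dark", "surface=wet", "movement=stopped"}},
    {"hour": 9,  "speed_limit": 65, "items": {"lighting=daylight", "surface=dry", "movement=straight"}},
    {"hour": 23, "speed_limit": 25, "items": {"lighting=dark", "surface=wet", "movement=stopped"}},
    {"hour": 10, "speed_limit": 65, "items": {"lighting=daylight", "surface=dry", "movement=straight"}},
    {"hour": 21, "speed_limit": 30, "items": {"lighting=dark", "surface=dry", "movement=turning"}},
    {"hour": 11, "speed_limit": 55, "items": {"lighting=daylight", "surface=dry", "movement=straight"}},
    {"hour": 22, "speed_limit": 25, "items": {"lighting=dark", "surface=wet", "movement=stopped"}},
    {"hour": 10, "speed_limit": 65, "items": {"lighting=daylight", "surface=wet", "movement=lane_change"}},
]

# Stage 1: cluster on numeric context features.
points = [(r["hour"], r["speed_limit"]) for r in records]
labels = kmeans(points, k=2)

# Stage 2: mine association rules separately inside each cluster.
rules_by_cluster = {}
for cluster in sorted(set(labels)):
    transactions = [r["items"] for r, lab in zip(records, labels) if lab == cluster]
    rules_by_cluster[cluster] = mine_rules(transactions)

for cluster, rules in rules_by_cluster.items():
    for antecedent, consequent, s, confidence, lift in rules:
        if lift > 1:  # lift of 1 means the rule adds nothing beyond base rates
            print(cluster, sorted(antecedent), "->", sorted(consequent),
                  f"support={s:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Filtering on lift matters in a per-cluster setting: an item that every record in a cluster shares (here, the lighting condition) yields rules with confidence 1.0 but lift 1.0, which describe the cluster itself and not a relationship within it.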