Although ubiquitous in modern vehicles, Controller Area Networks (CANs) lack basic security properties and are easily exploitable. A rapidly growing field of CAN security research has emerged that seeks to detect intrusions on CANs. Producing vehicular CAN data with a variety of intrusions is out of reach for most researchers, as it requires expensive assets and expertise. To assist researchers, we present the first comprehensive guide to the existing open CAN intrusion datasets, including a quality analysis of each dataset and an enumeration of its benefits, drawbacks, and suggested use case. Current public CAN IDS datasets are limited to real fabrication (simple message injection) attacks and simulated attacks, often in synthetic data, which lack fidelity. In general, the physical effects of attacks on the vehicle are not verified in the available datasets. Only one dataset provides signal-translated data, and it does not include a corresponding raw binary version. Overall, the available data pigeonholes CAN IDS work into testing on limited, often inappropriate data (usually with attacks that are too easily detectable to truly test the method), and this lack of data has stymied comparability and reproducibility of results. As our primary contribution, we present the ROAD (Real ORNL Automotive Dynamometer) CAN Intrusion Dataset, consisting of over 3.5 hours of one vehicle's CAN data. ROAD contains ambient data recorded during a diverse set of activities, along with attacks of increasing stealth, with multiple variants and instances of real fuzzing, fabrication, and unique advanced attacks, as well as simulated masquerade attacks. To facilitate benchmarking of CAN IDS methods that require signal-translated inputs, we also provide the signal time series format for many of the CAN captures. Our contributions aim to facilitate appropriate benchmarking and needed comparability in the CAN IDS field.