A Comprehensive Guide to CAN IDS Data & Introduction of the ROAD Dataset

Although ubiquitous in modern vehicles, Controller Area Networks (CANs) lack basic security properties and are easily exploitable. A rapidly growing field of CAN security research has emerged that seeks to detect intrusions on CANs. Producing vehicular CAN data with a variety of intrusions is out of reach for most researchers, as it requires expensive assets and expertise. To assist researchers, we present the first comprehensive guide to the existing open CAN intrusion datasets, including a quality analysis of each dataset and an enumeration of each dataset's benefits, drawbacks, and suggested use case. Current public CAN IDS datasets are limited to real fabrication (simple message injection) attacks and simulated attacks, often in synthetic data, which lack fidelity. In general, the physical effects of attacks on the vehicle are not verified in the available datasets. Only one dataset provides signal-translated data, and it does not include a corresponding raw binary version. Overall, the available data pigeonholes CAN IDS work into testing on limited, often inappropriate data (usually with attacks that are too easily detectable to truly test the method), and this lack of data has stymied comparability and reproducibility of results. As our primary contribution, we present the ROAD (Real ORNL Automotive Dynamometer) CAN Intrusion Dataset, consisting of over 3.5 hours of one vehicle's CAN data. ROAD contains ambient data recorded during a diverse set of activities, and attacks of increasing stealth with multiple variants and instances of real fuzzing, fabrication, and unique advanced attacks, as well as simulated masquerade attacks. To facilitate benchmarking of CAN IDS methods that require signal-translated inputs, we also provide the signal time series format for many of the CAN captures. Our contributions aim to facilitate appropriate benchmarking and needed comparability in the CAN IDS field.

