LAND: A Longitudinal Analysis of Neuromorphic Datasets

Neuromorphic engineering has a data problem. Despite the meteoric rise in the number of neuromorphic datasets published over the past ten years, a significant portion of neuromorphic research papers still conclude that more data and even larger datasets are needed. Whilst this need is driven in part by the sheer volume of data required by modern deep learning approaches, it is also fuelled by the current state of the available neuromorphic datasets: they are hard to find, their purpose is hard to understand, and the nature of their underlying task is hard to determine. This is further compounded by practical difficulties in downloading and using these datasets. This review starts by capturing a snapshot of the existing neuromorphic datasets, covering over 423 datasets, and then explores the nature of their tasks and the underlying structure of the presented data. Analysing these datasets reveals the problems arising from their size, the lack of standardisation, and the difficulty of accessing the actual data. This paper also highlights the growth in the size of individual datasets and the complexities involved in working with the data. However, a more important concern is the rise of synthetic datasets, created by either simulation or video-to-events methods. This review explores the benefits of simulated data for testing existing algorithms and applications, while highlighting the potential pitfalls of using it to explore new applications of neuromorphic technologies. This review also introduces the concept of meta-datasets, created from existing datasets, as a way of both reducing the need for more data and removing potential bias arising from defining both the dataset and the task.

