Neural networks are increasingly used to support decision-making. To verify their reliability and adaptability, researchers and practitioners have proposed a variety of tools and methods for tasks such as neural network (NN) code verification, refactoring, and migration. These tools play a crucial role in guaranteeing both the correctness and maintainability of NN architectures, helping to prevent implementation errors, simplify model updates, and ensure that complex networks can be reliably extended and reused. Yet, assessing their effectiveness remains challenging due to the lack of publicly available, diverse datasets of neural networks that would allow systematic evaluation. To address this gap, we leverage large language models (LLMs) to automatically generate a dataset of neural networks that can serve as a benchmark for validation. The dataset is designed to cover diverse architectural components and to handle multiple input data types and tasks. In total, 608 samples are generated, each conforming to a set of precise design choices. To further ensure their consistency, we validate the correctness of the generated networks using static analysis and symbolic tracing. We make the dataset publicly available to support the community in advancing research on neural network reliability and adaptability.
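The abstract does not specify the framework or the exact validation pipeline. Assuming PyTorch models, a minimal sketch of the two-stage check described above might combine `ast.parse` for static analysis with `torch.fx.symbolic_trace` for symbolic tracing; the `validate_generated_network` helper and the `GeneratedNet` sample below are hypothetical illustrations, not the paper's implementation.

```python
import ast

import torch.fx
import torch.nn as nn


def validate_generated_network(source: str) -> nn.Module:
    """Hypothetical two-stage check for one generated network sample."""
    # Stage 1, static analysis: parsing catches syntax errors
    # without executing the generated code.
    ast.parse(source)

    # Instantiate the model: run the source in an isolated namespace
    # and pick out the nn.Module subclass it defines.
    namespace: dict = {}
    exec(source, namespace)
    model_cls = next(
        obj for obj in namespace.values()
        if isinstance(obj, type) and issubclass(obj, nn.Module)
    )
    model = model_cls()

    # Stage 2, symbolic tracing: builds the computation graph without
    # real input data, surfacing wiring errors such as undefined layers
    # or mismatched forward() calls.
    torch.fx.symbolic_trace(model)
    return model


# Hypothetical generated sample: a tiny MLP.
sample = """
import torch.nn as nn

class GeneratedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(32, 4)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))
"""

model = validate_generated_network(sample)
print(type(model).__name__)  # GeneratedNet
```

In this sketch, a sample that fails either stage, by failing to parse or by raising during tracing, would be rejected from the dataset; samples that pass both checks are structurally consistent by construction.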