Neural networks are increasingly used to support decision-making. To verify their reliability and adaptability, researchers and practitioners have proposed a variety of tools and methods for tasks such as neural network (NN) code verification, refactoring, and migration. These tools play a crucial role in guaranteeing both the correctness and maintainability of NN architectures, helping to prevent implementation errors, simplify model updates, and ensure that complex networks can be reliably extended and reused. Yet, assessing their effectiveness remains challenging due to the lack of publicly available, diverse datasets of neural networks that would allow systematic evaluation. To address this gap, we leverage large language models (LLMs) to automatically generate a dataset of neural networks that can serve as a benchmark for validation. The dataset is designed to cover diverse architectural components and to handle multiple input data types and tasks. In total, 608 samples are generated, each conforming to a set of precise design choices. To further ensure their consistency, we validate the correctness of the generated networks using static analysis and symbolic tracing. We make the dataset publicly available to support the community in advancing research on neural network reliability and adaptability.
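The abstract does not specify the framework or the exact validation pipeline. Assuming PyTorch models, a minimal sketch of the two-stage check described above might combine `ast.parse` for static analysis with `torch.fx.symbolic_trace` for symbolic tracing; the `validate_generated_network` helper and the `GeneratedNet` sample below are hypothetical illustrations, not the paper's implementation.

```python
import ast

import torch.fx
import torch.nn as nn


def validate_generated_network(source: str) -> nn.Module:
    """Hypothetical two-stage check for one generated network sample."""
    # Stage 1, static analysis: parsing catches syntax errors
    # without executing the generated code.
    ast.parse(source)

    # Instantiate the model: run the source in an isolated namespace
    # and pick out the nn.Module subclass it defines.
    namespace: dict = {}
    exec(source, namespace)
    model_cls = next(
        obj for obj in namespace.values()
        if isinstance(obj, type) and issubclass(obj, nn.Module)
    )
    model = model_cls()

    # Stage 2, symbolic tracing: builds the computation graph without
    # real input data, surfacing wiring errors such as undefined layers
    # or mismatched forward() calls.
    torch.fx.symbolic_trace(model)
    return model


# Hypothetical generated sample: a tiny MLP.
sample = """
import torch.nn as nn

class GeneratedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(32, 4)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))
"""

model = validate_generated_network(sample)
print(type(model).__name__)  # GeneratedNet
```

In this sketch, a sample that fails either stage, by failing to parse or by raising during tracing, would be rejected from the dataset; samples that pass both checks are structurally consistent by construction.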