The MIT/IEEE/Amazon Graph Challenge provides a venue for individuals and teams to showcase innovations in large-scale graph and sparse data analysis. The Anonymized Network Sensing Graph Challenge processes over 100 billion network packets to construct privacy-preserving traffic matrices, and its GraphBLAS reference implementation demonstrates how hypersparse matrices can be applied to this problem. This work presents a refactoring and benchmarking of a section of the reference code to improve clarity, adaptability, and performance. The original Python implementation, approximately 1000 lines across three files, has been streamlined to 325 lines across two focused modules, a 67% reduction in code size that maintains full functionality. Parallel maps, added using the pMatlab and pPython distributed array programming libraries, enable parallel benchmarking of the data. Scalable performance is demonstrated for large-scale summation and analysis of traffic matrices. The resulting implementation increases the potential impact of the Graph Challenge by providing a clear and efficient foundation for participants.
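To make the idea of traffic-matrix summation concrete, the following is a minimal sketch in plain Python. It is not the GraphBLAS reference implementation or the refactored code described here. It assumes each packet has already been reduced to an anonymized integer (source, destination) pair, and the function names `traffic_matrix` and `sum_matrices` are hypothetical. Only nonzero entries are stored, which is the property that makes hypersparse representations practical.

```python
from collections import Counter


def traffic_matrix(packets):
    """Count packets per (source, destination) pair.

    Only pairs that actually occur are stored, so memory scales with
    the number of nonzero entries rather than the full address space.
    """
    return Counter((src, dst) for src, dst in packets)


def sum_matrices(matrices):
    """Element-wise sum of several sparse traffic matrices."""
    total = Counter()
    for matrix in matrices:
        total.update(matrix)  # adds counts for matching (src, dst) keys
    return total


# Two illustrative time windows of anonymized packets (made-up values).
window_a = traffic_matrix([(1, 2), (1, 2), (3, 4)])
window_b = traffic_matrix([(1, 2), (5, 6)])

total = sum_matrices([window_a, window_b])
packet_count = sum(total.values())  # total packets across both windows
unique_links = len(total)           # number of distinct (src, dst) pairs
```

Because the per-window matrices are independent, the windows could be built on separate workers and combined with the same element-wise sum. That is the kind of pattern a parallel map expresses, although the pMatlab and pPython interfaces themselves are not shown here.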