NetSmith: An Optimization Framework for Machine-Discovered Network Topologies

Over the past few decades, network topology design for general-purpose, shared-memory multicores has been driven primarily by human experts, who use their insights to arrive at network designs that balance the competing goals of performance requirements (e.g., latency, bandwidth) and cost constraints (e.g., router radix, router counts). On the other hand, automatic NoC synthesis methods exist for SoCs; these optimize for application-specific communication and for objectives such as resource usage or power. Unfortunately, these techniques do not lend themselves to the general-purpose context: applying them directly yields poor results, even worse than expert-designed networks. We design and develop an automatic network design methodology, NetSmith, to design networks for general-purpose, shared-memory multicores that comprehensively outperform expert-designed networks. We employ NetSmith in the context of interposer networks for chiplet-based systems, where there has been significant recent work on network topology design (e.g., Kite, Butter Donut, Double Butterfly). NetSmith-generated topologies achieve significantly higher throughput (50% to 75% higher) than previous expert-designed and synthesized networks, while also reducing average hop count by 8% to 13.5%. Full-system simulations using PARSEC benchmarks demonstrate that the improved network performance translates to improved application performance, with up to 11% mean speedup over previous network-on-interposer (NoI) topologies.

