As high-performance computing systems scale in size and complexity, efficient resource management is essential to minimize communication overhead. HyperX is a richly connected, low-diameter network that offers a scalable and cost-effective alternative to traditional topologies. However, resource allocation in HyperX remains underexplored, and strategies designed for networks such as Torus, Fat-tree, or Dragonfly do not transfer directly. In this work, we propose and formalize several resource allocation strategies for HyperX networks, categorized into linear, geometric, and stochastic functions. We characterize these strategies theoretically by analyzing their topological properties, including dilation, convexity, and partition bandwidth. Furthermore, we conduct an exhaustive experimental evaluation using synthetic traffic and application communication kernels to assess the impact of these strategies on performance under different routing algorithms. Our results indicate that partition bandwidth and switch locality are decisive factors in mitigating interference. Notably, the Diagonal allocation strategy, which is not convex, outperforms traditional approaches in most scenarios. Finally, we provide a set of lessons learned to guide the implementation of resource allocation policies in HPC systems based on HyperX networks.
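To make the notion of topological properties of an allocation concrete, the following minimal sketch compares two hypothetical partitions on a small two-dimensional HyperX. It relies on one standard property of the topology: switches are identified by coordinate tuples and every dimension is fully connected, so the minimal hop count between two switches is the number of coordinates in which they differ. The functions `row_allocation` and `diagonal_allocation` are illustrative assumptions and are not the formal definitions of the strategies in this work; likewise, the maximum pairwise hop distance computed here is a simple proxy, not necessarily the dilation metric used in the analysis.

```python
from itertools import combinations


def hop_distance(a, b):
    """Minimal hops between two HyperX switches: count of differing coordinates."""
    return sum(x != y for x, y in zip(a, b))


def max_pairwise_distance(switches):
    """Largest minimal-hop distance between any two switches of a partition."""
    return max(
        (hop_distance(a, b) for a, b in combinations(switches, 2)),
        default=0,
    )


def row_allocation(side, row=0):
    """Hypothetical allocation: all switches of one row of a side x side HyperX."""
    return [(row, j) for j in range(side)]


def diagonal_allocation(side):
    """Hypothetical allocation: switches (i, i) on the main diagonal (an assumption)."""
    return [(i, i) for i in range(side)]


SIDE = 4  # assumed switches per dimension, chosen only for illustration
row = row_allocation(SIDE)
diag = diagonal_allocation(SIDE)

print("row partition max distance:", max_pairwise_distance(row))
print("diagonal partition max distance:", max_pairwise_distance(diag))
```

In this toy setting the row partition keeps every pair of switches one hop apart, while the diagonal partition places every pair two hops apart but spreads the job over every row and column. This only illustrates the kind of trade-off between locality and available bandwidth that the analysis examines; it does not reproduce any result.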