GPU-accelerated Self-Organizing Map (SOM) implementations are among the most competitive options for large-scale SOM analysis, but growing dataset sizes increasingly limit their practical use because workloads no longer fit within device memory. We introduce FloatSOM, a SOM framework for scalable training and deployment that supports multi-GPU execution, disk-backed streaming for datasets that exceed memory, and novel topologies beyond regular lattices. We evaluate FloatSOM on 14 synthetic and real benchmark datasets, together with controlled speed-scaling benchmarks, and show that these topologies, combined with topology-aware hyperparameter fine-tuning, yield lower quantization error than current state-of-the-art SOM baselines. FloatSOM also sustains this performance at large scale through high-throughput distributed execution: in the largest benchmark, it trains a 1024-node SOM on 1,000,000,000 samples with 50 features in 6.16 minutes on 8 GPUs spread across two high-performance-computing nodes.
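To make the evaluation metric and the disk-backed setting concrete, the following is a minimal sketch and not FloatSOM's implementation. It assumes quantization error is the mean Euclidean distance from each sample to its best-matching unit (the abstract does not state the exact variant used), and it stands in for disk-backed streaming with a plain `numpy.memmap` read in fixed-size chunks. The function name `quantization_error`, the `chunk_size` parameter, and the file name `samples.f32` are hypothetical.

```python
import os
import tempfile

import numpy as np


def quantization_error(data, codebook, chunk_size=4096):
    """Mean Euclidean distance from each sample to its nearest codebook vector.

    `data` may be an in-memory array or a np.memmap; only `chunk_size` rows
    are materialized at a time. (Assumed definition, see the lead-in.)
    """
    codebook = np.asarray(codebook, dtype=np.float64)
    cb_sq = np.sum(codebook ** 2, axis=1)  # squared norm of each node, shape (n_nodes,)
    total = 0.0
    for start in range(0, data.shape[0], chunk_size):
        chunk = np.asarray(data[start:start + chunk_size], dtype=np.float64)
        # Squared distances via ||x||^2 - 2 x.w + ||w||^2, shape (rows, n_nodes).
        d2 = np.sum(chunk ** 2, axis=1)[:, None] - 2.0 * (chunk @ codebook.T) + cb_sq[None, :]
        d2 = np.maximum(d2, 0.0)  # guard against tiny negative values from rounding
        total += np.sqrt(d2.min(axis=1)).sum()
    return total / data.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_features, n_nodes = 20_000, 50, 1024
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "samples.f32")  # hypothetical file name
        # Write synthetic samples to disk, then reopen them read-only.
        on_disk = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_samples, n_features))
        on_disk[:] = rng.normal(size=(n_samples, n_features))
        on_disk.flush()
        del on_disk
        samples = np.memmap(path, dtype=np.float32, mode="r", shape=(n_samples, n_features))
        codebook = rng.normal(size=(n_nodes, n_features))  # untrained, random nodes
        print("quantization error:", quantization_error(samples, codebook))
        del samples  # release the mapping before the directory is removed
```

The chunked loop keeps peak memory proportional to `chunk_size * n_nodes` rather than to the dataset size, which is the property that matters once the samples no longer fit in host or device memory.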