aweSOM: a CPU/GPU-accelerated Self-organizing Map and Statistically Combined Ensemble Framework for Machine-learning Clustering Analysis

We introduce aweSOM, an open-source Python package for machine learning (ML) clustering and classification. It uses a Self-organizing Map (SOM) algorithm with CPU/GPU acceleration to accommodate large ($N > 10^6$, where $N$ is the number of data points), multidimensional datasets. aweSOM consists of two main modules: one handles the initialization and training of the SOM, and the other stacks the results of multiple SOM realizations to obtain more statistically robust clusters. Existing Python-based SOM implementations (e.g., POPSOM, Yuan (2018); MiniSom, Vettigli (2018); sklearn-som) primarily serve as proof-of-concept demonstrations: they are optimized for smaller datasets and lack scalability for large, multidimensional data. aweSOM fills this gap in capability, with good performance scaling up to $\sim 10^8$ individual points and support for multiple features per point. We compare the code's performance against the legacy implementations it is based on and find a 10-100x speedup, as well as significantly improved memory efficiency, due to several built-in optimizations.
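For readers unfamiliar with SOMs, the training step handled by the first module can be illustrated with a minimal NumPy sketch. This is not aweSOM's API or implementation: the function names `train_som` and `assign_nodes`, the linear decay schedules, and all default parameters are assumptions chosen for illustration, and the sketch runs on the CPU only without any of the optimizations the abstract mentions.

```python
import numpy as np

def train_som(data, rows=4, cols=4, n_steps=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a minimal online SOM (illustrative only, not aweSOM's code).

    Returns node weights of shape (rows * cols, n_features).
    """
    rng = np.random.default_rng(seed)
    n_samples, _ = data.shape
    # Initialize node weights from randomly chosen data points.
    weights = data[rng.integers(0, n_samples, rows * cols)].astype(float)
    # Grid coordinates of every node, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for step in range(n_steps):
        frac = step / n_steps
        lr = lr0 * (1.0 - frac)                # assumed linear learning-rate decay
        sigma = sigma0 * (1.0 - frac) + 1e-3   # assumed shrinking neighborhood radius
        x = data[rng.integers(0, n_samples)]   # draw one random sample
        # Best-matching unit: the node whose weight vector is closest to x.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Gaussian neighborhood on the grid, centered on the BMU.
        grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Pull every node toward x, weighted by its grid distance to the BMU.
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_nodes(data, weights):
    """Return the index of the best-matching node for every data point."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

Calling `train_som` several times with different `seed` values would give multiple SOM realizations of the kind the second module combines; how aweSOM stacks them is not specified in this abstract and is not sketched here.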

