Inverting Self-Organizing Maps: A Unified Activation-Based Framework

Self-Organizing Maps (SOMs) provide topology-preserving projections of high-dimensional data and have been widely used for visualization, clustering, and vector quantization. In this work, we show that the activation pattern of a SOM, that is, the squared distances from an input to its prototypes, can be inverted to recover the exact input under mild geometric conditions. This follows from a classical fact in Euclidean distance geometry: a point in $D$ dimensions is uniquely determined by its distances to $D{+}1$ affinely independent reference points. We derive the corresponding linear system and characterize the conditions under which the inversion is well-posed. Building on this mechanism, we introduce the Manifold-Aware Unified SOM Inversion and Control (MUSIC) update rule, which enables controlled, semantically meaningful trajectories in latent space. MUSIC modifies the squared distances to selected prototypes while preserving the others, resulting in a deterministic geometric flow aligned with the SOM's piecewise-linear structure. Tikhonov regularization stabilizes the update rule and ensures smooth motion on high-dimensional datasets. Unlike variational or probabilistic generative models, MUSIC does not rely on sampling, latent priors, or encoder-decoder architectures. If no perturbation is applied, inversion recovers the exact input; when a target cluster or prototype is specified, MUSIC produces coherent semantic variations while remaining on the data manifold. This offers a new perspective on data augmentation and controllable latent exploration based solely on prototype geometry. We validate the approach on synthetic Gaussian mixtures, the MNIST dataset, and the Faces in the Wild dataset. Across all settings, MUSIC produces smooth, interpretable trajectories that reveal the underlying geometry of the learned manifold, illustrating the advantages of SOM-based inversion over unsupervised clustering.
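
The inversion step can be made concrete. Expanding $\|x - w_k\|^2 = d_k$ and subtracting the equation for a reference prototype $w_0$ cancels the quadratic term $\|x\|^2$, leaving $2(w_k - w_0)^\top x = \|w_k\|^2 - \|w_0\|^2 - (d_k - d_0)$ for $k = 1, \dots, K-1$. This system has a unique solution exactly when the differences $w_k - w_0$ span $\mathbb{R}^D$, which is the affine-independence condition on at least $D{+}1$ prototypes. The NumPy code below is a minimal sketch under these assumptions. The function names `invert_activations` and `music_like_step`, the parameters `alpha` and `lam`, and the choice of prototype 0 as reference are illustrative and hypothetical. In particular, `music_like_step` is not the paper's MUSIC rule, whose formula the abstract does not give: it assumes a multiplicative shrink of the selected squared distances and a Tikhonov term anchored at the current point.

```python
import numpy as np


def invert_activations(W, d, lam=0.0, x_ref=None):
    """Recover x from squared distances d[k] = ||x - W[k]||^2.

    W     : (K, D) prototypes; prototype 0 is the reference, and the
            differences W[1:] - W[0] must span R^D (affine independence)
    d     : (K,) squared distances, i.e. the SOM activation pattern
    lam   : Tikhonov weight (0 gives the plain least-squares solution)
    x_ref : point the regularizer pulls toward (defaults to the origin)
    """
    W = np.asarray(W, dtype=float)
    d = np.asarray(d, dtype=float)
    w0, d0 = W[0], d[0]
    # Subtracting the k = 0 equation cancels ||x||^2 and leaves a linear system:
    #   2 (w_k - w_0)^T x = ||w_k||^2 - ||w_0||^2 - (d_k - d_0)
    A = 2.0 * (W[1:] - w0)
    b = np.sum(W[1:] ** 2, axis=1) - w0 @ w0 - (d[1:] - d0)
    D = W.shape[1]
    x_ref = np.zeros(D) if x_ref is None else np.asarray(x_ref, dtype=float)
    # Minimize ||A x - b||^2 + lam * ||x - x_ref||^2 via its normal equations
    lhs = A.T @ A + lam * np.eye(D)
    rhs = A.T @ b + lam * x_ref
    return np.linalg.solve(lhs, rhs)


def music_like_step(W, x, targets, alpha=0.1, lam=1e-2):
    """Hypothetical MUSIC-style step (an assumption, not the paper's exact rule).

    Shrinks the squared distances to the prototypes in `targets` by a factor
    (1 - alpha), keeps all other distances fixed, and re-inverts with a
    Tikhonov term anchored at the current point x. `targets` must not
    contain the reference prototype 0.
    """
    W = np.asarray(W, dtype=float)
    x = np.asarray(x, dtype=float)
    d = np.sum((W - x) ** 2, axis=1)
    d_new = d.copy()
    d_new[targets] *= 1.0 - alpha
    return invert_activations(W, d_new, lam=lam, x_ref=x)


# Usage: exact recovery, then a short trajectory toward prototype 3.
rng = np.random.default_rng(0)
W = rng.normal(size=(12, 5))   # 12 random prototypes in R^5 (synthetic, not a trained SOM)
x = rng.normal(size=5)
d = np.sum((W - x) ** 2, axis=1)

x_hat = invert_activations(W, d)
print(np.allclose(x_hat, x))   # exact recovery when the system is well-posed

trajectory = [x]
for _ in range(5):
    trajectory.append(music_like_step(W, trajectory[-1], targets=[3], alpha=0.2))
```

With `alpha=0` the step returns the current point unchanged, consistent with the statement that inversion without perturbation recovers the input. Because prototype 0 serves as the reference, each step is guaranteed to lower $d_3$ relative to $d_0$: the displacement is $\alpha d_3 M^{-1} a_3$ with $M = A^\top A + \lambda I$ positive definite and $a_3 = 2(w_3 - w_0)$. Whether the absolute distance $d_3$ shrinks depends on the prototype geometry.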
