Manifold learning techniques play a pivotal role in machine learning by revealing lower-dimensional embeddings within high-dimensional data, enhancing both the efficiency and interpretability of data analysis. However, a notable limitation of current manifold learning methods is that they do not provide explicit functional mappings from the original space to the embedding, which are crucial for explainability in many real-world applications. Genetic programming (GP), known for its interpretable, tree-based functional models, has emerged as a promising way to address this challenge. Previous research used multi-objective GP to balance manifold quality against embedding dimensionality, producing functional mappings across a range of embedding sizes; however, the resulting mapping trees were often complex, which hindered explainability. In response, this paper introduces Genetic Programming for Explainable Manifold Learning (GP-EMaL), a novel approach that directly penalises tree complexity. GP-EMaL maintains high manifold quality while substantially improving explainability, and it allows the complexity measure to be customised, for example via symmetry balancing, scaling, and node complexity, to suit diverse application needs. Our experimental analysis shows that GP-EMaL matches the performance of the existing approach in most cases while using simpler, smaller, and more interpretable tree structures. This advancement marks a significant step towards interpretable manifold learning.
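To make the abstract's notion of a customisable tree-complexity penalty concrete, the following is a minimal, hypothetical Python sketch. It scores a GP mapping tree by combining per-operator costs, a scaling factor applied to subtrees, and a symmetry (balance) term that penalises lopsided trees. All names, operator costs, and weights here are illustrative assumptions, not the actual measure used by GP-EMaL.

```python
# Hypothetical per-operator costs: cheap arithmetic, costlier non-linear ops.
OP_COST = {"add": 1, "sub": 1, "mul": 2, "div": 3, "sin": 4, "exp": 4}

class Node:
    """A node in a GP mapping tree: an operator with children, or a leaf feature."""
    def __init__(self, op=None, children=(), feature=None):
        self.op = op                # operator name, or None for a leaf
        self.children = list(children)
        self.feature = feature      # input feature index for leaves

def complexity(node, scale=2.0):
    """Recursively score a tree: leaves cost 1, each operator adds its cost,
    child scores are scaled (so deep nesting is penalised), and the gap
    between sibling subtree scores adds a symmetry-balancing penalty."""
    if node.op is None:             # leaf: a single input feature
        return 1.0
    child_scores = [complexity(c, scale) for c in node.children]
    balance = max(child_scores) - min(child_scores) if len(child_scores) > 1 else 0.0
    return OP_COST.get(node.op, 2) + scale * sum(child_scores) + balance

# Example: score the mapping (x0 + x1) * sin(x2)
tree = Node("mul", [
    Node("add", [Node(feature=0), Node(feature=1)]),
    Node("sin", [Node(feature=2)]),
])
print(complexity(tree))  # 25.0 under these illustrative weights
```

In a multi-objective setting, a score like this would serve as one objective alongside manifold quality, so that selection pressure favours mappings that are both accurate and structurally simple; adjusting the operator costs, scale, or balance term is what customisation of the complexity measure amounts to in this sketch.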