Manifold learning techniques play a pivotal role in machine learning by revealing lower-dimensional embeddings within high-dimensional data, thereby improving both the efficiency and the interpretability of data analysis. However, a notable limitation of current manifold learning methods is that they lack explicit functional mappings, which are crucial for explainability in many real-world applications. Genetic programming (GP), known for its interpretable, functional tree-based models, has emerged as a promising approach to this challenge. Previous research leveraged multi-objective GP to balance manifold quality against embedding dimensionality, producing functional mappings across a range of embedding sizes. Yet these mapping trees often became complex, hindering explainability. In response, we introduce Genetic Programming for Explainable Manifold Learning (GP-EMaL), a novel approach that directly penalises tree complexity. Our method maintains high manifold quality while significantly enhancing explainability, and it allows the complexity measures to be customised, for example through symmetry balancing, scaling, and node complexity, to cater to diverse application needs. Our experimental analysis demonstrates that GP-EMaL matches the performance of the existing approach in most cases while using simpler, smaller, and more interpretable tree structures. This advance marks a significant step towards interpretable manifold learning.