Genetic Programming for Explainable Manifold Learning

Manifold learning techniques play a pivotal role in machine learning by transforming high-dimensional data into a lower-dimensional embedding, which improves both the efficiency and the interpretability of data analysis. However, a notable limitation of current manifold learning methods is that they do not produce an explicit functional mapping from the original features to the embedding, which is crucial for explainability in many real-world applications. Genetic programming (GP), known for its interpretable tree-based functional models, has emerged as a promising approach to address this limitation. Previous research used multi-objective GP to balance manifold quality against embedding dimensionality, producing functional mappings across a range of embedding sizes. Yet these mapping trees often became complex, which hindered explainability. In this paper, we introduce Genetic Programming for Explainable Manifold Learning (GP-EMaL), a novel approach that directly penalises tree complexity. The method maintains high manifold quality while significantly enhancing explainability, and it allows the complexity measure to be customised, for example through symmetry balancing, scaling, and node complexity, to suit diverse application needs. Our experimental analysis demonstrates that GP-EMaL matches the performance of the existing approach in most cases while using simpler, smaller, and more interpretable tree structures. This advancement marks a significant step towards interpretable manifold learning.
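To make the idea of an explicit, tree-based mapping with a complexity penalty more concrete, the following is a minimal sketch in Python. It is not the GP-EMaL implementation. The `Node` class, the `NODE_COST` and `LEAF_COST` values, the `asymmetry_weight` parameter, and the way the penalty is combined are all illustrative assumptions. The sketch only shows how one expression tree per embedding dimension gives a readable mapping, and how operator costs and subtree imbalance could feed a complexity score that a multi-objective search might minimise alongside a manifold-quality measure.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-operator costs (assumed for illustration only;
# these are not the weights used by GP-EMaL).
NODE_COST: Dict[str, float] = {"add": 1.0, "sub": 1.0, "mul": 2.0, "sin": 4.0}
LEAF_COST = 1.0  # assumed cost of reading one input feature


@dataclass(frozen=True)
class Node:
    op: str                           # operator name, or "x" for an input feature
    children: Tuple["Node", ...] = ()
    index: int = 0                    # feature index, used only when op == "x"


def evaluate(node: Node, x: Sequence[float]) -> float:
    """Evaluate one expression tree on a single input vector."""
    if node.op == "x":
        return x[node.index]
    args = [evaluate(child, x) for child in node.children]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "sub":
        return args[0] - args[1]
    if node.op == "mul":
        return args[0] * args[1]
    if node.op == "sin":
        return math.sin(args[0])
    raise ValueError(f"unknown operator: {node.op}")


def size(node: Node) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in node.children)


def complexity(node: Node, asymmetry_weight: float = 0.5) -> float:
    """Toy complexity score: operator cost, plus the children's scores,
    plus a penalty when the two subtrees of a binary node differ in size."""
    if node.op == "x":
        return LEAF_COST
    score = NODE_COST[node.op]
    score += sum(complexity(child, asymmetry_weight) for child in node.children)
    if len(node.children) == 2:
        left, right = node.children
        score += asymmetry_weight * abs(size(left) - size(right))
    return score


def embed(trees: Sequence[Node], x: Sequence[float]) -> List[float]:
    """Map one input vector to the embedding: one tree per output dimension."""
    return [evaluate(tree, x) for tree in trees]


x0, x1, x2 = (Node("x", index=i) for i in range(3))
small = Node("add", (x0, Node("mul", (x1, x2))))       # x0 + x1 * x2
large = Node("sin", (Node("mul", (small, small)),))    # sin((x0 + x1 * x2)^2)

if __name__ == "__main__":
    point = [1.0, 2.0, 3.0]
    print("embedding:", embed((small, large), point))
    print("complexity:", complexity(small), complexity(large))
```

Under these assumed costs, the second tree scores higher than the first because it contains more nodes and a more expensive operator. That is the kind of difference a complexity objective would use to prefer the simpler mapping when manifold quality is comparable.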

Manifold Learning

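Manifold learning has been an active research topic in information science since it was first proposed in the journal Science in 2000, and it is significant in both theory and application. It assumes that the data are sampled uniformly from a low-dimensional manifold embedded in a high-dimensional Euclidean space. The task is to recover that low-dimensional manifold structure from the high-dimensional samples, that is, to find the low-dimensional manifold in the high-dimensional space and compute the corresponding embedding map, in order to achieve dimensionality reduction or data visualisation. In essence, it looks for the underlying nature of things in observed phenomena and for the intrinsic regularities that generate the data.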
