We introduce biarchetype analysis, a new exploratory technique that extends archetype analysis to identify archetypes of observations and features simultaneously. This unsupervised machine learning tool represents observations and features through instances of pure types, called biarchetypes, which are easy to interpret because they are mixtures of observations and features. In turn, the observations and features are expressed as mixtures of the biarchetypes, which makes the structure of the data easier to understand. We propose an algorithm to solve the biarchetype analysis problem. Although clustering is not the primary aim of the technique, we show that biarchetype analysis offers significant advantages over biclustering methods, particularly in interpretability: biarchetypes are extreme instances, unlike the centroids produced by biclustering, and extreme instances are easier for humans to understand. We illustrate the value of biarchetype analysis on several machine learning problems. Source code and examples are available in R and Python at https://github.com/aleixalcacer/JA-BIAA.
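To make the two-way mixture structure concrete, the following minimal sketch builds biarchetypes as mixtures of observations and features, then re-expresses the data as mixtures of those biarchetypes. It assumes a factorization of the form X ≈ α β X θ γ with simplex-constrained weights, which the abstract does not state explicitly. The weights here are random rather than fitted, so this is not the proposed algorithm, and all names (`simplex_rows`, `biarchetype_reconstruction`, `alpha`, `beta`, `theta`, `gamma`) are hypothetical. It uses only the Python standard library to stay self-contained.

```python
import random

def simplex_rows(rows, cols, rng):
    """Return a rows x cols matrix with non-negative rows that each sum to 1."""
    out = []
    for _ in range(rows):
        weights = [rng.random() + 1e-9 for _ in range(cols)]
        total = sum(weights)
        out.append([w / total for w in weights])
    return out

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def biarchetype_reconstruction(X, alpha, beta, theta, gamma):
    """Assumed model: Z = beta X theta (biarchetypes), X_hat = alpha Z gamma."""
    Z = matmul(matmul(beta, X), theta)        # k x c: mixtures of observations and features
    X_hat = matmul(matmul(alpha, Z), gamma)   # n x m: data as mixtures of biarchetypes
    return Z, X_hat

rng = random.Random(0)
n, m, k, c = 6, 4, 2, 3                       # observations, features, row/column archetypes
X = [[rng.random() for _ in range(m)] for _ in range(n)]

alpha = simplex_rows(n, k, rng)               # n x k, rows sum to 1
beta = simplex_rows(k, n, rng)                # k x n, rows sum to 1
theta = transpose(simplex_rows(c, m, rng))    # m x c, columns sum to 1
gamma = transpose(simplex_rows(m, c, rng))    # c x m, columns sum to 1

Z, X_hat = biarchetype_reconstruction(X, alpha, beta, theta, gamma)

# Residual sum of squares: the quantity a fitting procedure would minimize
# over the four weight matrices (no fitting is done in this sketch).
rss = sum((x - xh) ** 2 for rx, rxh in zip(X, X_hat) for x, xh in zip(rx, rxh))
print(len(Z), len(Z[0]), rss)
```

Because every weight vector lies on a simplex, each biarchetype entry and each reconstructed entry is a convex combination of entries of X. Under this assumed model, that convexity is what ties the biarchetypes to the extremes of the data rather than to centroids.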