We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning that it always returns an elliptic solution, and it can fit arbitrary ellipsoids. It also outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering. Because CTEF captures global curvature, it can extract nonlinear features in data that other methods fail to identify. We illustrate this with dimension reduction on human cell cycle data and with clustering on classical toy examples, where CTEF outperforms 10 popular clustering algorithms.
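To make the role of the Cayley transform concrete, the following is a minimal sketch and not the CTEF algorithm itself. It assumes the standard form of the transform, Q = (I - S)(I + S)^{-1} for a skew-symmetric matrix S, which always yields a rotation matrix. The helper `ellipsoid_matrix` and the parameterization A = Q diag(1/a_i^2) Q^T are illustrative assumptions; the abstract does not specify the parameterization the authors use. The sketch shows why such a construction can be ellipsoid specific: any skew-symmetric S and any positive axis lengths give a positive definite A, so the set x^T A x = 1 is always an ellipsoid.

```python
import numpy as np

def cayley(S):
    """Cayley transform of a skew-symmetric matrix S: Q = (I - S)(I + S)^{-1}."""
    I = np.eye(S.shape[0])
    # (I + S) is invertible for any real skew-symmetric S, and (I + S)^{-1}
    # commutes with (I - S), so solving (I + S) Q = (I - S) gives the same Q.
    return np.linalg.solve(I + S, I - S)

def ellipsoid_matrix(S, axes):
    """Hypothetical parameterization (an assumption for illustration):
    A = Q diag(1 / a_i^2) Q^T, so that x^T A x = 1 describes an
    origin-centered ellipsoid with semi-axis lengths a_i, rotated by Q."""
    Q = cayley(S)
    axes = np.asarray(axes, dtype=float)
    return Q @ np.diag(1.0 / axes**2) @ Q.T

# Build a random skew-symmetric matrix in 4 dimensions.
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
S = M - M.T

Q = cayley(S)
A = ellipsoid_matrix(S, axes=[1.0, 2.0, 3.0, 4.0])

# A point on the ellipsoid: scale a unit vector by the axes, then rotate.
u = rng.normal(size=4)
u /= np.linalg.norm(u)
x = Q @ (np.array([1.0, 2.0, 3.0, 4.0]) * u)
```

In this sketch, `Q` is orthogonal with determinant 1, `A` is symmetric positive definite, and `x` satisfies x^T A x = 1 up to floating-point error. Because the rotation is encoded by the unconstrained entries of S, an optimizer can search over S without ever leaving the set of valid ellipsoids.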