We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid-specific, meaning it always returns ellipsoidal solutions, and it can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data and several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features in data that other machine learning methods fail to identify. For example, on the clustering examples, CTEF outperforms 10 popular algorithms.
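To illustrate why a Cayley-transform parameterization can make a fit ellipsoid-specific, the following is a minimal sketch, not the CTEF algorithm itself. It assumes an ellipsoid written as {x : (x - c)^T A (x - c) = 1} with A = R D R^T, where R is a rotation obtained from a skew-symmetric matrix via the Cayley transform and D holds the inverse squared axis lengths. The function names (`cayley`, `skew_from_params`, `ellipsoid_matrix`) and the log-axis parameterization are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def cayley(S):
    """Cayley transform: map a real skew-symmetric matrix S to a rotation matrix."""
    I = np.eye(S.shape[0])
    # I + S is always invertible for real skew-symmetric S,
    # because the eigenvalues of S are purely imaginary.
    return (I - S) @ np.linalg.inv(I + S)


def skew_from_params(theta, n):
    """Build an n x n skew-symmetric matrix from n(n-1)/2 free parameters."""
    S = np.zeros((n, n))
    S[np.triu_indices(n, k=1)] = theta  # fill the strict upper triangle
    return S - S.T


def ellipsoid_matrix(theta, log_axes):
    """Assumed parameterization: A = R diag(1 / a_i^2) R^T with a_i = exp(log_axes[i]).

    A is symmetric positive definite for every real input, so any
    unconstrained optimizer over (theta, log_axes) can only return an ellipsoid.
    """
    log_axes = np.asarray(log_axes, dtype=float)
    R = cayley(skew_from_params(theta, len(log_axes)))
    D = np.diag(np.exp(-2.0 * log_axes))
    return R @ D @ R.T
```

Under these assumptions, every choice of real parameters yields a valid ellipsoid, so no positive-definiteness constraint has to be enforced during optimization. One known limitation of this construction is that the Cayley transform cannot produce rotations with an eigenvalue of exactly -1.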