Torus Probabilistic Principal Component Analysis

Data whose variables represent directions or angles arise in fields such as bioinformatics, biology, and geology, and analyzing them poses unique challenges because they lie in non-Euclidean spaces. Such data are called circular in the univariate case and spherical or toroidal in the multivariate case. In this paper, we introduce a novel extension of Probabilistic Principal Component Analysis (PPCA) designed for toroidal (torus) data, termed Torus Probabilistic PCA (TPPCA). We provide detailed algorithms for implementing TPPCA and demonstrate its applicability to torus data. To assess the efficacy of TPPCA, we perform comparative analyses on a simulation study and three real datasets. Our findings highlight both the advantages and the limitations of TPPCA in handling torus data. Furthermore, we propose statistical tests based on likelihood ratio statistics to determine the optimal number of components, which enhances the practical utility of TPPCA for real-world applications.
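To illustrate why angular variables need non-Euclidean treatment, here is a minimal sketch comparing the arithmetic mean of two angles with their circular mean. It is not part of TPPCA as described in the abstract; the helper names `circular_mean` and `wrap` are hypothetical and chosen only for this example.

```python
import numpy as np

def circular_mean(theta):
    """Mean direction of angles in radians, computed from the averaged unit vectors."""
    theta = np.asarray(theta, dtype=float)
    return np.arctan2(np.sin(theta).mean(axis=0), np.cos(theta).mean(axis=0))

def wrap(theta):
    """Wrap angles in radians to the interval [-pi, pi)."""
    return (np.asarray(theta, dtype=float) + np.pi) % (2 * np.pi) - np.pi

# Two directions on either side of the 0 / 360 degree cut.
angles = np.deg2rad([10.0, 350.0])

# The arithmetic mean lands at 180 degrees, opposite to both observations.
naive_deg = np.rad2deg(angles.mean())

# The circular mean lands near 0 degrees, between the two observations.
circ_deg = np.rad2deg(circular_mean(angles))

print(naive_deg, circ_deg)
```

The same issue carries over to multivariate angular data: each coordinate of a point on the torus is an angle, so differences and averages have to respect the wrap-around, which is what `wrap` does for a single angle.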

PCA

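In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two, three, or higher dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.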
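The construction described above can be sketched with a singular value decomposition of the centered data. This is ordinary Euclidean PCA, not TPPCA, and the function name `pca` and the synthetic data are assumptions made only for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Ordinary PCA via SVD: returns components, scores, and explained variances."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # orthonormal directions, one per row
    scores = Xc @ components.T                   # coordinates in the new basis
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_variance

# Synthetic correlated data in three dimensions (illustrative only).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

components, scores, variances = pca(X, n_components=2)

# The projected coordinates are uncorrelated: their covariance matrix is
# diagonal (up to rounding), with the explained variances on the diagonal.
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 6))
```

The rows of `components` are the principal components; the first row is the direction of the best-fit line through the centered points, and each later row is the best-fit direction among those perpendicular to the earlier ones.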
