Generalized singular values (GSVs) play an essential role in comparative analysis. In real-world data for comparative analysis, both data matrices are usually numerically low-rank. This paper proposes a randomized algorithm that first approximately extracts bases and then calculates the GSVs efficiently. The accuracy of both the basis extraction and the comparative analysis quantities (angular distances, generalized fractions of eigenexpression, and generalized normalized Shannon entropy) is rigorously analyzed. The proposed algorithm is applied to both synthetic data sets and genome-scale expression data sets. Compared with other GSV algorithms, the proposed algorithm achieves the fastest runtime while preserving sufficient accuracy in comparative analysis.
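The two-stage idea described above (extract approximate bases by random sampling, then compute GSVs from the much smaller projected matrices) can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the function names `randomized_basis`, `cosine_sine_pairs`, and `angular_distances` are hypothetical, the basis extraction is a plain Gaussian range finder, and the cosine-sine pairs are obtained from an SVD of the stacked matrix rather than from a dedicated GSVD routine. The angular distance is assumed to follow the usual definition arctan(alpha/beta) - pi/4, where alpha/beta is a generalized singular value, and the synthetic matrices are exactly low-rank with a shared row space.

```python
import numpy as np

def randomized_basis(M, k, oversample=5, seed=0):
    """Orthonormal basis that approximately spans the column space of M."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((M.shape[1], k + oversample))  # Gaussian test matrix
    Q, _ = np.linalg.qr(M @ omega)  # orthonormalize the sampled columns
    return Q

def cosine_sine_pairs(A, B, tol=1e-8):
    """Pairs (alpha, beta) with alpha**2 + beta**2 = 1; each GSV is alpha / beta."""
    # Orthonormal basis of the stacked matrix, truncated to its numerical rank.
    U, s, _ = np.linalg.svd(np.vstack([A, B]), full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    # Singular values of the block belonging to A are the cosines.
    alpha = np.linalg.svd(U[:A.shape[0], :r], compute_uv=False)
    alpha = np.clip(alpha, 0.0, 1.0)
    return alpha, np.sqrt(1.0 - alpha**2)

def angular_distances(A, B):
    """Angular distances arctan(alpha / beta) - pi/4, each in [-pi/4, pi/4]."""
    alpha, beta = cosine_sine_pairs(A, B)
    return np.arctan2(alpha, beta) - np.pi / 4

# Synthetic, exactly low-rank data: both matrices share a rank-5 row space.
rng = np.random.default_rng(1)
rank, n = 5, 40
R = rng.standard_normal((rank, n))
A = rng.standard_normal((500, rank)) @ R
B = rng.standard_normal((400, rank)) @ R

# Stage 1: approximate bases. Stage 2: GSVs of the small projected matrices.
QA = randomized_basis(A, rank, seed=2)
QB = randomized_basis(B, rank, seed=3)
theta_exact = angular_distances(A, B)
theta_rand = angular_distances(QA.T @ A, QB.T @ B)
```

In this sketch the projected matrices have only 10 rows each instead of 500 and 400, which is where a saving of this kind would come from. The projection preserves the Gram matrices of `A` and `B` whenever the sampled bases capture their column spaces, so `theta_rand` should agree with `theta_exact` up to rounding for this exactly low-rank example. For data that are only numerically low-rank, the agreement would depend on how quickly the singular values decay.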