Principal component analysis (PCA) is perhaps the most widely used method for dimensionality reduction. A key question in a PCA decomposition is how many factors to retain. This manuscript describes a new approach to automatically selecting the number of principal components, based on the Bayesian minimum message length method of inductive inference. We also derive a new estimate of the isotropic residual variance and demonstrate, via numerical experiments, that it improves on the usual maximum likelihood estimate.