Principal component analysis (PCA) is perhaps the most widely used method for dimensionality reduction. A key question in PCA is how many principal components to retain. This manuscript describes a new approach to automatically selecting the number of principal components, based on the Bayesian minimum message length method of inductive inference. We derive a new estimate of the isotropic residual variance and demonstrate that it improves on the usual maximum likelihood estimate. We also discuss extending this approach to finite mixture models of principal component analyzers.
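For orientation, the maximum likelihood baseline mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard probabilistic PCA model, under which the maximum likelihood estimate of the isotropic residual variance is the average of the discarded eigenvalues of the sample covariance matrix. It does not implement the minimum message length estimate proposed in the manuscript, and the function name `ppca_ml_residual_variance` is a hypothetical one chosen for this example.

```python
import numpy as np


def ppca_ml_residual_variance(X, k):
    """Maximum likelihood residual variance under probabilistic PCA.

    X is an (n, d) data matrix and k is the number of retained components.
    Hypothetical helper for illustration; this is the ML baseline only.
    """
    n, d = X.shape
    if not 0 <= k < d:
        raise ValueError("k must satisfy 0 <= k < d")
    Xc = X - X.mean(axis=0)                 # centre the data
    S = Xc.T @ Xc / n                       # ML covariance (divides by n)
    eigvals = np.linalg.eigvalsh(S)[::-1]   # eigenvalues, descending order
    return eigvals[k:].mean()               # average of the d - k discarded ones


# Small example whose sample covariance is diagonal: diag(1/3, 4/3, 3).
X = np.array([[1, 0, 0], [-1, 0, 0],
              [0, 2, 0], [0, -2, 0],
              [0, 0, 3], [0, 0, -3]], dtype=float)
sigma2 = ppca_ml_residual_variance(X, k=1)  # mean of 4/3 and 1/3
```

Retaining more components can only lower or keep this estimate, because the eigenvalues are averaged in descending order. The estimate therefore does not by itself indicate how many components to keep, which is the selection problem the manuscript addresses.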