Factor analysis characterizes the relationships among many observable variables in terms of a smaller number of unobservable random variables, called factors. However, the success of a factor model can be subjective or difficult to gauge, because infinitely many factor models that reproduce the same correlation matrix can be fit to a given sample. There is therefore a need to operationalize a criterion that measures how meaningful, or "interpretable", a factor model is, so that the best among many candidate models can be selected. Although techniques that aim to measure and enhance interpretability already exist, this work proposes new indices for measuring interpretability, together with rotation methods that use mathematical optimization based on these indices. The proposed methods incorporate semantics directly with the help of natural language processing, and they generalize to incorporate any "prior information". Moreover, the indices allow relationships to be specified completely or partially at the pairwise level. The proposed methods have two further benefits: they do not require the estimation of factor scores, which avoids the factor score indeterminacy problem, and they need no additional explanatory variables. The implementation is written in Python 3 and is available, together with several helper functions, as the package interpretablefa on the Python Package Index. The application of the methods is demonstrated using data on the Experiences in Close Relationships Scale, obtained from the Open-Source Psychometrics Project.
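The claim that infinitely many factor models reproduce the same correlation matrix can be illustrated with a minimal NumPy sketch. It assumes an orthogonal factor model whose implied correlation matrix is the loadings times their transpose plus a diagonal matrix of uniquenesses. The loading values, the rotation angle, and the helper name `implied_correlation` are hypothetical choices made for illustration; the sketch does not use the interpretablefa package or its API.

```python
import numpy as np


def implied_correlation(loadings, uniquenesses):
    """Correlation matrix implied by an orthogonal factor model: L L^T + Psi."""
    return loadings @ loadings.T + np.diag(uniquenesses)


# Hypothetical loadings for 6 observed variables on 2 factors (illustrative only).
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.6, 0.0],
    [0.1, 0.7],
    [0.2, 0.8],
    [0.0, 0.6],
])

# Uniquenesses chosen so that every implied variance equals 1.
uniquenesses = 1.0 - np.sum(loadings**2, axis=1)

# Any orthogonal matrix works; a 30-degree planar rotation is an arbitrary choice.
theta = np.deg2rad(30.0)
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta), np.cos(theta)],
])
rotated_loadings = loadings @ rotation

original_corr = implied_correlation(loadings, uniquenesses)
rotated_corr = implied_correlation(rotated_loadings, uniquenesses)

# The loadings differ, yet both models imply the same correlation matrix.
same_correlation = np.allclose(original_corr, rotated_corr)
different_loadings = not np.allclose(loadings, rotated_loadings)
print(same_correlation, different_loadings)
```

Because every rotation angle yields a different loading matrix with the same implied correlations, fit alone cannot single out one solution, which is why a separate interpretability criterion is needed to choose among them.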