Psychiatry research seeks to understand the manifestations of psychopathology in behavior, as measured in questionnaire data, by identifying a small number of latent factors that explain them. While factor analysis is the traditional tool for this purpose, the resulting factors may not be interpretable and may be subject to confounding variables. Moreover, missing data are common, and explicit imputation is often required. To overcome these limitations, we introduce interpretability-constrained questionnaire factorization (ICQF), a non-negative matrix factorization method with regularization tailored for questionnaire data. Our method aims to promote factor interpretability and solution stability. We provide an optimization procedure with theoretical convergence guarantees, and an automated procedure to detect latent dimensionality accurately. We validate these procedures using realistic synthetic data. We demonstrate the effectiveness of our method on a widely used general-purpose questionnaire, in two independent datasets (the Healthy Brain Network and Adolescent Brain Cognitive Development studies). Specifically, we show that ICQF improves interpretability, as defined by domain experts, while preserving diagnostic information across a range of disorders, and outperforms competing methods for smaller dataset sizes. This suggests that the regularization in our method matches domain characteristics. A Python implementation of ICQF is available at \url{https://github.com/jefferykclam/ICQF}.
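To make the general idea concrete, the following is a minimal sketch of a regularized non-negative matrix factorization that fits only the observed entries of a questionnaire matrix, so no explicit imputation is needed. It is not the ICQF algorithm and does not use the API of the linked repository: the function name `masked_nmf`, the L1 penalty weight `beta`, the multiplicative update rule, and the synthetic data are all assumptions chosen for illustration. The abstract does not specify ICQF's optimization procedure or constraints, so none of them are reproduced here.

```python
import numpy as np


def masked_nmf(X, mask, k, beta=0.1, n_iter=500, seed=0):
    """Illustrative masked NMF (hypothetical helper, not the ICQF method).

    Approximately minimizes
        0.5 * ||mask * (X - W @ H)||_F^2 + beta * (sum(W) + sum(H))
    subject to W >= 0 and H >= 0, using multiplicative updates.
    Entries where mask is False are treated as missing and ignored.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    eps = 1e-10  # guards against division by zero
    M = mask.astype(float)
    Xo = np.where(mask, X, 0.0)  # zero out missing entries (also drops NaNs)
    W = rng.random((n, k)) + 0.1  # participant-by-factor loadings
    H = rng.random((k, m)) + 0.1  # factor-by-question loadings
    for _ in range(n_iter):
        WH = M * (W @ H)
        W *= (Xo @ H.T) / (WH @ H.T + beta + eps)
        WH = M * (W @ H)
        H *= (W.T @ Xo) / (W.T @ WH + beta + eps)
    return W, H


# Synthetic example (assumed data): 60 participants, 20 questions, 3 factors.
rng = np.random.default_rng(1)
X_full = rng.random((60, 3)) @ rng.random((3, 20))
mask = rng.random(X_full.shape) > 0.2  # roughly 20% of answers missing
X_missing = X_full.copy()
X_missing[~mask] = np.nan

W, H = masked_nmf(X_missing, mask, k=3, beta=0.01)
observed_error = np.abs((W @ H - X_full)[mask]).mean()
print(f"mean absolute error on observed entries: {observed_error:.4f}")
```

Because both factor matrices stay non-negative, each row of `H` can be read as a set of questions that load together, which is the kind of interpretability the abstract refers to. The latent dimensionality `k` is fixed by hand in this sketch, whereas the paper describes an automated procedure for detecting it.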