Discovering Semantic Latent Structures in Psychological Scales: A Response-Free Pathway to Efficient Simplification

Psychological scale refinement traditionally relies on response-based methods, such as factor analysis, item response theory, and network psychometrics, to optimize item composition. Although rigorous, these approaches require large response samples and can be limited by data availability and by cross-cultural comparability. Recent advances in natural language processing suggest that the semantic structure of questionnaire items may encode latent construct organization, offering a complementary, response-free perspective. We introduce a topic-modeling framework that operationalizes semantic latent structure for scale simplification. Items are encoded with contextual sentence embeddings and grouped via density-based clustering, which discovers latent semantic factors without predefining their number. Class-based term weighting then yields interpretable topic representations that approximate the underlying constructs and guide the merging of semantically adjacent clusters. Finally, representative items are selected by cluster-membership criteria within an integrated reduction pipeline. We benchmarked the framework on the DASS, IPIP, and EPOCH scales, evaluating structural recovery, internal consistency, factor congruence, correlation preservation, and reduction efficiency. The proposed method recovered coherent, factor-like groupings aligned with established constructs. The selected items reduced scale length by 60.5% on average while maintaining psychometric adequacy. The simplified scales showed high concordance with the original factor structures and preserved inter-factor correlations, indicating that semantic latent organization provides a response-free approximation of measurement structure. Our framework formalizes semantic structure as an inspectable front-end for scale construction and reduction. To facilitate adoption, we provide a visualization-supported tool that enables one-click semantic analysis and structured simplification.
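The class-based term weighting step can be illustrated with a small, dependency-free sketch. It assumes the c-TF-IDF formulation popularized by BERTopic, W(t,c) = tf(t,c) · log(1 + A / f(t)), where tf(t,c) is the L1-normalized frequency of term t within cluster c, f(t) is the frequency of t across all clusters, and A is the average number of tokens per cluster; the abstract does not state the exact weighting used. The clusters are given as input here (in the described pipeline they would come from sentence embeddings plus density-based clustering, omitted to stay self-contained). The toy items, the stopword list, the cosine-similarity merge criterion, and the 0.1 threshold are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy clusters of item stems; in the described pipeline these
# groups would be produced by embedding + density-based clustering.
CLUSTERS = {
    0: ["I felt sad and down", "I felt down and blue"],
    1: ["I felt panic for no reason", "I felt I was in a panic"],
    2: ["I felt life was meaningless", "I felt sad about the future"],
}

# Hypothetical minimal stopword list for the toy items.
STOPWORDS = {"a", "about", "and", "felt", "for", "i", "in", "no", "the", "was"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def c_tf_idf(clusters):
    """BERTopic-style c-TF-IDF: each cluster's items are pooled into one class document."""
    counts = {c: Counter(tok for item in items for tok in tokenize(item))
              for c, items in clusters.items()}
    total = Counter()  # f(t): term frequency across all classes
    for cnt in counts.values():
        total.update(cnt)
    avg_tokens = sum(total.values()) / len(counts)  # A: mean tokens per class
    weights = {}
    for c, cnt in counts.items():
        n = sum(cnt.values())
        weights[c] = {t: (f / n) * math.log(1 + avg_tokens / total[t])
                      for t, f in cnt.items()}
    return weights


def top_terms(weights, k=3):
    """Highest-weighted terms per cluster (ties broken alphabetically)."""
    return {c: [t for t, _ in sorted(w.items(), key=lambda x: (-x[1], x[0]))[:k]]
            for c, w in weights.items()}


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def merge_candidates(weights, threshold=0.1):
    """Cluster pairs whose topic vectors overlap at or above a hypothetical threshold."""
    ids = sorted(weights)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sim = cosine(weights[a], weights[b])
            if sim >= threshold:
                pairs.append((a, b, sim))
    return pairs


if __name__ == "__main__":
    weights = c_tf_idf(CLUSTERS)
    for c, terms in top_terms(weights).items():
        print(c, terms)
    for a, b, sim in merge_candidates(weights):
        print(f"merge candidate: {a} + {b} (cosine {sim:.3f})")
```

With these toy items, only clusters 0 and 2, which share the term "sad", reach the threshold, while the panic-related cluster shares no terms with the others. Cosine similarity on topic vectors is one simple way to flag semantically adjacent clusters; the framework's actual merging rule may differ.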

翻译：心理量表优化传统上依赖于基于响应的方法，如因子分析、项目反应理论和网络心理测量学，以优化项目构成。尽管严谨，这些方法需要大样本，且可能受数据可用性和跨文化可比性限制。自然语言处理的最新进展表明，问卷项目的语义结构可能编码潜在构念组织，为量表优化提供了无需响应的补充视角。我们引入一种主题建模框架，将语义潜在结构操作化用于量表简化。项目通过上下文句子嵌入进行编码，并通过基于密度的聚类进行分组，从而无需预先定义数量即可发现潜在语义因子。基于类别的术语加权生成可解释的主题表征，近似于构念，并支持合并语义相邻的聚类。代表性项目通过集成简化流程中的成员资格标准进行筛选。我们在DASS、IPIP和EPOCH量表上对该框架进行了基准测试，评估了结构恢复性、内部一致性、因子一致性、相关性保持和简化效率。所提方法恢复了与既有构念一致、具有内在一致性的类因子分组。所选项目平均将量表长度减少60.5%，同时保持心理测量学充分性。简化后的量表显示出与原始因子结构的高度一致性，并保持了因子间相关性，表明语义潜在组织提供了测量结构的无需响应近似。本框架将语义结构形式化为可检验的量表构建与简化前端工具。为促进应用，我们提供了可视化支持工具，支持一键式语义分析和结构化简化。