Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space

Semantic representations can be framed as a structured, dynamic knowledge space through which humans navigate to retrieve and manipulate meaning. To investigate how humans traverse this geometry, we introduce a framework that represents concept production as navigation through embedding space. Using several transformer-based text embedding models, we construct participant-specific semantic trajectories from cumulative embeddings and extract geometric and dynamical metrics, including distance to next, distance to centroid, entropy, velocity, and acceleration. These measures capture both scalar and directional aspects of semantic navigation, providing a computationally grounded view of semantic search as movement in a geometric space. We evaluate the framework on four datasets spanning different languages and property generation tasks: a neurodegenerative dataset, a swear-word verbal fluency dataset, and property listing tasks in Italian and in German. Across these contexts, our approach distinguishes between clinical groups and between concept types, offering a mathematical framework that requires minimal human intervention, unlike typical labor-intensive linguistic pre-processing methods. Comparison with a non-cumulative approach reveals that cumulative embeddings work best for longer trajectories, whereas shorter trajectories may provide too little context and thus favor the non-cumulative alternative. Critically, the different embedding models yielded similar results, pointing to similarities among learned representations despite their different training pipelines. By framing semantic navigation as a structured trajectory through embedding space, our framework bridges cognitive modeling and learned representations, thereby establishing a pipeline for quantifying semantic representation dynamics with applications in clinical research, cross-linguistic analysis, and the assessment of artificial cognition.
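
As a rough illustration of the kind of computation described above, the following minimal sketch builds a cumulative trajectory from a sequence of item embeddings and derives a few of the named metrics. The abstract does not give exact definitions, so everything here is an assumption: the cumulative embedding is approximated by a running mean of item vectors (the actual pipeline may instead embed the concatenated text), distances are cosine distances, and velocity and acceleration are first and second differences along the trajectory. The function names `cumulative_trajectory` and `trajectory_metrics` are hypothetical, the toy vectors stand in for real model embeddings, and entropy is omitted because its definition is not stated here.

```python
import numpy as np

def cumulative_trajectory(item_vecs):
    """Running mean of item embeddings: point t summarizes items 0..t.
    (Assumed stand-in for the cumulative embeddings named in the abstract.)"""
    item_vecs = np.asarray(item_vecs, dtype=float)
    counts = np.arange(1, len(item_vecs) + 1)[:, None]
    return np.cumsum(item_vecs, axis=0) / counts

def cosine_distance(a, b):
    """1 - cosine similarity along the last axis (assumes non-zero vectors)."""
    dot = np.sum(a * b, axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return 1.0 - dot / norms

def trajectory_metrics(traj):
    """Assumed definitions of a few geometric and dynamical metrics."""
    traj = np.asarray(traj, dtype=float)
    velocity = np.diff(traj, axis=0)          # first difference, shape (T-1, d)
    acceleration = np.diff(velocity, axis=0)  # second difference, shape (T-2, d)
    centroid = traj.mean(axis=0)              # mean of all trajectory points
    return {
        "distance_to_next": cosine_distance(traj[:-1], traj[1:]),
        "distance_to_centroid": cosine_distance(traj, centroid[None, :]),
        "speed": np.linalg.norm(velocity, axis=1),
        "acceleration_norm": np.linalg.norm(acceleration, axis=1),
    }

# Toy 2-D vectors standing in for the embeddings of four produced items.
items = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
traj = cumulative_trajectory(items)
metrics = trajectory_metrics(traj)
```

In this sketch the velocity and acceleration arrays keep their direction, while `speed` and `acceleration_norm` reduce them to scalars, mirroring the distinction between directional and scalar measures drawn in the abstract. A non-cumulative variant would pass the item embeddings to `trajectory_metrics` directly.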
