Conceptual spaces represent entities in terms of their primitive semantic features. Such representations are highly valuable, but they are notoriously difficult to learn, especially when it comes to modelling perceptual and subjective features. Distilling conceptual spaces from Large Language Models (LLMs) has recently emerged as a promising strategy. However, existing work has been limited to probing pre-trained LLMs using relatively simple zero-shot strategies. We focus in particular on the task of ranking entities according to a given conceptual space dimension. Unfortunately, we cannot directly fine-tune LLMs on this task, because ground-truth rankings for conceptual space dimensions are rare. We therefore use more readily available features as training data and analyse whether the ranking capabilities of the resulting models transfer to perceptual and subjective features. We find that this is indeed the case, to some extent, but having perceptual and subjective features in the training data seems essential for achieving the best results. We furthermore find that, contrary to common wisdom, pointwise ranking strategies are competitive with pairwise approaches.
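The two ranking strategies mentioned above can be illustrated with a minimal sketch. It assumes a toy dictionary of made-up scores in place of LLM queries. The names `TOY_SCORES`, `score` and `prefers`, along with their values, are hypothetical stand-ins and do not represent the models or prompts studied here. A pointwise strategy rates each entity independently and sorts by that rating. A pairwise strategy compares entities two at a time and aggregates the outcomes, which this sketch does by counting wins.

```python
from itertools import combinations

# Hypothetical toy scores standing in for an LLM's judgement of how strongly
# each entity has a feature (e.g. "sweet"). Illustrative only, not real data.
TOY_SCORES = {"lemon": 0.1, "apple": 0.6, "mango": 0.9, "carrot": 0.05}


def score(entity: str) -> float:
    """Hypothetical pointwise scorer: one query per entity."""
    return TOY_SCORES[entity]


def prefers(a: str, b: str) -> bool:
    """Hypothetical pairwise judge: True if `a` has more of the feature than `b`."""
    return TOY_SCORES[a] > TOY_SCORES[b]


def rank_pointwise(entities: list[str]) -> list[str]:
    # n scorer queries; sort entities by their individual scores, highest first.
    return sorted(entities, key=score, reverse=True)


def rank_pairwise(entities: list[str]) -> list[str]:
    # n(n-1)/2 judge queries; an entity's position depends on its pairwise wins.
    wins = {e: 0 for e in entities}
    for a, b in combinations(entities, 2):
        winner = a if prefers(a, b) else b
        wins[winner] += 1
    return sorted(entities, key=lambda e: wins[e], reverse=True)


foods = ["lemon", "apple", "mango", "carrot"]
print(rank_pointwise(foods))  # ['mango', 'apple', 'lemon', 'carrot']
print(rank_pairwise(foods))   # ['mango', 'apple', 'lemon', 'carrot']
```

In this sketch, the pointwise variant needs one query per entity. The pairwise variant needs one query per pair, so its cost grows quadratically with the number of entities.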