This paper introduces BWSNet, a model that can be trained from raw human judgements obtained through a Best-Worst Scaling (BWS) experiment. It maps sound samples into an embedding space that represents the perception of the attribute under study. To this end, we propose a set of cost functions and constraints that interpret trial-wise ordinal relations as distance comparisons in a metric learning task. We tested our proposal on data from two BWS studies, one investigating the perception of social attitudes in speech and the other the perception of timbral qualities. For both datasets, our results show that the structure of the latent space is faithful to human judgements.
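To make the idea of reading a BWS trial as a set of distance comparisons more concrete, the following is a minimal sketch and not the cost function used in BWSNet, which the abstract does not specify. It assumes, purely for illustration, that the items judged "best" and "worst" in a trial should lie farther apart in the embedding space than any other pair of items shown in that trial, and it penalises violations with a hinge term. The function name `trial_hinge_loss`, the `margin` parameter, and the use of Euclidean distance are all hypothetical choices.

```python
from itertools import combinations

import numpy as np


def trial_hinge_loss(emb, best, worst, margin=0.1):
    """Hinge penalty for a single BWS trial (illustrative assumption only).

    emb:   (k, d) array holding the embeddings of the k items in the trial
    best:  index of the item judged "best" for the attribute
    worst: index of the item judged "worst" for the attribute
    """
    # Distance between the two extremes chosen by the participant.
    d_bw = np.linalg.norm(emb[best] - emb[worst])
    loss = 0.0
    for i, j in combinations(range(len(emb)), 2):
        if {i, j} == {best, worst}:
            continue  # skip the reference pair itself
        d_ij = np.linalg.norm(emb[i] - emb[j])
        # Penalise any pair that is not at least `margin` closer
        # together than the best-worst pair.
        loss += max(0.0, margin + d_ij - d_bw)
    return loss


# Four items placed on a line; the outermost items are best and worst.
emb = np.array([[0.0], [1.0], [2.0], [3.0]])
consistent = trial_hinge_loss(emb, best=3, worst=0)
# The same embedding scored against a judgement it contradicts.
inconsistent = trial_hinge_loss(emb, best=1, worst=2)
```

Under this assumed formulation, an embedding that agrees with the judgement incurs no penalty, while one that places the chosen extremes close together is penalised once for every pair that violates the comparison. In a trainable model the same comparisons would be written with a differentiable framework rather than NumPy.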