This paper introduces BWSNet, a model that can be trained directly on raw human judgements obtained through a Best-Worst Scaling (BWS) experiment. It maps sound samples into an embedding space that represents the perception of a studied attribute. To this end, we propose a set of cost functions and constraints that interpret trial-wise ordinal relations as distance comparisons in a metric learning task. We tested our proposal on data from two BWS studies, one investigating the perception of social attitudes in speech and the other the perception of timbral qualities. For both datasets, our results show that the structure of the latent space is faithful to human judgements.
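To make the idea of reading a BWS trial as a set of distance comparisons more concrete, the following is a minimal sketch and not the cost functions proposed in the paper. It assumes one plausible interpretation: within a trial, the items judged best and worst should lie farther apart in the embedding space than either of them lies from any unselected item. The function name `bws_trial_hinge_loss`, the Euclidean distance, and the `margin` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def bws_trial_hinge_loss(embeddings, best, worst, margin=0.1):
    """Hinge penalty for a single BWS trial (illustrative assumption only).

    embeddings: array of shape (n_items, dim), one row per item in the trial.
    best, worst: row indices of the items judged best and worst.
    margin: hypothetical slack required between the compared distances.
    """
    z = np.asarray(embeddings, dtype=float)
    # Distance between the two extreme items of the trial.
    d_extreme = np.linalg.norm(z[best] - z[worst])

    loss = 0.0
    for m in range(len(z)):
        if m in (best, worst):
            continue
        d_best = np.linalg.norm(z[best] - z[m])
        d_worst = np.linalg.norm(z[worst] - z[m])
        # Penalise any unselected item that is not closer to each extreme
        # than the extremes are to each other, by at least `margin`.
        loss += max(0.0, margin + d_best - d_extreme)
        loss += max(0.0, margin + d_worst - d_extreme)
    return loss


# Four items placed on a line; a trial is consistent when best and worst
# are the two end points.
trial = np.array([[0.0], [1.0], [2.0], [3.0]])
consistent = bws_trial_hinge_loss(trial, best=3, worst=0)
violating = bws_trial_hinge_loss(trial, best=1, worst=2)
```

In this toy layout the penalty is zero when the judged extremes are the end points of the line and positive when they are two interior neighbours, which is the kind of trial-wise signal a metric learning objective could minimise over many trials.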